Hello All,
I am facing an issue with parallel_for. I am using parallel_for for a particular operation, and I am not able to get any speedup from the TBB code.
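[bash]
#include <cstdlib>
#include <iostream>
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#include "tbb/tick_count.h"

using namespace std;
using namespace tbb;

// Body object for parallel_for: element-wise squared difference.
class Distance_Global {
public:
    float* diff_array;
    float* temp1;
    float* query1;
    void operator()(const blocked_range<int>& r) const {
        float* temp_1 = temp1;
        float* query_1 = query1;
        int end = r.end();
        for (int i = r.begin(); i != end; ++i) {
            diff_array[i] = (temp_1[i] - query_1[i]) * (temp_1[i] - query_1[i]);
        }
    }
};

int main(int argc, char** argv) {
    int numElements = 19800;
    int GRAIN = 1000;
    if (argc == 3) {
        numElements = atoi(argv[1]);
        GRAIN = atoi(argv[2]);
    }
    cout << "Running with #Elements : " << numElements << " And GRAIN : " << GRAIN << endl;
    // Variable-length arrays are a compiler extension (GCC/Clang), not standard C++.
    float out1[numElements], out2[numElements], diff_array[numElements];
    for (int i = 0; i < numElements; ++i) {
        out1[i] = 1.5345;
        out2[i] = 0.8976;
    }

    // Parallel version
    tick_count t0 = tick_count::now();
    Distance_Global dg;
    dg.diff_array = diff_array;
    dg.temp1 = out1;
    dg.query1 = out2;
    parallel_for(blocked_range<int>(0, numElements, GRAIN), dg);
    tick_count t1 = tick_count::now();
    cout << "Parall: " << (t1 - t0).seconds() << endl;

    // Serial version
    tick_count t2 = tick_count::now();
    for (int i = 0; i < numElements; ++i) {
        diff_array[i] = (out1[i] - out2[i]) * (out1[i] - out2[i]);
    }
    tick_count t3 = tick_count::now();
    cout << "Serial: " << (t3 - t2).seconds() << endl;
    return 0;
}
[/bash]
The number of elements, numElements, is fixed at 19800.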
I experimented with the GRAIN size, but it does not help.
My TBB version takes almost 10 times as long as the serial code. This is quite shocking.
Am I making a mistake in the code, or what should I do to speed things up?
Thanks
1 Reply
The amount of work is arguably not enough to justify parallelization, *especially* taking into account that the time to start worker threads is included in the measurement.
I ran your test on my Intel Core i5 laptop (after some adjustments to compile it). Serial execution time was about 20 microseconds, and parallel time was indeed 10 times more. However, when I repeated the same computation thousands of times in a loop, the average serial time (i.e. the elapsed time of the loop divided by the number of iterations) remained 20 usec, while the average parallel time was about 15 usec. That is, there can be some benefit from parallelism if the thread start time is amortized over the work done.
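For reference, the averaged measurement described above can be sketched roughly as follows. This is only a minimal illustration in standard C++ and not the exact code I ran: `average_seconds`, `squared_diff`, and `time_serial_kernel` are hypothetical names made up for this sketch, and it times only the serial kernel so that it builds without TBB. The one-off warm-up call is where a parallel version would pay its thread start-up cost.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TBB): calls `work` once untimed as a
// warm-up, then `reps` more times, and returns the average wall-clock
// seconds per timed call.
template <typename Work>
double average_seconds(Work work, int reps) {
    assert(reps > 0);
    work();  // warm-up: one-time costs such as thread start-up land here
    auto start = std::chrono::steady_clock::now();
    for (int k = 0; k < reps; ++k) {
        work();
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / reps;
}

// Serial form of the kernel from the question: element-wise squared difference.
void squared_diff(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        float d = a[i] - b[i];
        out[i] = d * d;
    }
}

// Fills `out` with the results for 19800 elements (the size from the question)
// and returns the average seconds per pass over `reps` timed passes.
double time_serial_kernel(int reps, std::vector<float>& out) {
    const std::size_t n = 19800;
    std::vector<float> a(n, 1.5345f), b(n, 0.8976f);
    out.assign(n, 0.0f);
    return average_seconds(
        [&] { squared_diff(a.data(), b.data(), out.data(), n); }, reps);
}
```

A callable that wraps the `parallel_for` call from the question can be passed to `average_seconds` in the same way, which gives a like-for-like comparison with the thread start-up excluded from the timed region.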