msr999
Beginner

parallel_sort

Hi,

I am trying to do some testing with parallel_sort. I wrote a simple test program, and parallel_sort is actually 50% slower than the regular std::sort. I tried this on Linux 2.6.28 with the gcc 4.3.3 compiler, on two different machines (one with 2 cores and one with 4 cores). Looking at CPU consumption, parallel_sort engages all CPUs at 100%, while std::sort uses only one CPU. So I am not sure what's happening. Is there something wrong with my program?

I am pasting the code here. Let me know. Thanks!

#include "tbb/parallel_sort.h"
#include "tbb/task_scheduler_init.h"
#include "tbb/tick_count.h"

#include <boost/lexical_cast.hpp>
#include <algorithm>   // std::sort
#include <cstdlib>     // rand, srand
#include <ctime>       // time
#include <iostream>

using namespace tbb;
using namespace std;

// Usage: prog [r|p] [num_elements]
//   'r' as the first argument runs std::sort; anything else runs parallel_sort.
int main(int argc, char * argv[])
{
    task_scheduler_init init;   // default number of worker threads

    srand(time(NULL));

    ulong num_ele = 10000000;
    if (argc > 2)
        num_ele = boost::lexical_cast<ulong>(argv[2]);

    cout << " Sorting " << num_ele << " longs " << endl;

    long *elements = new long[num_ele];
    cout << "Initializing random data. " << endl;

    // Combine two rand() results into one value (assumes a 64-bit long).
    for (ulong i = 0; i < num_ele; ++i)
    {
        elements[i] = ((ulong)rand() << 32) + rand();
    }

    tick_count t0 = tick_count::now();

    if (argc > 1 && *(argv[1]) == 'r')
    {
        cout << "Doing single threaded std::sort" << endl;
        std::sort(elements, elements + num_ele);
    }
    else
    {
        cout << "Doing Parallel Sort" << endl;
        parallel_sort(elements, elements + num_ele);
    }

    tick_count t1 = tick_count::now();

    cout << "Time took to sort " << (t1 - t0).seconds() << " secs " << endl;
    delete [] elements;
    return 0;
}
5 Replies
Alexey_K_Intel3
Employee

You might be using an older TBB version. Update to a newer one, e.g. the latest commercial-aligned release, which corresponds to TBB 2.1 Update 4.
Bartlomiej
New Contributor I

Here are the results on my Intel Core 2 Quad Q6600 and the stable release (I added the number of threads when initializing the task scheduler):

serial: 2.545 secs
parallel, 1 thread: 3.68352 secs
parallel, 2 threads: 1.93752 secs
parallel, 4 threads: 1.07267 secs

So, it's clearly faster, but the speedup is not stunning.
There seems to be a large overhead for starting the threads. For a larger problem (e.g. one lasting a few minutes) the difference should be more encouraging.

Best regards

msr999
Beginner

Quoting - Alexey_K_Intel3
You might be using an older TBB version. Update to a newer one, e.g. the latest commercial-aligned release, which corresponds to TBB 2.1 Update 4.

I was using the latest release. Anyway, it's working great now: I get ~50% better performance. I am not sure what was wrong before; I am completely puzzled. I most likely did something stupid in my old runs (like linking to the debug library). Anyway, thanks for the prompt response.

While searching the forums, I found a similar issue reported by another user. It seems you identified it as Windows-specific. Do you know if there are any fixes for that issue?


Thanks
mSR
msr999
Beginner

Quoting - bartlomiej
Here are the results on my Intel Core 2 Quad Q6600 and the stable release (I added the number of threads when initializing the task scheduler):

serial: 2.545 secs
parallel, 1 thread: 3.68352 secs
parallel, 2 threads: 1.93752 secs
parallel, 4 threads: 1.07267 secs

So, it's clearly faster, but the speedup is not stunning.
There seems to be a large overhead for starting the threads. For a larger problem (e.g. one lasting a few minutes) the difference should be more encouraging.

Best regards



I tested with an array of 500M longs on a quad-core machine and there is a substantial improvement (66 sec vs. 26 sec).
On a dual-core machine with 40M elements I get around a 50% improvement (45 sec vs. 30 sec).
Alexey_K_Intel3
Employee

Quoting - msr999
While searching the forums, I found a similar issue reported by another user. It seems you identified it as Windows-specific. Do you know if there are any fixes for that issue?

Yes, there was an earlier problem, and we fixed it in an update to TBB 2.1. That's why I supposed you might have an older version and recommended that you update. If you use the latest 2.1 update, you have the fix.