Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.
Announcements
FPGA community forums and blogs on community.intel.com are migrating to the new Altera Community and are read-only. For urgent support needs during this transition, please visit the FPGA Design Resources page or contact an Altera Authorized Distributor.

My Parallel Sort Library and benchmarks ...

aminer10
Novice
1,086 Views


Hello all,

I have used my Parallel Sort Library and benchmarked an Object Pascal
program that sorts a dynamic array of 10 millions of strings.

Please look at the benchmarks in following page:

http://pages.videotron.com/aminer/parallelsort/parallelsort.htm


On four cores four threads, here is the results:

Parallel Quicksort gives 5.31x speed and 2.877 seconds
Parallel Heapsort gave me 4.72x and 7.452 seconds

Note: Parallel Quicksort is much faster in pratice than parallel heapsort
or parallel mergesort.

Sincerely
Amine Moulay Ramdane.



0 Kudos
5 Replies
Tudor
New Contributor I
1,086 Views
Are you sure the algorithms perform normally when there is only 1 core? It seems you are getting superlinear speedup for every configuration compared to 1 core. That seems very strange.
For example, I extracted this from your tests:
Heapsort:
2 cores - 13.382 sec
4 cores - 7.453 sec
this is normal, a bit below 2x speedup. But:
1 core - 35.188 sec
2 cores - 13.382 sec
that's almost 3x speedup for double the number of cores.
0 Kudos
aminer10
Novice
1,086 Views


Tudor wrote:
>Are you sure the algorithms perform normally when there is only 1 core? It seems you are getting >superlinear speedup for every configuration compared to 1 core. That seems very strange.


I thinkit's the 'divide and conquer' that givesthis speed up,due
to the fact that i am sorting 'parts' of the array in parallel and in
every parts of the array i am applying the quicksort or heapsort...


So, if we have 80 elements , in single threaded the average complexity
will be 80 Ln(80)this will give350.56 , but in mutlthreaded with four threads
on four coresthis will give 4(20 ln(20)) = 239 , and we are also runnning
the quicksort or heapsort in parallel on every part of the array , it's why
it gives better than 4x.

:)


Sincerely,
Amine.


0 Kudos
aminer10
Novice
1,086 Views


Hello,


Of course i am using NO synchronization between the threads...


Just a smallinterlockincrement() to know when the sort have finnished...




Amine.





0 Kudos
aminer10
Novice
1,086 Views

Tudor wrote:
>1 core - 35.188 sec
>2 cores - 13.382 sec
>that's almost 3x speedup for double the number of cores.


Don't forget that i am using an Intel Core 2 Quad , so,
the CPU-Z gave me: *2* x 4096 KBytes of Level2 cache.

So, on four cores i amsorting four chunks of the big array
with just *2* Level 2 caches :)


Amine.






0 Kudos
aminer10
Novice
1,086 Views

Tudor wrote:
>1 core - 35.188 sec
>2 cores - 13.382 sec
>that's almost 3x speedup for double the number of cores.


Forget about he complexity , cause with a big chunks there
is almost no difference in the complexity from one to two cores...

So , i think the single threaded program is less cache friendly
than the two threads... and since we have 2 xLevel 2 caches , this
explains the 3x from 1 core to two cores...

I don't have any other explaination...


Amine.
0 Kudos
Reply