parallel_sort or just std::sort?

I have used parallel_sort to parallelize some code, but the serial version using std::sort is faster. What is the reason?
Is it because parallel_sort uses lock and unlock, thereby increasing the time? What array size does parallel_sort need to give a considerable speedup?
Thank you

Accepted Solutions
Reply to ktrfrk:

I'd guess that either your array is too small or you are using an old version of TBB, as some time ago we improved the performance of parallel_sort.
parallel_sort does not explicitly use locks, but there is still some overhead in enabling parallelism. The grain size for parallel execution is 500, which means that chunks smaller than that are processed serially (by calling std::sort). Thus, for parallel_sort to give you any benefit, it should operate on an array of at least 500 elements, and to load all P available cores an array bigger than P*500/2 is required (i.e. >1000 elements for 4 cores, etc.).
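To make the grain-size argument concrete, here is a minimal sketch in standard C++ (no TBB) of a recursive parallel quicksort that, like the description above, hands any chunk of at most 500 elements to std::sort. It is an illustration under stated assumptions, not TBB's actual implementation: the names `parallel_quick_sort_sketch`, `kGrainSize`, `spawn_depth` and `min_elements_for_cores` are hypothetical, and std::async stands in for TBB's task scheduler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <iterator>
#include <vector>

// Hypothetical constant mirroring the grain size quoted above: ranges of
// at most this many elements are sorted serially with std::sort.
constexpr std::ptrdiff_t kGrainSize = 500;

// Rule of thumb from the answer: more than P * grain / 2 elements are
// needed before all P cores can be given work (1000 for P = 4).
constexpr std::size_t min_elements_for_cores(std::size_t p) {
    return p * static_cast<std::size_t>(kGrainSize) / 2;
}

// Median of three values, returned by copy so it stays valid while the
// range is being partitioned.
template <typename T>
T median_of_three(const T& a, const T& b, const T& c) {
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Illustrative parallel quicksort with a grain-size cutoff.
// spawn_depth (hypothetical) limits how many levels start new threads,
// so the number of concurrent std::async threads stays bounded.
template <typename It>
void parallel_quick_sort_sketch(It first, It last, int spawn_depth = 4) {
    assert(first <= last);  // expects a valid random-access range
    const std::ptrdiff_t n = last - first;
    if (n <= kGrainSize) {
        std::sort(first, last);  // small chunk: no parallel overhead at all
        return;
    }

    using T = typename std::iterator_traits<It>::value_type;
    const T pivot = median_of_three(*first, *(first + n / 2), *(last - 1));

    // Three-way partition: [< pivot][== pivot][> pivot]. The pivot is an
    // element of the range, so both outer parts are strictly smaller than n.
    It lt_end = std::partition(first, last,
                               [&](const T& x) { return x < pivot; });
    It eq_end = std::partition(lt_end, last,
                               [&](const T& x) { return !(pivot < x); });

    if (spawn_depth > 0) {
        // Sort the left part on another thread, the right part on this one.
        auto left = std::async(std::launch::async, [=] {
            parallel_quick_sort_sketch(first, lt_end, spawn_depth - 1);
        });
        parallel_quick_sort_sketch(eq_end, last, spawn_depth - 1);
        left.get();
    } else {
        // Thread budget used up: keep splitting serially down to the grain size.
        parallel_quick_sort_sketch(first, lt_end, 0);
        parallel_quick_sort_sketch(eq_end, last, 0);
    }
}
```

Because std::async starts real OS threads, its overhead is likely larger than TBB's task scheduling, so the crossover point for this sketch will probably sit above the 500/1000-element figures quoted for parallel_sort; timing it against plain std::sort on your own data is the reliable way to find where the speedup begins.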
