Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

parallel_sort or just std::sort from <algorithm>?

ktrfrk
Beginner
288 Views
I have used parallel_sort to parallelize some code, but the serial version using sort is faster. What is the reason?
Is it because parallel_sort uses lock and unlock, thereby increasing the run time? At what array size does parallel_sort give a considerable speedup?
Thank you.
1 Solution
Alexey-Kukanov
Employee
288 Views

I'd guess that either your array is too small or you are using an old version of TBB, as we improved the performance of parallel_sort some time ago.
parallel_sort does not explicitly use locks, but enabling parallelism still carries some overhead. The grain size for parallel execution is 500, which means that smaller chunks are processed serially (by calling std::sort). Thus, to give you any benefit, parallel_sort should operate on an array of at least 500 elements, and loading all P available cores requires an array bigger than P*500/2 (i.e. >1000 elements for 4 cores, etc.).
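
To make the grain-size arithmetic above concrete, here is a minimal sketch of an idealized splitting model. It is not TBB's actual implementation: it assumes every range of at least 500 elements is cut into two equal halves, whereas a real quicksort partition is rarely that even. The function names count_serial_chunks and min_elements_for_cores are hypothetical and used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Idealized model (an assumption, not TBB's code): a range is split into two
// halves while it holds at least `grain` elements; anything smaller is
// treated as one chunk that would be sorted serially with std::sort.
std::size_t count_serial_chunks(std::size_t n, std::size_t grain = 500) {
    assert(grain >= 2);       // a grain below 2 would never stop splitting
    if (n < grain) return 1;  // too small to split: one serial chunk
    std::size_t left = n / 2;
    return count_serial_chunks(left, grain) +
           count_serial_chunks(n - left, grain);
}

// Rule of thumb quoted in the post: roughly P*grain/2 elements are needed
// before P cores can all have work.
std::size_t min_elements_for_cores(std::size_t cores, std::size_t grain = 500) {
    return cores * grain / 2;
}
```

Under this model, count_serial_chunks(499) is 1 (nothing to run in parallel), count_serial_chunks(500) is 2, and count_serial_chunks(1000) is 4, which matches min_elements_for_cores(4) returning 1000. Whether those chunks are large enough to outweigh the scheduling overhead is a separate question that only a measurement on your machine can answer.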
