Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

parallel_sort or just std::sort?

ktrfrk
Beginner
I have used parallel_sort to parallelize some code, but the serial version using std::sort is faster. What is the reason?
Is it because parallel_sort uses locks, and locking and unlocking increase the running time? How large does the array have to be before parallel_sort gives a considerable speedup?
Thank you
Alexey_K_Intel3
Employee
I'd guess that either your array is too small or you are using an old version of TBB; some time ago we improved the performance of parallel_sort.
The parallel_sort algorithm does not explicitly use locks, but enabling parallelism still has some overhead. The grain size for parallel execution is 500, which means that chunks smaller than that are processed serially (by calling std::sort). Thus, to give you any benefit, parallel_sort should operate on an array of at least 500 elements, and to load all P available cores, an array larger than P*500/2 elements is required (i.e. more than 1000 elements for 4 cores, etc.).
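To illustrate the grain-size argument, here is a minimal C++ sketch. It is not TBB code and does not reproduce tbb::parallel_sort's implementation. It is a simplified parallel quicksort built on std::async that, like the behaviour described above, hands any range of 500 elements or fewer straight to std::sort. The names `hybrid_parallel_sort`, `kGrainSize`, `min_elements_to_load_cores`, and the `depth` cap on spawned tasks are all hypothetical. The real parallel_sort schedules its work as TBB tasks rather than std::async threads, so its overhead and crossover point will differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <iterator>
#include <vector>

// Grain size quoted in the reply; the names in this sketch are hypothetical.
constexpr std::ptrdiff_t kGrainSize = 500;

// Rule of thumb from the reply: about P*500/2 elements are needed to give
// each of P cores some work (e.g. 1000 elements for 4 cores).
constexpr std::size_t min_elements_to_load_cores(std::size_t cores) {
    return cores * static_cast<std::size_t>(kGrainSize) / 2;
}
static_assert(min_elements_to_load_cores(4) == 1000, "matches the 4-core example");

// Simplified parallel quicksort, NOT tbb::parallel_sort's implementation.
// Ranges of at most kGrainSize elements are sorted serially with std::sort,
// so an input that small never runs in parallel at all. Larger ranges are
// split three ways around a pivot; the left part is sorted in a separate
// task while the current thread sorts the right part. `depth` limits how
// many recursion levels may spawn tasks (at most about 2^depth threads).
template <typename RandomIt>
void hybrid_parallel_sort(RandomIt first, RandomIt last, int depth = 4) {
    const auto n = std::distance(first, last);
    if (n <= kGrainSize) {
        std::sort(first, last);  // below the grain size: plain serial sort
        return;
    }
    // Median-of-three pivot, copied by value so partitioning cannot move it.
    const auto a = *first, b = *(first + n / 2), c = *(last - 1);
    const auto pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));
    // After both partitions: [first,mid1) < pivot, [mid1,mid2) == pivot,
    // [mid2,last) > pivot. The middle part is non-empty because the pivot is
    // one of the elements, so every recursive call gets a smaller range.
    RandomIt mid1 = std::partition(first, last,
                                   [&](const auto& x) { return x < pivot; });
    RandomIt mid2 = std::partition(mid1, last,
                                   [&](const auto& x) { return !(pivot < x); });
    if (depth <= 0) {  // task budget used up: finish this range serially
        hybrid_parallel_sort(first, mid1, 0);
        hybrid_parallel_sort(mid2, last, 0);
        return;
    }
    auto left = std::async(std::launch::async, [first, mid1, depth] {
        hybrid_parallel_sort(first, mid1, depth - 1);
    });
    hybrid_parallel_sort(mid2, last, depth - 1);
    left.get();  // wait for the left part (rethrows any exception it raised)
}
```

For example, calling hybrid_parallel_sort on a 300-element vector never leaves the calling thread, which is why a small array cannot be sorted faster this way than with std::sort. To find the actual crossover on your own machine, time std::sort and tbb::parallel_sort on arrays of increasing size.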
