I'm a newbie in concurrent programming and I've run into a problem with parallel_sort. I'm writing a small program that has to sort big binary files with a limited amount of memory.
As the first step, I read the file to be sorted, split it into chunks (for example, 10 MB each) and sort each chunk. The problem is that when I apply parallel_sort to a chunk, it runs more than 3 times slower than std::sort. Could you advise me on what I'm doing wrong? Thank you in advance.
Code is attached. My machine is a Core i7 860, and the compiler is Visual Studio 2010.
Maybe my code was too tangled. I've written a new, simple test where I just create a vector and a concurrent_vector (both hold 1M integers) and sort them with std::sort and tbb::parallel_sort respectively. The running times are 1500 and 8000 CPU clocks respectively - std::sort is 5 times faster.
At first I tried to use std::vector, but it ran even slower, and the CPU load was only 20-40%, while with concurrent_vector it was 100%.
Important update - all the results above were obtained in the Debug configuration. When I switched to Release and used std::vector, everything became OK - the CPU times were 78 for std::sort and 26 for tbb::parallel_sort.
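For anyone who wants to repeat this kind of comparison, here is a minimal sketch of a timing harness, not the original poster's attached code. It assumes a C++11 compiler with <chrono> (newer than Visual Studio 2010) and measures wall-clock time, because CPU time summed over threads can hide a parallel speedup on some platforms. The names make_input, time_sort_ms, StdSort and TbbSort are made up for this example, and USE_TBB is a hypothetical opt-in macro: define it and link against TBB to enable the tbb::parallel_sort path.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

#ifdef USE_TBB  // hypothetical opt-in macro for this sketch, not part of TBB
#include <tbb/parallel_sort.h>
#endif

// Fill a vector with n pseudo-random integers from a simple LCG,
// so that every sorter can be given an identical copy of the input.
std::vector<int> make_input(std::size_t n, std::uint32_t seed = 12345u) {
    std::vector<int> v(n);
    std::uint32_t x = seed;
    for (std::size_t i = 0; i < n; ++i) {
        x = x * 1664525u + 1013904223u;
        v[i] = static_cast<int>(x >> 1);  // keep the value non-negative
    }
    return v;
}

// Sort `data` in place with `sorter` and return the elapsed wall-clock
// time in milliseconds. The sortedness check runs after the clock stops.
template <typename Sorter>
double time_sort_ms(std::vector<int>& data, Sorter sorter) {
    auto start = std::chrono::steady_clock::now();
    sorter(data.begin(), data.end());
    auto stop = std::chrono::steady_clock::now();
    assert(std::is_sorted(data.begin(), data.end()));
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Sequential baseline.
struct StdSort {
    template <typename It>
    void operator()(It first, It last) const { std::sort(first, last); }
};

#ifdef USE_TBB
// Parallel variant, compiled only when USE_TBB is defined.
struct TbbSort {
    template <typename It>
    void operator()(It first, It last) const { tbb::parallel_sort(first, last); }
};
#endif
```

To use it, build in Release, create one input with make_input(1000000), give each sorter its own copy, and compare time_sort_ms(copy1, StdSort()) with time_sort_ms(copy2, TbbSort()). Running each measurement several times and taking the best result reduces noise.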
Debug versions of the STL have a LOT of additional non-scalable checks. For example, an STL container can keep a mutex-protected sub-container of all the iterators into it; since it's mutex-protected, it's non-scalable. If you are using MSVC, try defining the following before including any standard headers:
#define _SECURE_SCL 0
#define _HAS_ITERATOR_DEBUGGING 0
#define _ITERATOR_DEBUG_LEVEL 0
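A small sketch of where these macros would go, assuming a single-file test program. The _MSC_VER guard makes the block a no-op on other compilers. iterator_debug_level and sort_in_place are hypothetical helpers added only for illustration. Note that _SECURE_SCL and _HAS_ITERATOR_DEBUGGING are the older spellings, and from Visual Studio 2010 on _ITERATOR_DEBUG_LEVEL is the one that matters. Every translation unit and static library linked into the program has to use the same setting.

```cpp
// These definitions must precede every standard library header.
#if defined(_MSC_VER)
#  define _SECURE_SCL 0
#  define _HAS_ITERATOR_DEBUGGING 0
#  define _ITERATOR_DEBUG_LEVEL 0
#endif

#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns the iterator debug level this file was
// compiled with, or -1 if the library does not define the macro.
int iterator_debug_level() {
#ifdef _ITERATOR_DEBUG_LEVEL
    return _ITERATOR_DEBUG_LEVEL;
#else
    return -1;
#endif
}

// Sort a vector in place; with the checks disabled, a Debug build no
// longer pays for iterator bookkeeping inside std::sort.
void sort_in_place(std::vector<int>& v) {
    std::sort(v.begin(), v.end());
    assert(std::is_sorted(v.begin(), v.end()));
}
```

Setting the macros in the project's preprocessor definitions instead of in source files is usually the safer choice, since it applies to all files uniformly.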