topic Problems with parallel_sort in Intel® Moderncode for Parallel Architectures

Problems with parallel_sort

usamytch — Sat, 06 Nov 2010 20:03:13 GMT

Good day, colleagues!

I'm new to concurrent programming and have run into a problem with parallel_sort. I'm writing a small program that has to sort large binary files using a limited amount of memory.

As a first step, I read the file to be sorted, split it into chunks (for example, 10 MB each), and sort each chunk. The problem is that when I apply parallel_sort to a chunk, it runs more than 3 times slower than std::sort. Could you advise me on what I'm doing wrong? Thank you in advance.

Code is attached.
My machine is a Core i7 860; the compiler is Visual Studio 2010.

usamytch — Sun, 07 Nov 2010 08:17:47 GMT

Colleagues,

maybe my code was too tangled. I've written a new, simpler program that just creates a vector and a concurrent_vector (1M integers each) and sorts them with std::sort and tbb::parallel_sort respectively. The running times are 1500 and 8000 clock() ticks respectively, so std::sort is about 5 times faster.

What is the problem in my code?

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>
#include "tbb/parallel_sort.h"
#include "tbb/concurrent_vector.h"

using std::vector;
using tbb::concurrent_vector;
using tbb::parallel_sort;

const int SIZE = 1000000;

// Fill *target with `size` pseudo-random integers.
void Generate_Vector (int size, vector<int> * target) {
    target->resize(size);
    for (int index = 0; index < size; ++index) {
        target->at(index) = rand();
    }
}

int main () {
    srand (300);
    vector<int> serial;
    Generate_Vector(SIZE, &serial);
    // Copy the same data into a concurrent_vector for the parallel run.
    concurrent_vector<int> parallel (serial.begin(), serial.end());

    clock_t start, finish;

    // Time the serial sort.
    start = clock();
    std::sort(serial.begin(), serial.end());
    finish = clock();

    std::cout << "std::sort time is " << finish - start << std::endl;

    // Time the parallel sort.
    start = clock();
    tbb::parallel_sort (parallel.begin(), parallel.end());
    finish = clock();

    std::cout << "parallel sort time is " << finish - start << std::endl;

    return 0;
}

Dmitry_Vyukov — Sun, 07 Nov 2010 11:13:47 GMT

What is the reason for using tbb::concurrent_vector?

usamytch — Sun, 07 Nov 2010 11:28:29 GMT

At first I tried std::vector, but it ran even slower, and CPU load was only 20-40%, whereas with concurrent_vector it was 100%.

Important update: all the results above came from the Debug configuration. When I switched to Release and used std::vector, everything was fine: the times were 78 for std::sort and 26 for tbb::parallel_sort.
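When comparing a serial and a parallel sort, it can help to measure elapsed wall-clock time explicitly, because clock() is defined as processor time (on POSIX systems it is accumulated across threads, while the MSVC runtime reports elapsed time). A minimal sketch follows, assuming a C++11 compiler (newer than the Visual Studio 2010 used in this thread); the helper names `elapsed_ms` and `make_input` are made up for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Work>
double elapsed_ms(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto finish = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(finish - start).count();
}

// Hypothetical helper: builds a vector of pseudo-random integers from a fixed
// seed, mirroring the srand/rand setup of the test program in the thread.
std::vector<int> make_input(std::size_t size, unsigned seed) {
    std::srand(seed);
    std::vector<int> values(size);
    for (auto& value : values) {
        value = std::rand();
    }
    return values;
}
```

A call such as `elapsed_ms([&] { std::sort(v.begin(), v.end()); })` times the serial sort; wrapping `tbb::parallel_sort(v.begin(), v.end())` in the same way on a fresh copy of the input would give the parallel figure, ideally from a Release build.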

Dmitry_Vyukov — Sun, 07 Nov 2010 12:06:44 GMT

Debug versions of the STL have a LOT of additional non-scalable checks. For example, an STL container can keep a mutex-protected sub-container of all the iterators into it; since it's mutex-protected, it's non-scalable.
If you are using MSVC, try defining:
#define _SECURE_SCL 0
#define _HAS_ITERATOR_DEBUGGING 0
#define _ITERATOR_DEBUG_LEVEL 0
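
A sketch of where such definitions would typically go: before the first standard header, and consistently in every translation unit (or project-wide in the compiler's preprocessor settings), since MSVC expects the iterator debug level to match across everything that is linked together. In Visual Studio 2010 and later, _ITERATOR_DEBUG_LEVEL is the macro that combines the two older ones. The function `iterator_debug_level` is a hypothetical helper added only to make the effective setting visible; on other compilers the macros have no effect.

```cpp
// These must come before any standard library header is included.
#define _SECURE_SCL 0
#define _HAS_ITERATOR_DEBUGGING 0
#define _ITERATOR_DEBUG_LEVEL 0

#include <algorithm>
#include <vector>

// Hypothetical helper: reports the iterator debug level this translation unit
// was compiled with, or -1 if the macro is not defined at all.
int iterator_debug_level() {
#ifdef _ITERATOR_DEBUG_LEVEL
    return _ITERATOR_DEBUG_LEVEL;
#else
    return -1;
#endif
}

// Sorts a copy of the input; with the level set to 0, MSVC Debug builds skip
// the checked-iterator bookkeeping for this code.
std::vector<int> sorted_copy(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return values;
}
```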