Attached is a parallel merge sort that is substantially faster than TBB's parallel_sort - it uses parallel_reduce. The speed up comes from using Intel's IPP to do the sort. ParallelMerge (provided with TBB) merges the sorted data from each thread using parallel_reduce.
The sort always works with range size = height / (# of processors), and this is probably the most efficient setting for the tile size. However, when testing smaller range sizes, the stack is corrupted in ParallelMerge. Parallel Studio reports that there is a race condition at the location of the crash. (Parallel Studio has been patched with Update 2, the latest.)
To reproduce the problem, set TileSize = 1 in the attached code. It fails on my 8-way Core i7, where TileSize = Height / 8 works fine. The problem does not occur if TBB is initialized with only 1 thread (single-threaded mode). It can also be circumvented by making ParallelMerge's Is_Divisible method always return false, but this leaves a lot of the speedup on the table.
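For anyone who does not have the attachment handy, here is a rough sketch of how the tile size controls the amount of splitting and merging. It is not the attached code: it is serial, uses only the standard library, and does not reproduce the race. `std::sort` stands in for the IPP sort and `std::inplace_merge` stands in for ParallelMerge. The names `tiled_merge_sort` and `should_split` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Iter = std::vector<int>::iterator;

// Hypothetical helper: keep splitting while the span is larger than one tile.
inline bool should_split(std::size_t length, std::size_t tile_size) {
    return length > tile_size && length > 1;
}

// Sorts [first, last) by recursive halving and returns the number of leaf
// tiles that were sorted directly. Serial sketch: in a parallel version the
// two halves would be handled by separate tasks.
inline std::size_t tiled_merge_sort(Iter first, Iter last, std::size_t tile_size) {
    assert(last - first >= 0);
    const std::size_t length = static_cast<std::size_t>(last - first);
    if (!should_split(length, tile_size)) {
        std::sort(first, last);  // leaf tile (stand-in for the IPP sort)
        return 1;
    }
    Iter mid = first + static_cast<std::ptrdiff_t>(length / 2);
    std::size_t leaves = tiled_merge_sort(first, mid, tile_size);
    leaves += tiled_merge_sort(mid, last, tile_size);
    std::inplace_merge(first, mid, last);  // stand-in for ParallelMerge
    return leaves;
}
```

With 8 elements, a tile size of 1 gives 8 leaf tiles and therefore the most merge steps, while a tile size of 8 gives a single leaf and no merging at all. That would be consistent with the small tile sizes exercising the merge path much harder than TileSize = Height / 8 does.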
If anyone could help resolve this problem, it will probably benefit TBB or ParallelMerge. I don't believe the problem is in my code, but I am willing to test any suggestions.
Alexey - thanks for testing the code. The compiler error that you got is from the Intel compiler. The Microsoft compiler does not complain about the grainsize initializer.
I tested the Intel compiler on my machine and it still fails but it does fail differently. I got the error message below one time but most of the time it is failing in "start_for" with "my_range" all zeros. I checked with Inspector again and it detects a race condition on the same statement that is failing for the ICC compiler. Interesting and I am glad that Inspector is backing me up on this!
So switching compilers has changed the problem slightly, but it is still failing. Since it is working on your machine, the best approach may be to carefully compare the exact versions we are both using and make sure there are no differences. Here is what Visual Studio (Help/About) says I am using for Parallel Studio Composer:
Intel Parallel Composer (Package ID: composer.061), Copyright 2002-2009 Intel Corporation
Intel Parallel Inspector Update 2, (build 75522), Copyright 2008-2009 Intel Corporation
I am using VS .NET Framework 3.5 with SP1.
I have neither TBB nor IPP in my path (trying to keep things simple), and I have copied the TBB ia32 bin into Composer's IPP ia32 bin directory. The simplesort that I distributed points to this directory as the working directory, so it should find both IPP and TBB.
This is one of the trickier aspects of using all this software (TBB, IPP, ICC, VS libs): finding any incompatibilities between versions. I would appreciate hearing that you are using the identical TBB/IPP and VS release. (The code makes extensive use of MS libraries for the standard template merge.)
In addition, it might be useful for you to recompile with MSVC++ but I am expecting that will work for you as well. The problem should be in some version difference.
Windows has triggered a breakpoint in SimpleSort.exe.
This may be due to a corruption of the heap, which indicates a bug in SimpleSort.exe or any of the DLLs it has loaded.
This may also be due to the user pressing F12 while SimpleSort.exe has focus.
The output window may have more diagnostic information.
Alexey - I upgraded to TBB 2.2 as you suggested in another post and it fixes the problem with ParallelMerge. Thanks for the suggestion. The malloc's did not make a difference. This was a problem with TBB and it is now fixed.