ROBERT_D_Intel1
Employee
65 Views

Race Condition in ParallelMerge?

Attached is a parallel merge sort that is substantially faster than TBB's parallel_sort. It uses parallel_reduce, and the speedup comes from using Intel's IPP to do the sort. ParallelMerge (provided with TBB) merges the sorted data from each thread using parallel_reduce.
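
For readers without the attachment, the sort-then-merge structure described above can be sketched roughly as follows. This is a serial stand-in that uses only the C++ standard library: std::sort takes the place of the IPP sort and std::inplace_merge takes the place of ParallelMerge under parallel_reduce. The function name tiled_merge_sort and its parameters are made up for illustration and are not taken from the attached code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sort each tile of `tile_size` elements on its own, then merge neighbouring
// sorted runs pairwise until a single sorted run remains.
// Serial stand-in (assumption): the attached code sorts tiles with IPP and
// merges them with ParallelMerge inside tbb::parallel_reduce.
inline void tiled_merge_sort(std::vector<float>& data, std::size_t tile_size) {
    assert(tile_size > 0);
    const std::size_t n = data.size();
    const auto first = data.begin();
    const auto at = [first](std::size_t i) {
        return first + static_cast<std::ptrdiff_t>(i);
    };

    // Phase 1: sort every tile independently (the per-thread step).
    for (std::size_t begin = 0; begin < n; begin += tile_size) {
        const std::size_t end = std::min(begin + tile_size, n);
        std::sort(at(begin), at(end));
    }

    // Phase 2: merge adjacent sorted runs, doubling the run width each pass.
    for (std::size_t width = tile_size; width < n; width *= 2) {
        for (std::size_t begin = 0; begin + width < n; begin += 2 * width) {
            const std::size_t mid = begin + width;
            const std::size_t end = std::min(begin + 2 * width, n);
            std::inplace_merge(at(begin), at(mid), at(end));
        }
    }
}
```

A tile size of 1 makes phase 1 trivial and pushes all the work into the merge phase, which is the configuration that exercises the merge code hardest.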

The sort always works with range size = height / (number of processors), which is probably the most efficient setting for the tile size. However, when testing smaller range sizes, the stack is corrupted in ParallelMerge. Parallel Studio reports a race condition at the location of the crash. (Parallel Studio has been patched with Update 2, the latest.)

To reproduce the problem, set TileSize = 1 in the attached code. It fails on my 8-way Core i7, where TileSize = Height / 8 works fine. The problem does not occur if TBB is initialized with only 1 thread (single-threaded mode). It can also be circumvented by setting ParallelMerge's Is_Divisible method to always return false (but this leaves a lot of the speedup on the table).
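
To make the workaround concrete, here is a minimal sketch of a range type in the style of the TBB Range concept (empty(), is_divisible(), and a splitting constructor). It is not the ParallelMergeRange shipped with TBB: MergeRangeSketch, split_tag (a local stand-in for tbb::split), force_serial, and count_leaves are all hypothetical names, and the recursion only mimics serially how a scheduler keeps splitting a divisible range.

```cpp
#include <cassert>
#include <cstddef>

// Local stand-in for tbb::split so the sketch compiles without TBB.
struct split_tag {};

// Hypothetical range type; NOT the ParallelMergeRange from the TBB examples.
struct MergeRangeSketch {
    std::size_t begin;
    std::size_t end;
    std::size_t grainsize;
    bool force_serial;  // true reproduces the "always return false" workaround

    MergeRangeSketch(std::size_t b, std::size_t e, std::size_t g, bool serial)
        : begin(b), end(e), grainsize(g), force_serial(serial) {
        assert(g > 0);  // a zero grainsize would never stop splitting
    }

    // Splitting constructor: takes the upper half, leaves the lower half in r.
    MergeRangeSketch(MergeRangeSketch& r, split_tag)
        : begin(r.begin + (r.end - r.begin) / 2), end(r.end),
          grainsize(r.grainsize), force_serial(r.force_serial) {
        r.end = begin;
    }

    bool empty() const { return begin == end; }

    bool is_divisible() const {
        return !force_serial && (end - begin) > grainsize;
    }
};

// Serially mimics repeated splitting and returns the number of leaf ranges
// that would be handed out as separate tasks.
inline std::size_t count_leaves(MergeRangeSketch r) {
    if (!r.is_divisible()) return 1;
    MergeRangeSketch upper(r, split_tag{});
    return count_leaves(r) + count_leaves(upper);
}
```

With force_serial set, the whole range stays in one piece, so no two tasks ever touch the same merge concurrently. That matches the observation that the failure disappears, at the cost of the parallel speedup.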

If anyone could help resolve this problem, it will probably benefit TBB or ParallelMerge. I don't believe the problem is in my code, but I am willing to test any suggestions.

5 Replies
Alexey_K_Intel3
Employee

Bob, I tried to reproduce the problem on my machine (8 cores, VS 2005, Intel Parallel Studio Update 2) but it did not show up.
The example worked just fine with TileSize set to 1 in main(). The only significant change I made was commenting out lines 49-50, where ParallelMergeRange::grainsize was initialized. The reason was a compile-time error, due to grainsize already being declared (and initialized) as a static const data member in the class.
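
For reference, the standard C++ pattern for such a member looks roughly like the sketch below: the in-class declaration carries the initializer, and the out-of-class definition, if one is needed, must not repeat it. The class name ParallelMergeRangeSketch and the value 1024 are assumptions for illustration; the attached code is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

struct ParallelMergeRangeSketch {
    // Declaration with an in-class initializer (allowed for const integral
    // types). The value 1024 is an assumption, not the original setting.
    static const std::size_t grainsize = 1024;
};

// Out-of-class definition: needed when grainsize is odr-used (for example,
// bound to a reference). It must NOT repeat the initializer.
const std::size_t ParallelMergeRangeSketch::grainsize;

// Writing
//   const std::size_t ParallelMergeRangeSketch::grainsize = 1024;
// here instead would initialize the member a second time, which a
// conforming compiler rejects.

// Binding a reference odr-uses the member, so the definition above matters.
inline const std::size_t& grainsize_ref() {
    return ParallelMergeRangeSketch::grainsize;
}
```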
And Intel Parallel Inspector did not report data races in ParallelMerge, even at the highest level of analysis.
ROBERT_D_Intel1
Employee

Alexey - thanks for testing the code. The compiler error that you got is from the Intel compiler. The Microsoft compiler does not complain about the grainsize initializer.

I tested the Intel compiler on my machine and it still fails but it does fail differently. I got the error message below one time but most of the time it is failing in "start_for" with "my_range" all zeros. I checked with Inspector again and it detects a race condition on the same statement that is failing for the ICC compiler. Interesting and I am glad that Inspector is backing me up on this!

So switching compilers has changed the problem slightly, but it is still failing. Since it is working on your machine, the best approach may be to carefully compare the exact versions we are both using and make sure there are no differences. Here is what Visual Studio (Help/About) says I am using for Parallel Studio Composer:

Intel Parallel Composer (Package ID: composer.061), Copyright 2002-2009 Intel Corporation
Intel Parallel Inspector Update 2, (build 75522), Copyright 2008-2009 Intel Corporation

I am using VS .NET Framework 3.5 with SP1.

I have neither TBB nor IPP in my path (trying to keep things simple), and I have copied the TBB ia32 bin into Composer's IPP ia32 bin directory. The SimpleSort project that I distributed uses this directory as its working directory, so it should find both IPP and TBB.

This is one of the trickier aspects of using all this software (TBB, IPP, ICC, VS libs): finding any incompatibilities between versions. I would appreciate hearing whether you are using the identical TBB/IPP and VS releases. (The code makes extensive use of MS libraries for the standard template merge.)

In addition, it might be useful for you to recompile with MSVC++ but I am expecting that will work for you as well. The problem should be in some version difference.

---------------------------------------------------------------------------------------------------
Windows has triggered a breakpoint in SimpleSort.exe.

This may be due to a corruption of the heap, which indicates a bug in SimpleSort.exe or any of the DLLs it has loaded.

This may also be due to the user pressing F12 while SimpleSort.exe has focus.

The output window may have more diagnostic information.

Alexey_K_Intel3
Employee

I tried the test a few more times in different environments (VS 2005, VS 2008, Intel Compiler 11.0, Intel Parallel Composer update 3) but did not see the crash you described. I did not try it with the non-updated Composer (package 061), though.

I was able to get some Intel Parallel Inspector data race diagnostics for the code. However, I believe those are false positives. I eliminated the diagnostics by the following changes:

- Removing tbbmalloc DLLs from the working directory;
- Replacing calls to ippsMalloc_32f(SIZE) with _aligned_malloc(SIZE*sizeof(Ipp32f), 64), on the assumption that cache-line-size alignment is enough for IPP to work correctly.

Effectively, the changes just replace the TBB and IPP memory allocation routines with "standard" MSVCRT heap allocation. The diagnostics disappear because malloc & Co use synchronization calls that Inspector is able to intercept and recognize as such, while the TBB and (I assume) IPP allocators use custom synchronization that Inspector is unaware of. You might try the same experiment and see if the diagnostics are gone. If the crash you saw goes away as well, then it must be a bug in one of our allocators.
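
The allocation swap described above could look something like this sketch. The helper names alloc_floats_64 and free_floats_64 are hypothetical, and float stands in for Ipp32f so the block compiles without IPP. On MSVC it calls _aligned_malloc / _aligned_free as in the post; the std::aligned_alloc branch (C++17) is an added assumption so the sketch also builds with other compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#if defined(_MSC_VER)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Hypothetical helper: allocate `count` floats on a 64-byte boundary,
// standing in for a call such as ippsMalloc_32f(count).
inline float* alloc_floats_64(std::size_t count) {
    assert(count > 0);
    const std::size_t alignment = 64;  // assumed cache line size
    std::size_t bytes = count * sizeof(float);
#if defined(_MSC_VER)
    return static_cast<float*>(_aligned_malloc(bytes, alignment));
#else
    // std::aligned_alloc expects the size to be a multiple of the alignment.
    bytes = (bytes + alignment - 1) / alignment * alignment;
    return static_cast<float*>(std::aligned_alloc(alignment, bytes));
#endif
}

// Matching release routine; memory must go back to the allocator it came from.
inline void free_floats_64(float* p) {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    std::free(p);
#endif
}
```

When trying this experiment, every matching ippsFree call needs to be swapped as well, since a buffer from _aligned_malloc must be released with _aligned_free.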

P.S. I know we should make the TBB allocator "talk" to Inspector so that it does not induce false positives. It's in our plans.
ROBERT_D_Intel1
Employee

Alexey - thanks for your response. I will change the alloc's as you suggested and try again. I will post with the results.

Bob Davies

ROBERT_D_Intel1
Employee

Alexey - I upgraded to TBB 2.2 as you suggested in another post and it fixed the problem with ParallelMerge. Thanks for the suggestion. The malloc changes did not make a difference. This was a problem with TBB and it is now fixed.

Bob Davies
