Attached is a parallel merge sort that is substantially faster than TBB's parallel_sort. The speedup comes from using Intel's IPP to sort the data within each thread. ParallelMerge (provided with TBB) then merges the sorted data from each thread using parallel_reduce.
The sort always works with range size = height / (# of processors), and this is probably the most efficient setting for the tile size. However, when testing smaller range sizes, the stack is corrupted in ParallelMerge. Parallel Studio reports a race condition at the location of the crash. (Parallel Studio has been patched with Update 2, the latest.)
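For readers without the attachment, the tile-then-merge structure described above can be sketched roughly as follows. This is a serial illustration only, not the attached code: `tiled_merge_sort` is a hypothetical name, `std::sort` stands in for the IPP sort, and `std::inplace_merge` stands in for ParallelMerge. In the real program each tile sort and each merge would run as a parallel task.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sort `data` by sorting fixed-size tiles and then
// merging neighbouring sorted runs. `tile_size` plays the role of TileSize
// in the post; std::sort is an assumed stand-in for the IPP sort.
inline void tiled_merge_sort(std::vector<float>& data, std::size_t tile_size) {
    assert(tile_size > 0);
    const std::size_t n = data.size();
    if (n < 2) return;
    float* p = data.data();

    // Phase 1: sort each tile independently (one task per tile when parallel).
    for (std::size_t begin = 0; begin < n; begin += tile_size) {
        const std::size_t end = std::min(begin + tile_size, n);
        std::sort(p + begin, p + end);
    }

    // Phase 2: merge adjacent sorted runs, doubling the run width each pass.
    for (std::size_t width = tile_size; width < n; width *= 2) {
        for (std::size_t begin = 0; begin + width < n; begin += 2 * width) {
            const std::size_t mid = begin + width;
            const std::size_t end = std::min(begin + 2 * width, n);
            std::inplace_merge(p + begin, p + mid, p + end);
        }
    }
}
```

Calling it with a tile size of 1 produces the largest number of merges, while a tile size of size / 8 corresponds to one tile per hardware thread on the 8-way machine mentioned above.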
To reproduce the problem, set TileSize = 1 in the attached code. It fails on my 8-way Core i7, where TileSize = Height / 8 works fine. The problem does not occur if TBB is initialized with only 1 thread (single-threaded mode). It can also be circumvented by making ParallelMerge's is_divisible method always return false, but this leaves a lot of the speedup on the table.
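To illustrate why that workaround costs performance, here is a minimal sketch of a divisible merge. It is not the ParallelMerge source: `MergeRange`, `merge_recursive`, and the `allow_split` flag are hypothetical names, and the split strategy (halve the longer input, binary-search the other) is an assumption. The recursion is serial here; in TBB the two halves would be handed to different threads, and a range whose `is_divisible()` always returns false is processed as one serial merge.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a TBB-style merge range.
struct MergeRange {
    const float* a_begin;
    const float* a_end;
    const float* b_begin;
    const float* b_end;
    float* out;
    std::size_t grain;
    bool allow_split;  // false mimics the "always return false" workaround

    std::size_t size() const {
        return static_cast<std::size_t>((a_end - a_begin) + (b_end - b_begin));
    }
    // Require more than 2 elements so that both halves are non-empty.
    bool is_divisible() const {
        return allow_split && size() > grain && size() > 2;
    }
};

inline void merge_recursive(MergeRange r) {
    assert(r.out != nullptr);
    if (!r.is_divisible()) {
        // Leaf: plain serial merge of the two sorted inputs.
        std::merge(r.a_begin, r.a_end, r.b_begin, r.b_end, r.out);
        return;
    }
    // Make A the longer input so that halving it always makes progress.
    if (r.a_end - r.a_begin < r.b_end - r.b_begin) {
        std::swap(r.a_begin, r.b_begin);
        std::swap(r.a_end, r.b_end);
    }
    const float* a_mid = r.a_begin + (r.a_end - r.a_begin) / 2;
    // Everything in B below *a_mid belongs to the left half of the output.
    const float* b_mid = std::lower_bound(r.b_begin, r.b_end, *a_mid);
    float* out_mid = r.out + (a_mid - r.a_begin) + (b_mid - r.b_begin);

    MergeRange left{r.a_begin, a_mid, r.b_begin, b_mid, r.out, r.grain, r.allow_split};
    MergeRange right{a_mid, r.a_end, b_mid, r.b_end, out_mid, r.grain, r.allow_split};
    merge_recursive(left);   // independent of `right`: disjoint output slices
    merge_recursive(right);
}
```

Both settings of `allow_split` produce the same merged output; only the amount of available parallel work differs.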
If anyone could help resolve this problem, it will probably benefit TBB or ParallelMerge. I don't believe the problem is in my code, but I am willing to test any suggestions.
The example worked just fine with TileSize set to 1 in main(). The only significant change I made was commenting out lines 49-50, where ParallelMergeRange
And Intel Parallel Inspector did not report data races in ParallelMerge, even at the highest level of analysis.
Alexey - thanks for testing the code. The compiler error that you got is from the Intel compiler. The Microsoft compiler does not complain about the grainsize initializer.
I tested the Intel compiler on my machine and it still fails but it does fail differently. I got the error message below one time but most of the time it is failing in "start_for" with "my_range" all zeros. I checked with Inspector again and it detects a race condition on the same statement that is failing for the ICC compiler. Interesting and I am glad that Inspector is backing me up on this!
So switching compilers has changed the problem slightly, but it is still failing. Since it works on your machine, the best approach may be to carefully compare the exact versions we are both using and make sure there are no differences. Here is what Visual Studio (Help/About) says I am using for Parallel Studio Composer:
Intel Parallel Composer (Package ID: composer.061), Copyright 2002-2009 Intel Corporation
Intel Parallel Inspector Update 2, (build 75522), Copyright 2008-2009 Intel Corporation
I am using VS .NET Framework 3.5 with SP1.
I have neither TBB nor IPP in my path (trying to keep things simple), and I have copied the TBB ia32 bin into Composer's IPP ia32 bin directory. The SimpleSort project that I distributed uses this directory as its working directory, so it should find both IPP and TBB.
This is one of the trickier aspects of using all this software (TBB, IPP, ICC, VS libs): finding incompatibilities between versions. I would appreciate hearing whether you are using the identical TBB/IPP and VS releases. (The code makes extensive use of the MS libraries for the standard template merge.)
In addition, it might be useful for you to recompile with MSVC++ but I am expecting that will work for you as well. The problem should be in some version difference.
---------------------------------------------------------------------------------------------------
Windows has triggered a breakpoint in SimpleSort.exe.
This may be due to a corruption of the heap, which indicates a bug in SimpleSort.exe or any of the DLLs it has loaded.
This may also be due to the user pressing F12 while SimpleSort.exe has focus.
The output window may have more diagnostic information.
I was able to get some Intel Parallel Inspector data race diagnostics for the code. However, I believe those are false positives. I eliminated the diagnostics by the following changes:
- Removing the tbbmalloc DLLs from the working directory;
- Replacing calls to ippsMalloc_32f(SIZE) with _aligned_malloc(SIZE*sizeof(Ipp32f), 64), on the assumption that cache-line-size alignment is enough for IPP to work correctly.
Effectively, the changes just replace the TBB and IPP memory allocation routines with "standard" MSVCRT heap allocation. The diagnostics disappear because malloc and related functions use synchronization calls that Inspector is able to intercept and recognize as such, while the TBB and (I assume) IPP allocators use custom synchronization that Inspector is unaware of. You might try the same experiment and see whether the diagnostics are gone. If the crash you saw goes away as well, then it must be a bug in one of our allocators.
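A minimal sketch of the allocation swap described above, for anyone repeating the experiment. The helper names `alloc_floats_aligned` and `free_floats_aligned` are hypothetical, plain `float` is used in place of Ipp32f so the snippet builds without IPP, and the non-Windows branch using `posix_memalign` is an addition for portability that is not part of the original suggestion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>  // _aligned_malloc, _aligned_free
#endif

// Hypothetical replacement for ippsMalloc_32f(count): returns `count` floats
// aligned to `alignment` bytes (64 is the cache-line assumption from the post).
inline float* alloc_floats_aligned(std::size_t count, std::size_t alignment = 64) {
    // Alignment must be a power of two and at least pointer-sized.
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);
    const std::size_t bytes = count * sizeof(float);
#if defined(_WIN32)
    return static_cast<float*>(_aligned_malloc(bytes, alignment));
#else
    void* p = nullptr;
    if (posix_memalign(&p, alignment, bytes) != 0) return nullptr;
    return static_cast<float*>(p);
#endif
}

// Must be paired with alloc_floats_aligned; do not pass these pointers to ippsFree.
inline void free_floats_aligned(float* p) {
#if defined(_WIN32)
    _aligned_free(p);
#else
    free(p);
#endif
}
```

The point to watch when making this swap is that every allocation changed this way also needs its matching release call changed, since memory from `_aligned_malloc` has to go back through `_aligned_free`.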
P.S. I know we should make the TBB allocator "talk" to Inspector so that it does not induce false positives. That is planned.
Alexey - thanks for your response. I will change the alloc's as you suggested and try again. I will post with the results.
Bob Davies
Alexey - I upgraded to TBB 2.2 as you suggested in another post and it fixes the problem with ParallelMerge. Thanks for the suggestion. The malloc's did not make a difference. This was a problem with TBB and it is now fixed.
Bob Davies