Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.
2479 Discussions

How to implement mergesort with parallel_reduce?

sviiya
Beginner
771 Views
Hello,
I am trying to implement mergesort with parallel_reduce. Let us say that the range [ P, R ) is split into two:
[ P, Q ) and [Q, R )

I would like to recursively divide these mergesort these two intervals and then merge the two.

The problem is, the subtask does not seem to let the user access the ranges of the two tasks.

Now my question is, is it possible to implemet these really recursive functions like mergesort and quicksort with parallel_reduce? If so, could someone give me a hint to approach the problem please?

Thanks,
Sviiya
0 Kudos
8 Replies
Bartlomiej
New Contributor I
771 Views
You have a method join() required in the Body to combine the results of two recursive calls. It gets the second Body as the argument.
I haven't used parallel_reduce yet, but - as in the operator () a table must be computed, I think you have to embed the bounds of blocked_range in the Body.
0 Kudos
RafSchietekat
Valued Contributor III
771 Views
Quoting - sviiya
I am trying to implement mergesort with parallel_reduce.
Make sure to let us know how that turns out compared to parallel_sort().
0 Kudos
lts4life
Beginner
771 Views
This is a very interesting question, as mergesort is an example for a pure divide-and-conquer algorithm (divide, conquer and combine). Thus, in order to implement mergesort with TBB, I have looked into parallel_reduce. I came to the conclusion that the parallel_reduce algorithm is not suitable for mergesort as it only calls the 'join' method as often as it calls the splitting constructor. In a subsequent search I have found an article discussing this problem.
0 Kudos
ROBERT_D_Intel1
Employee
771 Views
its4life

Don't believe everything you read! You can do a merge sort with parallel reduce and I have attacheda simple sort that does exactly what you were describing.

However, the attached app exposes a problem with ParallelMerge (included). The app will work with the tile size = height / (# of processors). If you set it smaller (even to 1) it will fail occasionally. There is a race condition and I am working to resolve it but it should not affect the final result - a sort merge.

Note that the application uses Intel's IPP library to do the sort (much faster.) If you have Parallel Studio, the project should just recompile. This version is for 32-bit floats.

Bob Davies
0 Kudos
lts4life
Beginner
771 Views

Hello,

Thanks for your reply and for the provided example. I did not mean to say that it is not possible, that would be a very dubious claim! I just wanted to point out that it is not enough to merge in the join method alone like in this negative example:

[bash]// Incomplete, and DOES NOT work either!
struct merge_sort {
    
    int *array;
    size_t begin;
    size_t end;
   
// ...
void operator()(const blocked_range& range) { sort(array + range.begin(), array + range.end()); } void join(const merge_sort& y) { // Merging here is NOT sufficient! merge(array + begin, array + end, array + y.end); } };[/bash]


Input array [6 1 7 3 2 9 8 4]
Task A sorts [6 1] -> [1 6]
Task B sorts [2 9] -> [2 9]
Task A sorts [7 3] -> [3 7]
Task B sorts [8 4] -> [4 8]
A joins B [1 6 2 9] [3 7 4 8] -> [1 3 6 2 7 4 8 9]
and the resulting array is not sorted

I see that your solution solves this problem with an interesting approach by initiating a parallel merge within the task's body.

Again, thanks a lot.

0 Kudos
RafSchietekat
Valued Contributor III
771 Views
If you actually tried this example, you would probably see that you need inplace_merge() instead of merge() and that there would be two intervening steps that take care of merging [1 6][2 9]->[1 2 6 9] and [3 7][4 8]->[3 4 7 8] or some other sequence that would produce the correct result.
0 Kudos
lts4life
Beginner
771 Views
Thanks for pointing out. Yes, I actually implemented it using inplace_merge. I should have written that in the pseudo-code, too, considering that merge is already defined in the STL library. I was reluctant to post the complete sample code as it compiles but of course fails to do what it seems to be supposed to. The intervening steps to merge the partial results within the same task instance is what's missing and I think this is what Bob has accounted for in his solution above.
0 Kudos
RafSchietekat
Valued Contributor III
771 Views
"The intervening steps to merge the partial results within the same task instance is what's missing and I think this is what Bob has accounted for in his solution above."
Sorry, there was a problem unpacking that on my Linux O.S., and I didn't persevere, so I didn't examine that solution (if the whole world can handle .zip, why not be original and use .zipx instead...). But I now see what you intended to show: part of the merge responsibility falls on the aptly named "accumulation" step and must not be overlooked. Or so I think, because I have no idea what that "race condition" Bob mentioned might be?
0 Kudos
Reply