Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

parallel_reduce - associative problem

I'm reading the O'Reilly book on TBB. In Figure 3-4 (p. 42), there is a diagram that shows a possible execution path for parallel_reduce. From the figure and the surrounding paragraphs, it's my understanding that all of the merges *should* take place strictly from left to right (associatively) without any gaps in the ranges that are being joined. However, parallel_reduce seems to be joining non-adjacent ranges ( i.e.- [0,100) joined with [900,1000) ). Here is some sample code that demonstrates my problem:

#include "tbb/task_scheduler_init.h"
#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"
using namespace std;
using namespace tbb;

void ParallelReduceTest(int a[], size_t n);

class Subtraction
int* const my_a;
int diff; // running difference
int high, low; // half-open range boundaries
void operator() (const blocked_range& r)
int *a = my_a;
for (size_t i=r.begin(); i != r.end(); ++i)
diff -= a;

// Set the range boundaries
low = r.begin();
high = r.end();
Subtraction (int a[]) : my_a(a), diff(0) {}
Subtraction (Subtraction& x, split) : my_a(x.my_a), diff(0) {}
void join (const Subtraction& y)
if (high != y.low)
cerr << "Joining non-adjacent ranges: ";
cerr << "[" << low << ", " << high << ") + [" << y.low << ", " << y.high << ")" << endl;
high = y.high;
diff += y.diff;

int main()
task_scheduler_init init;

size_t size = 5000;
int* a = new int[size];
for (size_t i=0; i a = i;

ParallelReduceTest(a, size);

return 0;

void ParallelReduceTest(int a[], size_t n)
Subtraction S(a);
parallel_reduce (blocked_range(0,n,100), S);
cout << "Difference is " << S.diff << endl;

Here is a typical output:

Joining non-adjacent ranges: [859, 937) + [1171, 1250)
Joining non-adjacent ranges: [859, 1250) + [2421, 2500)
Joining non-adjacent ranges: [859, 2500) + [4921, 5000)
Difference is -12497500

For this particular algorithm, you'll get the same final result. Ho wever, I'm working on a more complex algorithm that requires that data be joined associatively.

Am I misunderstanding how parallel_reduce works or is this a bug?

I'm running Ubuntu 8.04 (Alpha 6) and using TBB 2.0r014-3.
0 Kudos
2 Replies
Your check for adjacency is incorrect. You did not take into account that parallel_reduce reuses the same body object if the next range it works on is adjacent to the previous one. In operator(), you overwrite low each time, instead of comparing it with the current value of high and "merging" into the overall processed range. Obviously you have just lost the information about ranges processed earlier.
0 Kudos
You're right! Thanks a lot for your help.
0 Kudos