I'm reading the O'Reilly book on TBB. Figure 3-4 (p. 42) shows a possible execution path for parallel_reduce. From the figure and the surrounding paragraphs, my understanding is that all of the merges *should* take place strictly from left to right (associatively), without any gaps between the ranges being joined. However, parallel_reduce seems to be joining non-adjacent ranges (e.g., [0,100) joined with [900,1000)). Here is some sample code that demonstrates the problem:
#include <iostream>
#include "tbb/task_scheduler_init.h"
#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"

using namespace std;
using namespace tbb;

void ParallelReduceTest(int a[], size_t n);

class Subtraction
{
private:
    int* const my_a;
public:
    int diff;        // running difference
    int high, low;   // half-open range boundaries

    void operator() (const blocked_range<size_t>& r)
    {
        int* a = my_a;
        for (size_t i = r.begin(); i != r.end(); ++i)
        {
            diff -= a[i];
        }
        // Set the range boundaries
        low = r.begin();
        high = r.end();
    }

    Subtraction (int a[]) : my_a(a), diff(0) {}
    Subtraction (Subtraction& x, split) : my_a(x.my_a), diff(0) {}

    void join (const Subtraction& y)
    {
        if (high != y.low)
        {
            cerr << "Joining non-adjacent ranges: ";
            cerr << "[" << low << ", " << high << ") + [" << y.low << ", " << y.high << ")" << endl;
        }
        high = y.high;
        diff += y.diff;
    }
};

int main()
{
    task_scheduler_init init;
    size_t size = 5000;
    int* a = new int[size];
    for (size_t i = 0; i < size; ++i) a[i] = i;
    ParallelReduceTest(a, size);
    return 0;
}

void ParallelReduceTest(int a[], size_t n)
{
    Subtraction S(a);
    parallel_reduce (blocked_range<size_t>(0, n, 100), S);
    cout << "Difference is " << S.diff << endl;
}
Here is a typical output:
Joining non-adjacent ranges: [859, 937) + [1171, 1250)
Joining non-adjacent ranges: [859, 1250) + [2421, 2500)
Joining non-adjacent ranges: [859, 2500) + [4921, 5000)
Difference is -12497500
For this particular algorithm, you'll get the same final result. However, I'm working on a more complex algorithm that requires that data be joined associatively.
Am I misunderstanding how parallel_reduce works or is this a bug?
I'm running Ubuntu 8.04 (Alpha 6) and using TBB 2.0r014-3.
2 Replies
Your check for adjacency is incorrect. You did not take into account that parallel_reduce reuses the same body object if the next range it works on is adjacent to the previous one. In operator(), you overwrite low each time instead of comparing it with the current value of high and extending the overall processed range, so you lose the information about the ranges the body processed earlier.
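For what it's worth, here is a minimal sketch of how the body could track the overall range it has processed rather than only the last subrange. It is not your exact code: the seen_any flag and the switch of low/high to size_t are my own additions, and it assumes (as described above) that a reused body only ever receives a subrange adjacent to what it has already processed. The rest of the test program (main, ParallelReduceTest) stays the same as in your post.

class Subtraction
{
private:
    int* const my_a;
public:
    int diff;          // running difference over all subranges seen so far
    size_t low, high;  // overall half-open range processed by this body
    bool seen_any;     // has this body processed a subrange yet?

    Subtraction (int a[]) : my_a(a), diff(0), seen_any(false) {}
    Subtraction (Subtraction& x, split) : my_a(x.my_a), diff(0), seen_any(false) {}

    void operator() (const blocked_range<size_t>& r)
    {
        int* a = my_a;
        for (size_t i = r.begin(); i != r.end(); ++i)
            diff -= a[i];

        if (!seen_any)
        {
            // First subrange handled by this body: the overall range starts here.
            low = r.begin();
            seen_any = true;
        }
        else if (high != r.begin())
        {
            // A reused body is expected to get a subrange adjacent to
            // what it has already processed.
            cerr << "Body reused on non-adjacent range: ";
            cerr << "[" << low << ", " << high << ") + ["
                 << r.begin() << ", " << r.end() << ")" << endl;
        }
        // Extend the overall processed range instead of overwriting it.
        high = r.end();
    }

    void join (const Subtraction& y)
    {
        if (high != y.low)
        {
            cerr << "Joining non-adjacent ranges: ";
            cerr << "[" << low << ", " << high << ") + ["
                 << y.low << ", " << y.high << ")" << endl;
        }
        high = y.high;
        diff += y.diff;
    }
};

With this change the warnings should no longer fire for your test, because each body's recorded range now covers everything it has processed, not just the last chunk it was handed.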
You're right! Thanks a lot for your help.
