missing__zlw
Beginner
40 Views

When is parallel_reduce's join function called?

This is from the tutorial:

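#include <cstddef>
#include <iostream>
#include "tbb/blocked_range.h"
#include "tbb/parallel_reduce.h"

using namespace tbb;

class SumFoo {
    float* my_a;
public:
    float my_sum;

    // Accumulate the elements of subrange r into my_sum.
    void operator()( const blocked_range<size_t>& r ) {
        float *a = my_a;
        float sum = my_sum;
        size_t end = r.end();
        for ( size_t i=r.begin(); i!=end; ++i )
            sum += a[i];
        my_sum = sum;
    }

    // Splitting constructor: the new body starts with an empty sum.
    SumFoo( SumFoo& x, split ) : my_a(x.my_a), my_sum(0.0f) {}

    void join( const SumFoo& y ) {
        std::cout << " join " << &y << std::endl;
        my_sum += y.my_sum/2;   // deliberately altered to see whether join has any effect
    }

    SumFoo( float* a ) : my_a(a), my_sum(0) {}
};

float ParallelSumFoo( float* a, size_t n ) {
    SumFoo sf(a);
    parallel_reduce( blocked_range<size_t>(0,n), sf );
    return sf.my_sum;
}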

As you can see, I put a cout line in the join function. I also tried changing how my_sum is updated there. But when I run this program, the cout line never prints. And whether I change my_sum+=y.my_sum/2; or even comment it out, the result is still the same.

So when and how is the join function called here?

3 Replies
RafSchietekat
Black Belt

You probably have too little work to give the algorithm a chance to exploit parallelism (although I should check the implementation to be sure about the details), so try to increase n (maybe a lot), and perhaps also trace and/or count the splitting constructor to see what it does. The Reference Manual shows a few example scenarios with fewer body splits than range splits.
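
To make the "fewer body splits than range splits" point concrete, here is a minimal sketch that uses only the standard library. It is a toy model, not TBB's scheduler: CountingSum, split_tag, toy_reduce, run_toy and the idle_workers counter are all hypothetical names invented for this illustration, and everything runs sequentially. The range is always halved down to a grain size, but the body is split (and later joined) only while the model pretends an idle worker is available, which is the assumption standing in for a steal.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marker type standing in for tbb::split (hypothetical, local to this sketch).
struct split_tag {};

struct CountingSum {
    const float* data;
    float sum;
    int* splits;  // counters owned by the caller
    int* joins;

    CountingSum(const float* a, int* s, int* j)
        : data(a), sum(0.0f), splits(s), joins(j) {}

    // Splitting constructor: counts how often a second body is created.
    CountingSum(CountingSum& other, split_tag)
        : data(other.data), sum(0.0f), splits(other.splits), joins(other.joins) {
        ++*splits;
    }

    // Accumulate the elements of [begin, end) into this body's sum.
    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i != end; ++i) sum += data[i];
    }

    // Merge the partial sum of a body that was split off earlier.
    void join(const CountingSum& rhs) {
        ++*joins;
        sum += rhs.sum;
    }
};

// Toy reduction: the range is halved until it is no larger than grain, but
// the body is split only while idle_workers > 0 (the stand-in for a steal).
void toy_reduce(CountingSum& body, std::size_t begin, std::size_t end,
                std::size_t grain, int& idle_workers) {
    if (end - begin <= grain) {
        body(begin, end);
        return;
    }
    std::size_t mid = begin + (end - begin) / 2;
    if (idle_workers > 0) {
        --idle_workers;
        CountingSum right(body, split_tag());  // body split
        toy_reduce(body, begin, mid, grain, idle_workers);
        toy_reduce(right, mid, end, grain, idle_workers);
        body.join(right);                      // one join per split
    } else {
        // No idle worker: the same body handles both halves, so no join.
        toy_reduce(body, begin, mid, grain, idle_workers);
        toy_reduce(body, mid, end, grain, idle_workers);
    }
}

struct Counts { float sum; int splits; int joins; };

// Sum n ones and report how many body splits and joins the model performed.
Counts run_toy(std::size_t n, int idle_workers) {
    std::vector<float> a(n, 1.0f);
    int splits = 0, joins = 0;
    CountingSum body(a.data(), &splits, &joins);
    toy_reduce(body, 0, n, 16, idle_workers);
    return Counts{body.sum, splits, joins};
}
```

In this model, run_toy(1024, 0) finishes with zero splits and zero joins even though the range is subdivided many times, which matches a run where the trace in join never prints. run_toy(1024, 3) performs three splits and three joins, and both calls return the same sum. Adding similar counters to the real splitting constructor and join would show which case the actual program is in.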
missing__zlw
Beginner

I don't think this is the reason. The example is from TBB's tutorial. I used a very large n, but it makes no difference.
When I put a cout in the operator() function, I can see different ranges being executed. But it seems that join is never called, which is very strange to me.
RafSchietekat
Black Belt

Do you see a body being split? Do you see the operator() execute the subranges in order (probably no parallelism) or out of order (definitely parallelism)? How many hardware threads does the machine have, and are you allowing them to be used by TBB? What happens if you use task_scheduler_init in main() and vary the number of threads? Parallelism requires body split/join to occur, and you should be able to observe it.
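
One way to answer the in-order question mechanically, sketched with the standard library only. Subrange and executed_in_order are hypothetical names for this illustration, and the log is assumed to have been recorded from operator() in the order the calls were observed (in a real parallel run that recording would need a mutex or similar protection).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One entry per operator() call: the [begin, end) of the subrange it received.
typedef std::pair<std::size_t, std::size_t> Subrange;

// Returns true when every subrange starts where the previous one ended,
// i.e. the log is consistent with a single left-to-right walk of the range.
bool executed_in_order(const std::vector<Subrange>& log) {
    for (std::size_t i = 1; i < log.size(); ++i) {
        if (log[i].first != log[i - 1].second) return false;
    }
    return true;
}
```

A log for which this returns false shows the out-of-order execution mentioned above; a log for which it returns true is only a hint that no parallelism occurred, not proof.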

(Edited 07:15Z.)