Intel® oneAPI Threading Building Blocks

When is parallel_reduce's join function called?

This is from the tutorial:

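class SumFoo {
    float* my_a;
public:
    float my_sum;

    // Accumulate the elements of subrange r into this body's partial sum.
    void operator()( const blocked_range<size_t>& r ) {
        float *a = my_a;
        float sum = my_sum;
        size_t end = r.end();
        for ( size_t i=r.begin(); i!=end; ++i )
            sum += a[i];
        my_sum = sum;
    }

    // Splitting constructor: the new body starts with an empty partial sum.
    SumFoo( SumFoo& x, split ) : my_a(x.my_a), my_sum(0.0f) {}

    // Merge the partial sum of a body that was split off earlier.
    void join( const SumFoo& y ) {
        std::cout << " join " << &y << std::endl;
        my_sum += y.my_sum;
    }

    SumFoo( float* a ) :
        my_a(a), my_sum(0)
    {}
};

float ParallelSumFoo( float* a, size_t n ) {
    SumFoo sf(a);
    parallel_reduce( blocked_range<size_t>(0,n), sf );
    return sf.my_sum;
}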

As you can see, I put a cout line in the join function. I also tried changing how my_sum is updated there. But when I run the program, the cout line never prints. No matter what I do with my_sum (for example my_sum+=y.my_sum/2;), or even if I comment that line out, the result is still the same.

So when and how is the join function used here?

3 Replies
Black Belt
You probably have too little work to give the algorithm a chance to exploit parallelism (although I should check the implementation to be sure about the details), so try to increase n (maybe a lot), and perhaps also trace and/or count the splitting constructor to see what it does. The Reference Manual shows a few example scenarios with fewer body splits than range splits.
I don't think this is the reason. The example is from TBB's tutorial. I used very large n, but it doesn't show any difference.
When I put a cout in the operator() function, I can see the different ranges being executed. But it seems that join is never called, which is very strange to me.
Black Belt
Do you see a body being split? Do you see the operator() execute the subranges in order (probably no parallelism) or out of order (definitely parallelism)? How many hardware threads does the machine have, and are you allowing them to be used by TBB? What happens if you use task_scheduler_init in main() and vary the number of threads? Parallelism requires body split/join to occur, and you should be able to observe it.

(Edited 07:15Z.)
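To make the split/join relationship discussed above concrete, here is a minimal sketch that counts both. It does not use TBB at all: toy_reduce, split_tag, SumBody, DemoResult and run_demo are hypothetical names invented for this illustration, and the driver is a simplified stand-in built on std::thread, not TBB's scheduler. It only mirrors the documented contract that join is called once for each body created by the splitting constructor, and that a range can be subdivided without the body ever being split.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Counters shared by all body copies so splits and joins can be tallied.
static std::atomic<int> g_splits{0};
static std::atomic<int> g_joins{0};

struct split_tag {};  // hypothetical stand-in for tbb::split

class SumBody {
    const float* my_a;
public:
    float my_sum;
    explicit SumBody(const float* a) : my_a(a), my_sum(0.0f) {}
    // Splitting constructor: the copy starts from an empty partial sum.
    SumBody(SumBody& x, split_tag) : my_a(x.my_a), my_sum(0.0f) { ++g_splits; }
    // Accumulate one subrange [begin, end) into this body's partial sum.
    void operator()(std::size_t begin, std::size_t end) {
        float sum = my_sum;
        for (std::size_t i = begin; i != end; ++i) sum += my_a[i];
        my_sum = sum;
    }
    // Merge the partial sum of a body that was split off earlier.
    void join(const SumBody& y) { ++g_joins; my_sum += y.my_sum; }
};

// Toy driver (an assumption for illustration, NOT TBB's algorithm):
// halves the range until it is no larger than `grain`. While more than
// one thread is budgeted, the right half runs on a new thread with a
// split body; otherwise the same body handles both halves in order.
void toy_reduce(SumBody& body, std::size_t begin, std::size_t end,
                std::size_t grain, unsigned threads_left) {
    assert(begin <= end);
    if (end - begin <= grain) { body(begin, end); return; }
    std::size_t mid = begin + (end - begin) / 2;
    if (threads_left > 1) {
        SumBody right(body, split_tag{});
        unsigned half = threads_left / 2;
        std::thread t(toy_reduce, std::ref(right), mid, end, grain, half);
        toy_reduce(body, begin, mid, grain, threads_left - half);
        t.join();
        body.join(right);  // one join per split body
    } else {
        // Range is split, body is not: no splitting constructor, no join.
        toy_reduce(body, begin, mid, grain, 1);
        toy_reduce(body, mid, end, grain, 1);
    }
}

struct DemoResult { float sum; int splits; int joins; };

// Sums n ones with the given thread budget and reports the counters.
DemoResult run_demo(std::size_t n, unsigned threads) {
    std::vector<float> a(n, 1.0f);
    g_splits = 0;
    g_joins = 0;
    SumBody body(a.data());
    toy_reduce(body, 0, n, 100, threads);
    return {body.my_sum, g_splits.load(), g_joins.load()};
}
```

With this toy driver, run_demo(1000, 1) subdivides the range several times but reports 0 splits and 0 joins, while run_demo(1000, 4) reports 3 splits and 3 joins; the sum is 1000 in both cases. A join that never fires, together with a result that does not change when join is edited, is what one would expect if the real run never split the body, so counting calls to the splitting constructor in the original SumFoo is a reasonable first check.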