Intel® oneAPI Threading Building Blocks

When is parallel_reduce's join function called?

missing__zlw
This is from the tutorial:

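#include <iostream>
#include "tbb/blocked_range.h"
#include "tbb/parallel_reduce.h"
using namespace tbb;

class SumFoo {
    float* my_a;
public:
    float my_sum;

    // Accumulate the elements of subrange r into this body's running sum.
    void operator()( const blocked_range<size_t>& r ) {
        float *a = my_a;
        float sum = my_sum;
        size_t end = r.end();
        for ( size_t i=r.begin(); i!=end; ++i )
            sum += a[i];
        my_sum = sum;
    }

    // Splitting constructor: the new body starts with an empty sum.
    SumFoo( SumFoo& x, split ) : my_a(x.my_a), my_sum(0.0f) {}

    // Merge the partial sum of a split-off body into this one.
    void join( const SumFoo& y ) {
        std::cout << " join " << &y << std::endl;
        my_sum+=y.my_sum/2;  // deliberately altered (/2) to see whether join runs
    }

    SumFoo(float* a ) : my_a(a), my_sum(0) {}
};

float ParallelSumFoo( float* a, size_t n ) {
    SumFoo sf(a);
    parallel_reduce( blocked_range<size_t>(0,n), sf );
    return sf.my_sum;
}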

As you can see, I put a cout line in the join function. I also tried playing with my_sum. But when I run this program, the cout line never shows up. And no matter how I change the line my_sum+=y.my_sum/2; or even comment it out, the result is still the same.

So when and how is the join function used here?

RafSchietekat
You probably have too little work to give the algorithm a chance to exploit parallelism (although I should check the implementation to be sure about the details), so try to increase n (maybe a lot), and perhaps also trace and/or count the splitting constructor to see what it does. The Reference Manual shows a few example scenarios with fewer body splits than range splits.
missing__zlw
I don't think this is the reason. The example is from TBB's tutorial. I used a very large n, but it doesn't make any difference.
When I put a cout in the operator() function, I can see the different ranges get executed. But it seems join is never called, which is very strange to me.
RafSchietekat
Do you see a body being split? Do you see the operator() execute the subranges in order (probably no parallelism) or out of order (definitely parallelism)? How many hardware threads does the machine have, and are you allowing them to be used by TBB? What happens if you use task_scheduler_init in main() and vary the number of threads? Parallelism requires body split/join to occur, and you should be able to observe it.
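To make the link between thread count and split/join concrete, here is a toy model written with only the standard library. It is not TBB's implementation: every name in it (`SplitTag`, `ToyStats`, `ToySum`, `ToyReduce`) is hypothetical, and it forks one body per extra worker up front, whereas TBB decides at run time whether to split a body. The point it illustrates is only that join is called once per body that was split off, and never when a single body handles the whole range.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical toy model; none of these names come from TBB.
struct SplitTag {};

struct ToyStats {
    float sum = 0.0f;
    int splits = 0;
    int joins = 0;
};

class ToySum {
    const float* my_a;
    ToyStats* my_stats;
public:
    float my_sum = 0.0f;

    ToySum(const float* a, ToyStats* stats) : my_a(a), my_stats(stats) {}

    // Analogue of a splitting constructor: the new body starts from zero.
    ToySum(ToySum& x, SplitTag) : my_a(x.my_a), my_stats(x.my_stats) {
        ++my_stats->splits;
    }

    void operator()(std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i != end; ++i)
            my_sum += my_a[i];
    }

    // Analogue of join(): fold a split-off body's partial sum into this one.
    void join(const ToySum& y) {
        ++my_stats->joins;
        my_sum += y.my_sum;
    }
};

// Eagerly gives each worker beyond the first its own split body.
ToyStats ToyReduce(const std::vector<float>& a, unsigned workers) {
    workers = std::max(1u, workers);
    ToyStats stats;
    ToySum root(a.data(), &stats);
    const std::size_t n = a.size();
    const std::size_t chunk = (n + workers - 1) / workers;

    std::vector<std::unique_ptr<ToySum>> extra;
    std::vector<std::thread> pool;
    for (unsigned w = 1; w < workers; ++w) {
        // Splits and joins run on the calling thread, so the plain int
        // counters need no synchronization.
        extra.push_back(std::make_unique<ToySum>(root, SplitTag{}));
        const std::size_t b = std::min(n, w * chunk);
        const std::size_t e = std::min(n, b + chunk);
        ToySum* body = extra.back().get();
        pool.emplace_back([body, b, e] { (*body)(b, e); });
    }

    root(0, std::min(n, chunk));                      // original body takes the first chunk
    for (std::thread& t : pool) t.join();             // wait for all workers
    for (const auto& body : extra) root.join(*body);  // merge left to right

    stats.sum = root.my_sum;
    return stats;
}
```

With `workers` set to 1 the model performs no splits and no joins, which matches the symptom in the question; with more workers, each split body is joined exactly once. In current oneTBB, where task_scheduler_init is no longer available, the thread limit for such an experiment can be set by constructing a `tbb::global_control` with `tbb::global_control::max_allowed_parallelism`.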

(Edited 07:15Z.)