Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

TBB and recursion

adrael
Beginner
1,953 Views
Hi,

I am parallelizing a recursive code. But I wonder how to do it because the simple templates of TBB are iterative ones parallel_for, parallel_reduce...).

I have something like that :
void class_A::compute_fcn(){
//Computation
for(int i =0 ; i<... ; i++){
//Do something necessary for the code after loop for
}
if(this->father!=NULL)
this->father->compute_fcn();
}


where father is a member of class_A. How should I parallelize that ?


Thanks.


0 Kudos
14 Replies
Alexey-Kukanov
Employee
1,953 Views

It depends on where the actual parallelism is.

  • If there is enough work in the computation section for each element, you might parallelize it, possibly with parallel_for, as your simplifies example suggests. Also parallelism at this level can be combined with either of the next two items that represent higher level parallelism.
  • It seems from the comments that your recursive call depends on the above loop computation; if so, no parallelism possible at that level. If father->compute_fcn would not depend on the loop, you might spawn it as a separate TBB task even before starting the loop calculations.
  • It is unclear how a higher level is organized where you start calling compute_fcn. In case there are multiple class_A objects that could be processed independently, you might use parallel_do; the initial set of objects should be provided via a pair of iterators, and processing "fathers" can be scheduled with parallel_do_feeder::add.

If none of the above is applicable, e.g. you start with just one class_A object, and there is not enough work in the computation loop to justify overhead, and the computation for father is dependent, then there is no parallelism unless you put additional efforts to rethink/redesign your algorithm and/or data structures.

0 Kudos
adrael
Beginner
1,953 Views
Ok. It seems that I am in the case where my recursive call depends on the above loop computation.
So if we simplify the situation by getting rid of this computation loop. We have now a basic recursive pattern :

void class_A::compute_fcn(){
// Do something
if(this->father!=NULL)
this->father->compute_fcn();
}

1) I should here create a task that spawns a child task until my criteria to stop ?

2) But what if the compute_fcn is virtual ?
Here is the sitation :

Tree ----> Internal_Nodes ----> Leaf_Nodes
virtual compute_fcn(); virtual compute_fcn(); virtual compute_fcn();


There will be different implementations for the same definition. How do I handle with tasks this situation ?




0 Kudos
Alexey-Kukanov
Employee
1,953 Views

If there were multiple recursive calls done in compute_fcn and those were independent, then yes, you would create a task that spawns child tasks. But in this simplified recursive pattern, there is no parallelism at the inner level because each call is followed by 0 or 1 recursive call and there is no work overlap. So your only hope for parallelism is the outer level.

Your second question alludes that you might think of combining the functionality of tbb::task and class_A (and descendants) together in a single class. I think itis a bad idea, because tbb tasks are allocated via a special mechanism and destroyed right after execution - something hardly ever appropriate for your classes. You could use a single taskclass that accepts and keeps a pointer or a reference to a class_A object; there would be no problem with compute_fcn being virtual as the call would be dispatched as necessary via the pointer/reference.

0 Kudos
adrael
Beginner
1,953 Views
MADakukanov:

If there were multiple recursive calls done in compute_fcn and those were independent, then yes, you would create a task that spawns child tasks. But in this simplified recursive pattern, there is no parallelism at the inner level because each call is followed by 0 or 1 recursive call and there is no work overlap. So your only hope for parallelism is the outer level.

Just a precision : what do you mean by work overlap ? If the caller has to do some work before calling recursively the fcn like my sample code in my 2nd post (//do something), is it overlap ?

By outer level, you mean that I should parallelize at the first recursive call ?


Thanks for your explanations Alexey.

0 Kudos
Alexey-Kukanov
Employee
1,953 Views

If there were some work to do after spawning the current recursive call into a separate task, and independent on the result of the call so can be executed concurrently with it, there would be overlap. In your examples, there is no overlap.

Yes, by outer level I mean where the first recursive call is done, assuming that there might be similar calls done simultaneously for different objects, and that those could run independently or with reasonable amount of synchronization.

0 Kudos
adrael
Beginner
1,953 Views
Ok so my only chance is to parallelize the outer level. Infact, I often have the same pattern :


void fcn(arg list) {
 ....
 for(i=0;i
  //Computation
   fcn_B(arg2 list);
 }
}

or, the recursive version :

void fcn(arg list) {
 ....
 for(i=0;i
  //Computation
   fcn_A(arg2 list);
 }
}



Here I should parallelize the loop for, shouldn't I ? The problem is the size of the range is not large : less than 5.

0 Kudos
Alexey-Kukanov
Employee
1,953 Views
Still it is better than nothing. Also you might investigate whether the computations within class_A::compute_fcn are worth parallelizing, and whether the algorithm can be restructured to have less dependencies between calculation. Or you can even investigate if there is a different approach to the same problem that can be parallelized more efficiently.
0 Kudos
adrael
Beginner
1,953 Views
Ok thanks a lot for your help.
0 Kudos
ARCH_R_Intel
Employee
1,953 Views

As a point of reference, fcn_A needs to be about 10,000 cycles or more for parallelization with TBB to pay off; i.e., to amortize the parallel scheduler overheads.

As Alexey remarks, take a look at your dependences and see if they can be broken or rearranged. A quick introduction to dependences and rearranging them can be found on slides 5-17 of Ubiquitous Multithreading-OSCON2008.ppt .

0 Kudos
RafSchietekat
Valued Contributor III
1,953 Views
Aren't these .ppt files encrypted (that's what OpenOffice says, anyway)?
0 Kudos
adrael
Beginner
1,953 Views
OK. But what if my parallel_for are recursive ? Here is my situation :


void fcn(arg list) {
 ....
 for(i=0;i
  //Computation
   fcn(arg2 list);
 }

// And now do something
}

Here my parallel_for willl be called each time fcn is called.
It might lower the performance ?



void fcn(arg list) {
 ....
 parallel_for(.....)


// And now do something
}
0 Kudos
Wooyoung_K_Intel
Employee
1,953 Views

If you want to do some kind of tree traversal, and if you plan to create tasks using parallel_for for the first few levels and to use recursion with'for'below the threshold level, your approach may work, even though it may not be the most optimal solution. For example, if your tree is lop-sized, you get load-imbalance which results in sub-linear performance gain.

If your tree traversal is to find elements in the tree that satisfy certain conditions, pleasecheck out TBB cancellation and see if it is appliacable to your application.

0 Kudos
Andrey_Marochko
New Contributor III
1,953 Views
Just in case if the concern was about nested parallelism overhead. You can safely nest TBB's parallel algorithms inside one another. This virtually does not incur any additional overhead. You just need to make sure that your leaf tasks are meaty enough.
0 Kudos
ARCH_R_Intel
Employee
1,953 Views

Raf_Schietekat:
Aren't these .ppt files encrypted (that's what OpenOffice says, anyway)?

The encrypted .ppt has been replaced by a .pdf file.

0 Kudos
Reply