Community
cancel
Showing results for 
Search instead for 
Did you mean: 
jdonner
Beginner
70 Views

Task tree with disappearing internal nodes?

Hi,

I want to make a task tree where the parent nodes exist only to spawn children (they do a little bit of calculation and spawn only certain children). This is for branch-and-bound code with globally-accessible bounds. A tree structure is natural, and the depth-based scheduling is desirable, but there's no reason to keep the parent nodes around once they've spawned their children. If you do allocate_root() and spawn_root_and_wait() on the children, the parent still has to wait for the rootlike-children to finish, right? Is there any other way to do what I want?

Thanks,
J Donner
0 Kudos
5 Replies
RafSchietekat
Black Belt
70 Views

"Is there any other way to do what I want?" But what do you want? What specific reason do you have for not wanting to keep the parents around? Are you certain that this is not premature optimisation?

You might make the tasks children of a long-lived task, but the dependence on a shared reference count might harm performance. You might let a parent take on the assignment of one of the children before joining with the rest of them, which seems generally a better choice (go have a look at parallel_for). The absolute best choice probably depends on the exact circumstances, but finding it may not be where you should direct your attention.

Anton_Pegushin
New Contributor II
70 Views

Quoting - jdonner
Hi,

I want to make a task tree where the parent nodes exist only to spawn children (they do a little bit of calculation and spawn only certain children). This is for branch-and-bound code with globally-accessible bounds. A tree structure is natural, and the depth-based scheduling is desirable, but there's no reason to keep the parent nodes around once they've spawned their children. If you do allocate_root() and spawn_root_and_wait() on the children, the parent still has to wait for the rootlike-children to finish, right? Is there any other way to do what I want?

Thanks,
J Donner
Hi, I think I'm agreeing with Raf completely on this one. From the first impression this optimization you're looking for really seems like an overkill: since the bounds are globally accessible, your tasks objects don't occupy too much space, do they?
However this is doable, if you agree to change "disappearing parents" to "parents changing into empty placeholders" in the problem statement :). For how-to please look at "Recycling Parent As Child" (chapter 8.8.2.2) in the TBB Reference Manual or, as Raf mentioned earlier, look at the parallel_for start_for::execute() implementation. I'm guessing your Parents tasks are of the same type as Children tasks, right? So it'll work then.
jdonner
Beginner
70 Views

Hi, I think I'm agreeing with Raf completely on this one. From the first impression this optimization you're looking for really seems like an overkill: since the bounds are globally accessible, your tasks objects don't occupy too much space, do they?
However this is doable, if you agree to change "disappearing parents" to "parents changing into empty placeholders" in the problem statement :). For how-to please look at "Recycling Parent As Child" (chapter 8.8.2.2) in the TBB Reference Manual or, as Raf mentioned earlier, look at the parallel_for start_for::execute() implementation. I'm guessing your Parents tasks are of the same type as Children tasks, right? So it'll work then.

Ok, I can do that (yes parents are the same as children). But still, won't the children possibly contend with their siblings to de-reference the parent, and invoke expensive protection operations even if not? (or is it lock-free somehow? Haven't checked). Also I was worried about stack space, if the search goes very deep, plus I wondered if I was missing something obvious. But, if you use the method you suggest in parallel_for itself, it obviously can't be too bad :) Though, that has only 2 children at a time working on a given parent, my code could have up to the number of processors. I guess I could try reducing the branching factor and see if it makes a difference. Well ok anyway, thanks, to you, and Raf.
RafSchietekat
Black Belt
70 Views

"But still, won't the children possibly contend with their siblings to de-reference the parent, and invoke expensive protection operations even if not? (or is it lock-free somehow? Haven't checked)." Decrementing the parents' reference count is done atomically, if that can reassure you (although it's not as if atomic operations are automatically much cheaper than locked operations).

"my code could have up to the number of processors" The idea of using tasks is to create opportunities for concurrency that the scheduler can exploit, not to second-guess or spoon-feed the scheduler. Why would you take the number of processors into account?

"I guess I could try reducing the branching factor and see if it makes a difference." Reducing the branching factor increases the relative cost of keeping a parent around, so this is not necessarily a better approach.
Anton_Pegushin
New Contributor II
70 Views

Quoting - jdonner
....Also I was worried about stack space, if the search goes very deep, plus I wondered if I was missing something obvious. But, if you use the method you suggest in parallel_for itself, it obviously can't be too bad :)....
Continuation technique _is_ the solution for stack space problem (stack overflow). In the so-called blocking mode parents execute() method stays in the thread call stack, so that thread can return to it, when done executing children's execute() member functions (note: blocking mode is when parents reference counter is equal to num_of_children + 1 and wait_for_all is called after children's been allocated and spawned). Thus blocking mode can cause stack overflow.
Continuation in turn _prevents_ this problem by destroying (or recycling) the parents task (exiting from it's execute() member function prior to that) without waiting for the children to complete. When children are done working "continuation" task will be executed.
Reply