Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

recycle_as_continuation

oneminute
Beginner
367 Views
Just a quick question on something I can't pick up from the examples - recycle_as_continuation is supposed to prevent me from having to reallocate a task after every time it returns, correct? If so, where do I use it?

Example:

[cpp]while(isRunning)
{
	MyTask& myTask = *new(task::allocate_child()) MyTask();
	set_ref_count(2);

	spawn_and_wait_for_all(myTask);
}[/cpp]

So I want this code to continually spawn, return, spawn, return. This is just a simplified version of my code which waits on many more tasks before the loop is started again. I want to be able to continue to spawn the tasks without having to reallocate every time.

I tried putting recycle_as_continuation in the execute() of myTask but my system freezes up.

Any help is appreciated
0 Kudos
6 Replies
RafSchietekat
Valued Contributor III
367 Views
I would call that a bit oversimplified (no parallelism possible, just overhead), so I can't see what exactly you're trying to do, but perhaps it will work if you use recycle_as_child_of(*parent()) instead, because recycle_as_continuation() is about something else.
0 Kudos
Andrey_Marochko
New Contributor III
367 Views
You could have a look at the TBB's tree_sum example (in the "examples/task/tree_sum" folder). TBB headers also use this method in a few places, but they are part of the code with quite non-trivial logic, and therefore are likely not a good examples.

In general you use recycle_as_continuation as an optimization in the following scenario. You have a task that creates a few child tasks and then needs to execute some code after these new tasks are completed. The most starightforward way is to use one of the wait_for_all() methods to wait for the child tasks completion, and then to execure post-processing code in the same task. But direct usage of wait_for_all() from inside other tasks is generally a bad practice because it may lead to excessive call stack consumption and incurs additional overhead.

Instead you:
  1. Recycle the current task as continuation
  2. Allocate N children
  3. Set the current task's refcount to N (not N+1)
  4. Spawn N-1 children
  5. Set a data member (e.g. boolean flag) that designates this task as a continuation (so that when it is invoked next time you could execute another branch of code)
  6. Return N-th child task to bypass its spawning (it's another recommended and widely used optimization).
There is also a similar pattern based on allocate_continuation() and recycle_as_child_of() methods. In particular it is used by tbb::parallel_for and tbb::parallel_reduce algorithms. See for example start_for::execute() method in tbb/parallel_for.h header.

Note also that N should be a small number, 2 or 3. If you need to allocate and spawn more children, do it in a biinary or ternary tree like manner instead (see tbb::parallel_invoke implementation for example), otherwise your application scalability will suffer.
0 Kudos
RafSchietekat
Valued Contributor III
367 Views
How efficient is spawn_and_wait_for_all() compared to bypassing the scheduler in step 6 and using wait_for_all(): close to the latter, halfway, ...?
0 Kudos
Andrey_Marochko
New Contributor III
367 Views
They both avoid the cost of spawn, but any wait_for_all() executes additional prologue/epilogue code that tends to grow with more and more functionality added to the scheduler.

But the main problem with wait_for_all() in its current incarnation is that it traps the current call stack, which is fraught with a number of potentially dire consequences:
1) call stack consumption well beyound that of a serial equivalent (because it opens up a way for recursion)
2) inability to redistribute workers between different masters
3) [rare in practice] deadlock if user application calls wait_for_all() under a locked mutex


0 Kudos
oneminute
Beginner
367 Views
Thanks for the replies. Being new to parallel design, I'll just explain my overall design and maybe there is a better way to do it than the way I am doing it where I can avoid the overhead.

I have 2 main task trees, and I need the second tree to run after the first tree (second tree relies on first tree's updated data). To do this I was going to run the first task tree and then when all its child tasks return, run the second task tree - but I also didn't want to have to keep reallocating tasks when they are basically just reused. Is there a better way to do this? Again this is generalized code but it mirrors my own code closer than my first example.

[bash]// main loop
while(true)
{
spawn_root_and_wait(taskTree1); spawn_root_and_wait(taskTree2); }

// Task tree 1 { MyTask& myTask = *new(task::allocate_child()) MyTask(); MyTask& myTask2 = *new(task::allocate_child()) MyTask(); MyTask& myTask3 = *new(task::allocate_child()) MyTask(); MyTask& myTask4 = *new(task::allocate_child()) MyTask(); set_ref_count(5); spawn(myTask); spawn(myTask2); spawn(myTask3); spawn_and_wait_for_all(myTask4); return NULL; } // Task tree 2 { MyTask2& myTask = *new(task::allocate_child()) MyTask2(); MyTask2& myTask2 = *new(task::allocate_child()) MyTask2(); MyTask2& myTask3 = *new(task::allocate_child()) MyTask2(); MyTask2& myTask4 = *new(task::allocate_child()) MyTask2(); set_ref_count(5); spawn(myTask); spawn(myTask2); spawn(myTask3); spawn_and_wait_for_all(myTask4); return NULL; } [/bash]

0 Kudos
RafSchietekat
Valued Contributor III
367 Views
This does not exactly clear things up, but have you tried our advice above already? Anyway, don't get too focused on recycling tasks based on your experience with built-in memory allocation (TBB's scalable allocator is a lot more efficient): just get your program right, have the tasks do something non-trivial, and recycle opportunistically.
0 Kudos
Reply