If you just need a task instance to allocate and spawn children, then tbb::task::self() is here for you. It returns the reference to the task currently being executed, so you can refer to "this" task from parallel_for bodies etc.
It's possible to obtain the root task (or, more exactly, the root continuation task, since the original root task was recycled); for this you should start from self() and walk up the parent() chain: self().parent(), self().parent()->parent(), etc. The task whose parent() is NULL is the parent of both the root task and the root continuation.
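In the legacy TBB 2.x task API, that walk can be sketched roughly as below. This is just a sketch: `find_root_parent` is a hypothetical helper name, and note that `tbb::task`, `task::self()`, and `parent()` belong to the old task interface, which was later deprecated and removed in oneTBB.

```cpp
#include <tbb/task.h>  // legacy TBB 2.x header; not present in oneTBB

// Hypothetical helper (not part of TBB): starting from the task
// currently being executed, climb the parent() chain until we reach
// the task whose parent() is NULL -- the common parent of the root
// task and the root continuation described above.
tbb::task* find_root_parent() {
    tbb::task* t = &tbb::task::self();  // "this" task, e.g. inside a parallel_for body
    while (t->parent())                 // stop when parent() is NULL
        t = t->parent();
    return t;
}
```

As noted below in the thread, the result of such a walk depends on how a given algorithm template structures its task tree, so treat it as introspection of an implementation detail rather than a supported interface.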
In case you are going to use this complex method and not task::self(), could you please explain in more details why it seems better to you? If there are good motivating cases, we might consider adding a method to get the "root" task.
Hi, Nick. Perhaps you can describe further the environment in which your code is operating? When you say the code uses a parallel_for, are you speaking of the conventional use of a splittable range of array values and a recursive splitting of tasks over that range? Or perhaps your code just uses a parallel_for to create a set of independent tasks that you'd like to manage further?
When you say "pushing," are you being literal, as in "pushing an item onto a queue," or do you mean something more? You describe a condition where "pushing" will take some time, so a mutex wouldn't be ideal and spawning/recycling a new task would be best. If you need a mutex or similar synchronization object to write a protected array element, I'm not sure how you avoid the need for a lock just by deferring the computation to another task. Ultimately the data need to be written, and some sort of synchronization step might still be needed.
If you need to be able to add tasks to the tree, you might consider managing initial task allocation in your program rather than relying on the parallel_for, and then you could have all the flexibility you desire. parallel_for is targeted at the domain of splittable ranges. Adding hooks to parallel_for for such an esoteric use as you suggest may be stretching the concept too far.
mad reed: parallel_for is targeted at the domain of splittable ranges. Adding hooks to parallel_for for such an esoteric use as you suggest may be stretching the concept too far.
I agree, adding anything like this specifically to parallel_for is probably a no-go, mostly because of interface complications.
The method of going up the task tree I described above could be legalized as a general tasking interface, like task::self(). The semantics of this method would be hard to describe, though, as the result for a particular algorithm would depend on the algorithm's implementation; and this is quite a serious concern against it. Thus really good motivating cases are necessary to consider it.
I think the right approach might be new loop template that supports loops whose body is pure continuation-passing style code. If done right, that would enable composing loops in a nested fashion where no loop body in the nest leaves anything on the run-time stack when it invokes the loop inside itself. I've thought about this from time to time, but never worked out the details.
As Alexey points out, task::self() could be used to get to the task running the loop body; task::self().parent() would get you to the task in the task graph to which you could add an additional child by using task::allocate_additional_child_of. Using the parent, and not the root, is probably best for scalability: hanging all additional children off the root makes all processors contend for the same cache line when they update the reference count.
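A rough sketch of that pattern in the legacy TBB 2.x task API is below. `ExtraWork` and `spawn_extra_from_loop_body` are illustrative names, not part of TBB, and this whole interface was deprecated and removed in oneTBB, so take it as a sketch of the idea rather than portable code.

```cpp
#include <tbb/task.h>  // legacy TBB 2.x header; not present in oneTBB

// Illustrative user-defined task (not part of TBB) holding the
// deferred work that the loop body wants to hand off.
class ExtraWork : public tbb::task {
public:
    tbb::task* execute() {
        // ... do the deferred computation here ...
        return NULL;  // no continuation
    }
};

// Intended to be called from inside a parallel_for body: attach an
// extra child to the parent of the currently executing task.
// allocate_additional_child_of atomically increments the parent's
// reference count, which is why it is safe while the tree is running.
void spawn_extra_from_loop_body() {
    tbb::task& parent = *tbb::task::self().parent();
    tbb::task& extra =
        *new(tbb::task::allocate_additional_child_of(parent)) ExtraWork;
    tbb::task::spawn(extra);
}
```

Note that, as the next post warns, this relies on how parallel_for happens to build its task tree, which is an implementation detail.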
Be warned that we never intended parallel_for to be introspected that way, and so we cannot guarantee that hack will work in the future. But if it is a generally useful pattern, we could look into supporting the pattern, perhaps with a different template.