tbb task problem, max number of spawned children

schleprock · ‎06-17-2008

is there a maximum number of children that a parent task can spawn? i'm running into the following problem.

i'm recursively spawning tasks. the root task asserts consistently when it tries to spawn the 0x57 (87th) task. the assert i get is:

tbb::assertion_failure (filename=0x2a96099170 "../../src/tbb/task.cpp", line=0x505, expression=0x2a96099e8c "t->state()==task::allocated", comment=0x2a96099e54 "attempt to spawn task that is not in 'allocated' state") at ../../src/tbb/tbb_misc.cpp:65

in smaller testcases i was getting this assert when i did not set the ref count to the number of tasks to spawn + 1 (BTW the plus one is annoyingly easy to forget!). once i fixed this, my stuff works fine until i get to larger testcases. in one particular one i'm trying to spawn 0xa4 (164) child tasks. i set the ref_count to 165 but i get the earlier assert (consistently) when it tries to spawn the 87th task. so the ref_count is clearly larger than the number of tasks that have been spawned so far.

is there an upper limit to the number of tasks that can be spawned???

bill

ARCH_R_Intel · ‎06-18-2008

There's not supposed to be a limit. The ref_count is a full word, so the system should run out of memory long before ref_count overflows.

Can you post a self-contained example here or file a bug report at http://www.threadingbuildingblocks.org/bugzilla_search.php?

Thanks,

- Arch

landmann · ‎08-14-2008

Hi,
Just wanted to know if the above problem has been solved. I am running into a (perhaps) similar problem, without knowing why the scheduler asserts with

Assertion t->state()==task::allocated failed on line 1917 of file ../../src/tbb/task.cpp
Detailed description: attempt to spawn task that is not in 'allocated' state

All tasks are definitely spawned by using the spawn( *new ( allocate_child/root) ...)
pattern.

Have there been any fixes to TBB since the report?
If I have to track this down, what would be the quickest way?

Thank you!

--L

Andrey_Marochko · ‎08-14-2008

Could you please post the code (or a simplified version of it) that causes the assertion? In the past we've come across several cases where TBB asserted in seemingly innocent programs, and in all those cases it turned out that TBB constructs were being used incorrectly. In particular, there have been no real bugs related to this particular assertion for at least the past year. So let's have a look at your code.

landmann · ‎08-14-2008

Okay, let me try to shorten all my code and just explain the program flow:

class DCTask : public tbb::task {
    unsigned long int blockXY;
public:
    unsigned long int childrenSpawned;
    DCTask(unsigned long int XY) : blockXY(XY), childrenSpawned(0) {}
    ~DCTask() { printf("Destroyed %p ",this); }
    task* execute();
};

In its execute function it occasionally spawns children. How many is difficult
to predict. So I used the following construct (which probably performs badly
due to overhead, but I cannot think of a better solution in TBB at the moment):

class DummyTask : public tbb::empty_task {
    ~DummyTask() { printf("Destroyed dummy %p ",this); }
};

tbb::task* DCTask::execute() {
    set_ref_count(maxPossibleChildrenAtAll);
    do {
        ...
        if(condition_fulfilled()) {
            childrenSpawned++;
            tbb::task* child = new( allocate_child() ) DCTask(newBlockXY);
            printf("Allocated %p ",child);
            spawn( *child );
        }
    }
    /* Can we atomically adjust the reference count? This would get rid
       of all these dummies. */
    int rem = maxPossibleChildrenAtAll - childrenSpawned;
    while(rem) {
        tbb::task* dummy = new( allocate_child() ) tbb::empty_task();
        printf("Allocated dummy %p ",dummy);
        spawn( *dummy );
        --rem;
    }
    return 0;
}

The initial call to the first DCTask is done from a second Task (which is spawned as root):

class STask : public tbb::task {
    task* execute() {
        set_ref_count(1);
        spawn( *new( allocate_child() ) DCTask(0) );
        ....
        return 0;
    }
};

The initial call is done via:

tbb::task* s = new(tbb::task::allocate_root()) STask();
tbb::task::spawn_root_and_wait(*s);

Using code like that produces the following log (see attachment).
As one can see in the log, the task that violates the assert was never
logged as allocated beforehand....

Regards!

Andrey_Marochko · ‎08-14-2008

Ok, now it's clear where the problem lurks. Look at the STask::execute() method. What happens there is that after an instance of DCTask is spawned the STask::execute() method returns and the STask object is destroyed (by the TBB scheduler). In the meantime spawned DCTask is executed (either by the current thread or by a worker that has stolen it). After it is executed its parent's refcount is decreased, and if it hits zero, the parent is executed.

But the parent (our initial STask) has already been sent to execution! It is quite probable that it's already dead and you are referencing freed memory, or it can die any moment. Whatever the case you are in trouble. The right way to write this code is:

set_ref_count(2);
spawn( *new( allocate_child() ) DCTask(0) );
...;
wait_for_all();

Note that the ref count is 2, and that there is a wait_for_all() call.
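To make the "+1" concrete, here is a minimal sketch in plain standard C++. It is not TBB code and not how the scheduler is implemented; it is only a toy model of the counting described above, and the name child_reaches_zero is made up for illustration. Each "child" decrements a shared counter when it finishes, and the function reports whether any child drove the counter to zero, which is the moment the parent would be considered ready (or done) even if its execute() were still running.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Toy model, NOT TBB: each finishing child decrements the parent's reference
// count. Returns true if some child brought the count down to zero.
bool child_reaches_zero(int initial_ref_count, int children) {
    assert(children >= 0);
    std::atomic<int> ref_count{initial_ref_count};
    std::atomic<bool> hit_zero{false};

    std::vector<std::thread> pool;
    for (int i = 0; i < children; ++i) {
        pool.emplace_back([&ref_count, &hit_zero] {
            // fetch_sub returns the previous value, so a previous value of 1
            // means this child has just made the count zero.
            if (ref_count.fetch_sub(1) == 1) {
                hit_zero.store(true);
            }
        });
    }
    for (std::thread& t : pool) {
        t.join();
    }
    return hit_zero.load();
}
```

In this model, child_reaches_zero(1, 1) corresponds to the original set_ref_count(1) with one child and returns true, while child_reaches_zero(2, 1) corresponds to the corrected version and returns false: the extra count is what is left for the waiting parent.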

As for the question in your code comment: you have to use the task::allocate_additional_child_of() method in order to allocate new children after at least one of them has been spawned.

And a last note. We always recommend considering the parallel algorithms provided by TBB before venturing into manual construction of task hierarchies. Your need to create additional children from time to time suggests that parallel_do may suit your purposes well. On the other hand, you can also use allocate_additional_child_of() from inside parallel_for and the like by calling it on the result of the static function task::self().

Alexey-Kukanov · ‎08-14-2008

I concur with Andrey that using algorithms is the preferred way. Still, someone may need to use tasks directly. There are some rules to follow when working with TBB tasks:

If an executing task allocated some children, it should wait_for_all those before exiting. This way of task spawning is called "blocking style" because it "blocks" the current task until all children are completed.
An alternative is the "continuation passing" style where you first allocate one task as continuation to the current one:
c = new (allocate_continuation()) my_task_class();
Then you assign all children to the continuation (by allocating on its behalf), set continuation's reference counter appropriately, and spawn children but NOT the continuation.
As Andrey said, the task with children is executed after its ref_count becomes 0. This means you might use continuation tasks to make some reduction after children completed their work. If this is unnecessary, use tbb::empty_task for continuation.
When you cannot set the reference counter a priori, use allocate_additional_child_of; it changes the counter atomically. Ensure, however, that the counter does not become zero before all children are spawned; for that, set it to 1 at the beginning and, depending on the style in use, either wait_for_all or spawn an additional empty_task at the end.
A task being used to allocate or spawn another task (i.e. the one on the left side of allocate_child and spawn calls) should either be executing on the current thread or have been allocated by the current thread. Be aware of that when you want to transfer (references to) task objects through external storage. If in doubt, allocate/spawn on behalf of task::self().
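The rule about not knowing the child count a priori can be sketched as follows. Again this is a toy model in standard C++, not TBB code: the counter starts at 1 as a guard, every new child bumps it atomically before the child can finish (standing in for what allocate_additional_child_of is described as doing), and the parent then waits until only the guard is left (standing in for wait_for_all). The function name and the spawn_decisions parameter are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Toy model, NOT TBB: a parent that decides on the fly whether to create each
// child. Returns how many children had completed when the wait finished.
int run_with_dynamic_children(const std::vector<bool>& spawn_decisions) {
    std::atomic<int> ref_count{1};   // guard: keeps the count above zero
    std::atomic<int> completed{0};
    std::vector<std::thread> pool;

    for (bool spawn_child : spawn_decisions) {
        if (!spawn_child) continue;
        ref_count.fetch_add(1);      // increment BEFORE the child can finish
        pool.emplace_back([&ref_count, &completed] {
            completed.fetch_add(1);
            int previous = ref_count.fetch_sub(1);
            assert(previous >= 2);   // a child never consumes the guard
            (void)previous;
        });
    }

    // Stand-in for wait_for_all(): spin until only the guard count remains.
    while (ref_count.load() != 1) {
        std::this_thread::yield();
    }
    int result = completed.load();
    for (std::thread& t : pool) {
        t.join();
    }
    return result;
}
```

Because the guard is present from the start, the counter cannot touch zero between two spawns, no matter how quickly the earlier children finish; no dummy children are needed to pad the count.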

In the described case, I would use continuation-passing style and allocate_additional_child_of(continuation). Also STask seems unnecessary, unless it does something else in your real code; in the simplified code, it is more efficient to spawn the very first DCTask as the root.

landmann · ‎08-14-2008

Many thanks to both of you A. & A.,

I think I can now fix all of the code with only a couple of additional lines. I was not aware that the scheduler needs so much additional support from outside to correctly build task hierarchies.

And yes, of course STask does a lot more work after spawning the first child. Using a continuation here seems not to be good, as these additional calculations won't execute in parallel to the DCTasks then. But I can spawn both STask and DCTask as roots and later wait for completion of both of them (instead of STask only); that would be an alternative.

So I will use the blocking scheme first, using allocate_additional_child_of() for the additional tasks which might get spawned.

--L

ARCH_R_Intel · ‎08-14-2008

You might take a look at http://softwareblogs.intel.com/2008/07/02/implementing-task_group-interface-in-tbb/, which has a wrapper that provides a simpler task interface.

The general philosophy behind class task is that it is a high-performance engine under the hood of the algorithm templates. As such, the algorithm templates are supposed to be the way to drive it easily, and the task interface is for those who want to take on the burden of building customized high-performance vehicles.

Alexey-Kukanov · ‎08-15-2008

Below are some additional notes in case you consider continuation-passing later. We have shown experimentally that this style is more efficient than the blocking style.

Landmann:
And yes, of course STask does a lot more work after spawning the first child. Using a continuation here seems not to be good, as these additional calculations won't execute in parallel to the DCTasks then.

If you want to spawn some tasks and then continue doing the job of the current task, you can still use the continuation. The trick is basically the same as I described above: you set_ref_count for the continuation to be one more than the actual number of children, allocate and spawn the children, then do the rest of the job, and at the very end use allocate_child to create the last child, an empty_task, to compensate for that initial ref_count bump. For efficiency, do not even spawn the last child but instead return the pointer to it as the result of the execute method. This way you ensure maximal parallelism while also avoiding some of the overhead of the blocking style.
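The ordering guarantee of this trick can be illustrated with another toy model in standard C++ (not TBB code; ToyContinuation, release and run_continuation_style are invented names). The continuation's count is set to children + 1, every finishing child releases one count, and the parent releases the extra one only after it has done the rest of its own job, which is the role the final empty_task child plays in the description above. Whoever releases the last count runs the continuation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Toy model, NOT TBB: a continuation that runs when its count reaches zero.
struct ToyContinuation {
    std::atomic<int> ref_count{0};
    std::atomic<int> runs{0};        // how many times the continuation ran
    std::atomic<int> work_seen{0};   // units of work finished when it ran
};

// Called by every finishing child, and once by the parent to drop the guard.
void release(ToyContinuation& c, const std::atomic<int>& work_done) {
    if (c.ref_count.fetch_sub(1) == 1) {   // last release: run the continuation
        c.work_seen.store(work_done.load());
        c.runs.fetch_add(1);
    }
}

struct Outcome { int continuation_runs; int work_seen; };

Outcome run_continuation_style(int children) {
    assert(children >= 0);
    ToyContinuation cont;
    std::atomic<int> work_done{0};
    cont.ref_count.store(children + 1);    // +1 is the guard held by the parent

    std::vector<std::thread> pool;
    for (int i = 0; i < children; ++i) {
        pool.emplace_back([&cont, &work_done] {
            work_done.fetch_add(1);        // the child's work
            release(cont, work_done);
        });
    }

    work_done.fetch_add(1);     // "the rest of the job", alongside the children
    release(cont, work_done);   // stand-in for the final empty_task child

    for (std::thread& t : pool) {
        t.join();
    }
    return Outcome{cont.runs.load(), cont.work_seen.load()};
}
```

In this model the continuation runs exactly once and only after both the children and the parent's remaining work are finished, regardless of which of them completes last.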

Landmann:
But I can spawn both STask and DCTask as roots and later wait for completion of both of them (instead of STask only); that would be an alternative.

Yes; for this, you need to use a tbb::task_list, put both roots into it, and then call spawn_root_and_wait(the_task_list).