feature suggestion: wait for ALL tasks, custom task wait

Alexander_Herz · ‎01-03-2012

I'd like to suggest two features:

1. There appears to be no way to wait for all currently scheduled tasks (spawned and enqueued). My task based application is supposed to quit as soon as (but not before) there are no more tasks available for execution. Currently I attach all tasks to one uber-root task and wait for this task to finish. This works up to the point where you want to use enqueue.

2. The only valid way to put a task to sleep and wait for a task-external event (e.g. a child task finishes) is to use some form of wait_for_all() which (presumably, the docu is sparse here) frees the current worker thread
to be used by other tasks until the waiting task continues in order to guarantee progress (otherwise, all worker threads might be blocked by tasks waiting on executable tasks which are not executed because all workers are blocked). Due to this very reason, it is not possible for a task to wait for any other external event using a custom wait function because the worker thread would be blocked (e.g. consumer-producer).
In the current tbb impl I would need to create my own worker thread for blocking tasks. To do this efficiently, I would create a thread pool etc, basically recreating lots of the tbb infra-structure.
The versatility of tbb tasks would be greatly enhanced if it was possible to use a mechanism similar to what wait_for_all() is doing when doing a custom wait.

If there is a decent chance for these changes to be accepted into the tbb codebase I would implement them myself. If not, I'll have to stick to my custom task scheduling system which is less portable and tested than tbb but doesn't impose these restrictions. Actually, it's a bit faster but portability would be worth it.

Regards,
Alexander Herz

jimdempseyatthecove · ‎01-03-2012

In TBB you could add code to count your enqueue's and un-count your dequeues (inc before enqueue, dec following dequeue). Then include the count in your test for done.

When using QuickThread (www.quickthreadprogramming.com), this is a built in feature (wait for task set to complete).

I suspect the problem you have is you are instantiating a monitoring thread using a TBB thread, and that thread is waiting for all the other TBB threads to reach completion. Note, this can include your main thread which is a member of the TBB thread pool. Consider spawning a seperate monitor thread. Or as a quick experiment

int main()
{
task_scheduler_init init;
parallel_invoke( yourMonitorThread, yourProcessingRoot);
}

Jim Dempsey

Alexander_Herz · ‎01-03-2012

Thx for the suggestion.

Currently, I do not have a monitoring thread. The main thread simply starts the root task and waits for it to finish. Instead of enqueue I add the task to be "enqueued" as child of the root task so that the root task does not finish before the "enqueued" tasks are processed. That works in principle but I doubt this is intended usage and I'm not sure if I get fairness problems for my "enqueued" tasks when doing this.

My second suggestion for custom waiting inside tasks is unrelated to the first suggestion. Here I want to wait for some "external" signal inside a tbb task whereas my first suggestion deals with waiting for all tasks in the tbb system from outside any tasks.

Alex

jimdempseyatthecove · ‎01-04-2012

In QuickThread we resolved your two wait for all tasks in two different but similar ways:

void SomeTaskEntry(arg_t arg, arg2_t arg2)

{

code spawning additional tasks

}

.

// some other task

parallel_task(SomeTaskEntry, arg1, arg2);

additional code

parallel_wait();

--------- or --------

qtControl otherTaskControl; // static or scoped newd to pointer

// some other task

parallel_task(&otherTaskControl, SomeTaskEntry, arg1, arg2);

additional code

otherTaskControl.WaitTillDone ();

And where the SomeTaskEntry task receives an implicit and stacked default qtControl object with implicit WaitTillDone() on exit from the task. The choice of using implicit waits, explicit waits, or no waits at all, is under the control of the programmer.

For the task that you wish to spawn that are intended to wait for an external event (or I/O completion), QuickThread has a second class of thread, an I/O class. These are two distinct thread pools. To disambiguate the classes, you can include a routing indicator.

parallel_task(IO$, yourIoTaskFunctor[, optional args here]);

or

parallel_task(IO$, &yourIoControl, yourIoTaskFunctor[, optional args here]);

In the first case, your application is not interested in when the task completes (or you use a separate flag).

In the second case, you can explicitly use yourIoControl.WaitTillDone();

Jim Dempsey

RafSchietekat · ‎01-05-2012

#1 "In TBB you could add code to count your enqueue's and un-count your dequeues (inc before enqueue, dec following dequeue). Then include the count in your test for done."
Maybe that needs to be elaborated a bit. Waiting for child tasks to complete is based purely on reference counts, and these reference counts can be manipulated directly. It seems doable to pretend the enqueued tasks to be extra children for some root task by manipulating its reference count (I have not tested this myself), if this is what you want to do, and as long as you accept that this involves a lot of sharing (which is costly). Make sure that adding references only occurs from within a dependency of the root task or with other assurance that it won't launch prematurely, otherwise accidents may happen.

#2 "That works in principle but I doubt this is intended usage and I'm not sure if I get fairness problems for my "enqueued" tasks when doing this."
Only enqueued tasks get fair scheduling.

#2 "My second suggestion for custom waiting inside tasks is unrelated to the first suggestion. Here I want to wait for some "external" signal inside a tbb task whereas my first suggestion deals with waiting for all tasks in the tbb system from outside any tasks."
Be careful not to wait inside a TBB worker thread or undersubscriptiion may result.

(Added 2012-01-20) For #1, apparently enqueued tasks can be another task's real children, no need to emulate anything.

Alexander_Herz · ‎01-19-2012

#2 "My second suggestion for custom waiting inside tasks is unrelated to the first suggestion. Here I want to wait for some "external" signal inside a tbb task whereas my first suggestion deals with waiting for all tasks in the tbb system from outside any tasks."
Be careful not to wait inside a TBB worker thread or undersubscriptiion may result.

Waiting inside a TBB task may actually cause starvation, which is a serious problem. That's why I suggest to enhance the TBB task scheduler such that waiting inside a task without causing undersubscription/starvation becomes possible.

Alex

RafSchietekat · ‎01-19-2012

Undersubscription, starvation, deadlock: bad, worse, worst?

(I believe I have already mentioned a workaround somewhere.)

RafSchietekat · ‎01-19-2012

That workaround would be to merge those external dependencies into the task dependency system in the form of reference counts, explicitly manipulated from non-TBB threads.

An alternative implementation would be to mark tasks in advance for execution on their own private thread, but that only scales to a limited number of external dependencies, so for that reason alone it could never be a panacea (it wouldn't serve thousands of sockets).

Note that both approaches (the current workaround and the potential enhancement) will always remain vulnerable to starvation of tasks buried on the stack underneath a task which is blocked waiting for children even if none of them are currently executing but one of the descendants is an external dependency. Even if you isolate such occurrences to an arena (work descending from its own user thread) where this isn't an issue, it could pin down a worker thread and make it unavailable for use elsewhere, and so still lead to undersubscription. My recommendation is to try and isolate such code to its own arena (from a separate user thread than computation-oriented work with general algorithms), and to only use continuations in such an arena (no built-in algorithms). This is a lttle stricter than the real requirement (only continuations between user thread and external dependency), but probably also a little simpler (more chance to get it right and maintain it) and robuster (more isolated impact of mistakes).

Feel free to point out any mistakes/flaws!

Alexey-Kukanov · ‎01-20-2012

Hi Alexander,

allocatiing all tasks (including those that are later enqueued) as childs of a common root is fine and should work without problems. If there are problems, let us know and we will consider it a bug to be fixed.

Re the second suggestion: your understanding of wait_for_all is correct, it will take and execute other tasks, and with explicit ref_count manipulation it can be used to "emulate" waiting for an external event. If the event requires a blocking wait however, you still need a thread to execute that wait; it can be a TBB worker thread or an external thread. We consider adding support for blocking waits into TBB tasks; the idea being pondered is to use some markup API that designates a potentially long-waiting code section, so that another thread might go in service replacing the blocked one. I think we considered the idea of having a special task (or a special method to "spawn" a task that blocks) and found it to be more cumbersome to use; but we might reconsider.

There certainly is a decent chance that the changes you talk about, especially the second part, are accepted into TBB; though since the scheduler is concerned, we will review both a general idea and an implementation very thoroughly. If you are interested, I suggest we continue the discussion here at the forum.

RafSchietekat · ‎01-20-2012

"We consider adding support for blocking waits into TBB tasks; the idea being pondered is to use some markup API that designates a potentially long-waiting code section, so that another thread might go in service replacing the blocked one. I think we considered the idea of having a special task (or a special method to "spawn" a task that blocks) and found it to be more cumbersome to use; but we might reconsider."
While the former avoids undersubscription, it can still lead to starvation of anything currently trapped on the invoking thread's stack. If the latter is implemented by executing such a task on its own thread, you can buy less chance of starvation at the cost of more threads used (unscalably so). It's not just about API, I think, although I would be glad to be mistaken about that.

(Added 2012-01-23) Those new threads could of course have a far smaller stack.

Alexander_Herz · ‎01-23-2012

I think it is clear that blocking waits must be performed in an extra thread to guarantee that at any time >0 worker threads are available to process pending tasks and nothing is trapped on the blocked thread's stack.

If a fixed upper bound for the number of simultaneously blocking tasks can be given, then sufficiently many additional block-worker-threads can be allocated. If no such bound is available, then these threads would need to be allocated on the fly (or there's a pool caching them). In the later case, scalibility can be impaired, if the mechanism is abused (e.g. contineously growing amount of simultaneously blocking tasks). Obviously, blocking tasks should avoid to perform much computation and do not much else than waiting for the block to end to avoid oversubscription.

If the ratio of blocking to non-blocking tasks is low and doesn't grow beyond a reasonable constant (which may not be known statically) then I don't see any problems.
So like pretty much any feature, if it is abused, it will yield poor performance. If it is used reasonably, it would be a useful enhancement. Currently, you are basically locked ot of using tbb tasks as soon as you want to block for just a single task (unless you implement your own scheduler for these tasks..).

Alex

RafSchietekat · ‎01-23-2012

"Currently, you are basically locked ot of using tbb tasks as soon as you want to block for just a single task (unless you implement your own scheduler for these tasks..)."
I've already described how to do it within TBB, but some more sugar would be nice, like a task-reference interface to a facility like epoll. Maybe with both spawn-for-blocking and emergency-blocking-guard added as well, experience will reveal best practice in the stacked world of C++.

Alexander_Herz · ‎01-26-2012

I never doubted that a workaround can be done, it would just be a lot nicer to have an official api rather than everyone fixing the problem themselves again. The sugar/api should be discussed.

As a related question,

I create my own worker threads to execute blocking tasks. Blocking tasks are created as childs of some standart tbb::task so that after running execute() of the blocking task on my own thread I destroy the task (to clean up and to notify the waiting parent that the block is over).
When trying to destroy the task I get a failing assertation that my parent's state is "executing" and not "allocated". When/how can I destroy a child task that I'm executing myself on my own thread?

in tbb task:

set_ref_count(2);
task=new(allocate_child()) T();
spawn_on_own_thread(task);
wait_for_all();

my thread:

task->execute();
task->destroy(*task); // assertation parent->state()==allocated failed

Thx,
Alex

RafSchietekat · ‎01-26-2012

Instead of destroy(), try delete on the child and decrement_ref_count() on the parent.

Alexander_Herz · ‎01-26-2012

you mean task->~Task()? Or is it legal to call
delete task; ?? The task was allocated using alloc_child() and inplace new.

Alex

RafSchietekat · ‎01-26-2012

Try "in-place" delete, also with alloc_child(), that looks like it might work. You don't really have to use a TBB task for the "child": just manipulating the reference count should be enough (I hope).

It's only off-the-cuff advice for a quick hack, though, you may get better from the TBB team or so, and I might have to revisit this part of the code to come up with something more useful.

Alexey-Kukanov · ‎03-11-2012

Quoting Alexander Herz

When trying to destroy the task I get a failing assertation that my parent's state is "executing" and not "allocated". When/how can I destroy a child task that I'm executing myself on my own thread?

in tbb task:

set_ref_count(2);
task=new(allocate_child()) T();
spawn_on_own_thread(task);
wait_for_all();

my thread:

task->execute();
task->destroy(*task); // assertation parent->state()==allocated failed

The assertion dates back to pre-TBB 1.0; at some point, multiple assertions about the task state were added "to catch invalid sharing of running tasks". The requirement enforced by the assertion is not documented. I will consult with Arch about the intent, and see if the assertion is still there for good reasons or it's an anachronism to be removed.

Alexey-Kukanov · ‎03-13-2012

We decided that the assertion will be weakened to allow the described use case, which indeed seems valid and useful. The documentation for task::destroy will be updated to mention requirements to the parent task.

Alexander_Herz · ‎03-27-2012

Hi,

great news. I guess this will be incorporated in the next release?

While we're at it, It would also be nice to be able to increment the refcounter of a task by more than one.
Currently, I have to call increment_ref_count() multiple times (generating a locked add for each), which is not necessarily efficient.

Regards,
Alex

Anton_M_Intel · ‎07-17-2013

The assertion will be relaxed in the next stable release