- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
TBB tasks and algorithms provide a nice way to achieve this for most jobs but AFAIK your work isn't guaranteed to get done until you wait for it much like futures. This makes a lot of sense to me but messages handling can be heavyweight and its completion doesn't cause any signals so I really don't care when it completes as long as it completes sometime and therefor I don't want to wait for it to do so when I publish it.
Since message handlers can publish other messages I can't safely run messages at the publish call in a single threaded program because this recursive publishing could cause the stack to overflow.
I have a main loop where I could wait for pending messages to complete (process to a high water mark?) publishing but this might create an unnecessary synchronization bottleneck for the multi-thread multi/many-core case. However, this is the best idea I can come up with.
I haven't worked much with multi-threaded or asynchronous operations before so I would like to get some feedback and perhaps suggestions for alternative designs.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
MichaelMarcin:
TBB tasks and algorithms provide a nice way to achieve this for most jobs but AFAIK your work isn't guaranteed to get done until you wait for it much like futures. This makes a lot of sense to me but messages handling can be heavyweight and its completion doesn't cause any signals so I really don't care when it completes as long as it completes sometime and therefor I don't want to wait for it to do so when I publish it.
This is not true. TBB tasks, since spawned, start executing in best effort manner, no matter waiting you for them to complete or not.
But here is another caveat. Execution order of tasks is not guaranteed. So for example, you spawn task which recursively and infinitely respawns itself. Then one day this task also spawns second task which must stop whole processing. Second task is not guaranteed to run at all. So you will end up with a kind of deadlock.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
randomizer:This is not true. TBB tasks, since spawned, start executing in best effort manner, no matter waiting you for them to complete or not.
I realize this for the multi-threaded case it will probably start executing immediately but what about the single-threaded case? If TBB spawns no worker threads does this mean that the task executes at spawn or at wait? I don't see any other options.
randomizer:But here is another caveat. Execution order of tasks is not guaranteed. So for example, you spawn task which recursively and infinitely respawns itself. Then one day this task also spawns second task which must stop whole processing. Second task is not guaranteed to run at all. So you will end up with a kind of deadlock.
So task scheduling is not fair. I can understand that and probably prefer it for most things but I think I can get around that deadlock by having 2 empty tasks for synchronization that I use to double buffer all message handler tasks which would be their children. The active empty task would wait until its children complete and any new messages would create tasks to the back buffer empty task. After the active task completes its wait it would atomically swap itself with the back buffered empty task which could then wait for its children to complete. This gives me the high water mark and I think is somewhat fair.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
MichaelMarcin:
I realize this for the multi-threaded case it will probably start executing immediately but what about the single-threaded case? If TBB spawns no worker threads does this mean that the task executes at spawn or at wait? I don't see any other options.
They will be executed when current task will finish it's execution and give control back to TBB scheduler.
MichaelMarcin:
So task scheduling is not fair. I can understand that and probably prefer it for most things but I think I can get around that deadlock by having 2 empty tasks for synchronization that I use to double buffer all message handler tasks which would be their children. The active empty task would wait until its children complete and any new messages would create tasks to the back buffer empty task. After the active task completes its wait it would atomically swap itself with the back buffered empty task which could then wait for its children to complete. This gives me the high water mark and I think is somewhat fair.
Hmmm... I don't completely get your point. Can this scheme load more than one processor?
Anyway you must be very careful implementing your own message queueing and scheduling. TBB has very efficient low-overhead distributed scheduler. If you will be implementing your own scheduler on top of TBB's scheduler, it will be very easy to destroy performance and scalability of TBB's scheduler.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The scheme is designed to allow a message A to invoke a handler which sends message A to not infinitely block a single-threaded program.
Since I'm spawning the child tasks immediately I expect that TBB can run the children of a empty synchronization task concurrently. I would also expect that TBB would run the children of both topmost tasks concurrently. However I wouldn't expect it to guarantee that the children finish running until I call wait on the parent and it returns.
I would rather not have this complexity at all. In fact I would like to just spawn a task and know that it will finish. I can't see how I could do that in a single-threaded program however without some sort of call to essentially run all the message handling tasks I have created. And I can't see how that call can not block indefinitely on a recursive message handler unless I limit the work it can do in one call.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When I publish a message I find the range of subscribers interested in that message and invoke the message handlers on each of them. The caveat is that I don't want to wait for those message handlers to complete, at least not there, and especially not in that call stack.
So I execute the handlers in a task very similar to a non-blocking parallel_for and give it a parent task that I can use to wait on later. When I do this wait later on I create a new parent task which I swap with the current parent task. This new task will be the parent of future non-blocking parallel_for message handling tasks. Then I wait on the old parent task and destroy it when it finishes without ever spawning it (as inspired by the Advanced Task Programming chapter's long running task example in the TBB book).
This is designed to prevent starvation and allow correct execution in a single-threaded environment while minimizing blocking. It also should prevent infinite recursion from handlers that publish messages which spawn the same handlers and can lead to stack overflow. It should also prevent the wait from being too greedy and preventing the main thread from doing work other than processing messages if there are always messages ready to be processed (as the result of publishing messages from handlers).
Does this explanation/proposed implementation make sense?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
MichaelMarcin:If I just spawn the tasks and don't wait for them in a single-threaded program then they will never run right? I suppose I'm trying to prevent starvation of tasks.
Oh, I see. I was thinking that by 'single threaded environment' you mean situation when you start TBB's worker threads, but there will be only 1 worker thread on single-core machine. It seems that you are talking about situation when you don't start TBB's worker threads at all, only your main thread, right?
Maybe the solution is to start TBB's worker threads? Then all spawned tasks will be executed automatically. I think it can simplify design of your application, because then you don't need to handle 2 different cases. Just single case: your main thread + N worker threads.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Michael,
I thought more of your design, and I think there is some caveat I should warn you of. The TBB scheduler is not FIFO with regard to tasks, which might matter for message processing. In particular, a running task scheduler(including the one starting in wait_for_all) treats its own task pool as LIFO and starts processing most recent tasks first. It might be fine, but I think you'd better be aware of that. Also, a task scheduler is oblivious to parent tasks while draining the pool, which means that the scheduler, once started, won't exit until all tasks in its local pool complete; so if you use two WorkManager instances in the same thread in your application, then call WaitForWork for one of those, in fact all tasks initiated in the second manager will also complete except those that were stolen.
One particularly bad consequence for your design with two swapping roots, I think, is that after roots were swapped, new tasks that use the new root will be executed mainly before tasks that use the previous root, which completion you are waiting for. So waiting could take longer than expected. If this might become a problem, I suggest you use some flag to postpone spawning new tasks after swapping, till the moment wait_for_all finishes. Tasks being added by AddWork can be kept in some shared dynamic data structure (for TBB 2.0 & 2.1, a task_list can be used to both keep the tasks and then spawn them all at once, but in future versions spawning a task_list might become obsolete or inefficient due to possible changes in thetask pool representation).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This is indeed very distressing. Thanks for bringing this up.
Waiting to spawn the other tasks would be unfortunate because there might be work that could be accomplished by idle workers while the thread waits for the last few work items to complete.
I may have to think about this problem some more. It seems TBB tasks might not be the correct solution.
P.S.
In the process of implementing the task to recursively do the work I wanted to support auto_partitioner to split the work up. It appears however that auto_partitioner cannot be supported by user made tasks because it only exposes its data via friendship with a select few internal task classes. This seems unfortunate.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
MichaelMarcin:
Ok so this means that if I have a task which always results in spawning another task and it doesn't happen to be stolen by another pool then wait_for_all will never return?
Indeed!
http://softwarecommunity.intel.com/isn/Community/en-US/forums/permalink/30263843/30263253/ShowThread.aspx#30263253
Even worse. If your task recursively spawns itself, then there is no guarantee not only than wait_for_all will return, but also than other tasks will be eventually executed.
As for now TBB suites well only for HPC, i.e. order of execution doesn't matter, no priorities, all tasks must be executed. This is the reason why .NET TPL Team considers changing strict lifo scheduler to something other, it's just too non-intuitive for user in some situations.
http://blogs.msdn.com/pfxteam/archive/2008/08/01/8800195.aspx
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
TBB uses an approximation of Cilk-style scheduling (work LIFO; steal FIFO).It has good locality and load balancing properties while making certain space guarantees. It enables clean nesting of software components, analogous to the way serial code composes by nested subroutine calls.
FIFO has terrible locality and often has no space guarantees, which often leads to complex throttling mechanisms. Nonetheless FIFO scheduling definitely has its uses, notably for soft real-time. Finding a rationale marriage ofCilk-style and FIFO scheduling remains an open problem. If done rigorously it's a worthyfor aPh.D. thesis.
The "two roots" idea suggests that perhaps said marriage could be accomplished by some kind of epoch mechanism, where each task belongs to an epoch and older epochs are given precedence. There's a long way between a notion and a practical system.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
(Added) Well, no local hidden queue (substitute participation in merge tree), only stealing should be delayed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Raf_Schietekat:"If done rigorously it's a worthyfor aPh.D. thesis." Shortest thesis ever: keep successor tasks in local queue, append to global FIFO structure at task steal time, only steal from it if no normal tasks are left?
(Added) Well, no local hidden queue (substitute participation in merge tree), only stealing should be delayed.
Do I get your idea correctly?
__declspec(thread) work_stealing_deque private_deque;
fifo_queue global_queue;
void spawn_task(task* t)
{
global_queue.push(t);
}
task* get_next_task_to_process()
{
task* t;
if (t = private_deque.pop())
return t;
if (t = steal_from_other_private_deques())
return t;
if (t = global_queue.pop())
return t;
return 0;
}
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
What would the PhD title be for? Interprocess scheduling? Cooperation with the kernel? Realtime guarantees? A variable number of worker threads related to blocking behaviour? Condition variables? Or did I overlook something?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page