Some time ago I tried using the TBB task scheduler to run tasks in parallel with the main application for an extended period of time, typically for the duration of the application. I based this on the Advanced Task Programming example in the TBB book. Unfortunately this didn't work well because one processor core was blocked by each such long-running task until it finished. So I resorted to using the tbb_thread class instead.
Now I see in the TBB Design Patterns documentation that there's an example called 8. GUI Thread. Here the TBB task scheduler is used for long-running tasks. The difference seems to be that the tasks are started using a method called enqueue instead of spawn.
So although tbb_thread works fine, I'm a little curious. Can TBB tasks now be used just like tbb_thread for long-running tasks? Will a TBB task started with enqueue behave like a tbb_thread in principle?
If so, then the new kind of scheduling won't help you. Tasks that are enqueued instead of spawned are still executed by workers from the TBB thread pool.
The new scheduling mode differs from the classic one in its task retrieval policy, which is a somewhat relaxed First-In-First-Out. It additionally guarantees that such tasks will eventually be executed even if the master thread never enters the TBB dispatch loop (one of the wait_for_all methods).
What characterizes my threads is that they do long-running work in parallel with the main GUI thread. The fact that they're idling for events is by necessity, not by choice. I would rather start a TBB task for each work unit, but because TBB tasks are unsuitable for long-running tasks I have to use tbb_threads instead. And the most efficient use of tbb_threads is to have them running all the time waiting for work to do, thus the idling.
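For illustration, the "thread that runs all the time waiting for work" pattern could look roughly like the sketch below. It uses std::thread rather than tbb_thread, and the class name BackgroundWorker and its post() method are hypothetical, invented for this example; it is one possible arrangement, not something taken from the TBB documentation.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: one persistent thread that sleeps until work arrives.
class BackgroundWorker {
public:
    BackgroundWorker() : thread_([this] { run(); }) {}

    // Signals shutdown, lets the thread finish any queued jobs, then joins.
    ~BackgroundWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        thread_.join();
    }

    // Called from the GUI (or any other) thread to hand over a work unit.
    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // Idle here without burning a core until there is work
                // or a shutdown request.
                ready_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run the work unit outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so the other members exist before it starts
};
```

The GUI thread would create one BackgroundWorker at startup and call post() with a lambda for each work unit; the worker blocks in the condition variable between units, which is the idling described above.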
So from what I can see nothing has changed really. TBB tasks shouldn't be used for long-running tasks because they block processor cores while active.
This means that both the Advanced Task Programming example in the TBB book and now also the GUI Thread design pattern example are misleading. TBB tasks just don't work properly for that kind of processing. Instead tbb_thread (or std::thread in TBB 3.0) should be used. Right?
Do the long running tasks all have to run concurrently, or is it okay to run only a subset at a time?
The new task::enqueue feature has several attributes that make it different from task::spawn:
- Tasks are executed in roughly FIFO order, even if stolen.
- FIFO tasks are only processed by worker threads created by TBB. Thus they will not block threads not created by TBB (e.g. a GUI thread).
- On a uniprocessor, an extra worker thread is created to deal with FIFO tasks.
On a machine with P cores, enqueued tasks are processed by max(1,P-1) worker threads. So until the number of pending enqueued tasks exceeds max(1,P-1), the effect is the same as if you created a tbb_thread for each task. Once the number of pending tasks is larger, the excess ones will have to wait until busy ones complete. So it's suitable for long running tasks as long as they do not depend upon each other. If the long running tasks depend upon each other (particularly in a cyclic fashion so that concurrency is mandatory), then there's no substitute for using real threads.
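To make the queuing behaviour concrete, here is a minimal model in plain C++ with no TBB calls. It assumes strict FIFO hand-out and known task durations, which real TBB does not promise (the order is only roughly FIFO); the function names enqueue_workers and fifo_start_times are made up for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Number of worker threads that service enqueued tasks on a machine with
// `cores` cores, following the max(1, P-1) rule quoted above.
inline std::size_t enqueue_workers(std::size_t cores) {
    return std::max<std::size_t>(1, cores > 0 ? cores - 1 : 0);
}

// Simplified model (an assumption, not TBB's actual scheduler): tasks are
// handed out in strict FIFO order to whichever worker becomes free first.
// Returns the time at which each task starts running.
inline std::vector<double> fifo_start_times(const std::vector<double>& durations,
                                            std::size_t cores) {
    const std::size_t workers = enqueue_workers(cores);

    // Min-heap holding the time at which each worker next becomes free.
    std::priority_queue<double, std::vector<double>, std::greater<double>> free_at;
    for (std::size_t i = 0; i < workers; ++i) {
        free_at.push(0.0);
    }

    std::vector<double> starts;
    starts.reserve(durations.size());
    for (double d : durations) {
        const double t = free_at.top();  // earliest available worker
        free_at.pop();
        starts.push_back(t);
        free_at.push(t + d);             // that worker is busy until t + d
    }
    return starts;
}
```

With P = 4 there are three workers, so four tasks with durations 10, 10, 10 and 5 start at times 0, 0, 0 and 10 in this model: the fourth waits for a worker to free up, whereas with one tbb_thread per task it would have started immediately.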
From your description it looks like task::enqueue is a great improvement over task::spawn for long-running tasks, and I guess that's why it was introduced, right?
One drawback seems to be a possible deadlock when enqueued tasks are circularly dependent. That's not too bad. Reference-counting smart pointers have the same problem. Another dependency drawback seems to be that although enqueued tasks don't block other tasks or threads, they still block each other, so they should finish promptly and never wait idly. But that's fine because that's exactly what worker threads should do.
In a tbb_thread all elements of TBB work without restrictions. Is it the same with task::enqueue (except for the dependency considerations mentioned above)? For example, will a parallel_for work just like in a tbb_thread?
Finally, the main reason to use task::enqueue rather than a tbb_thread would be that task::enqueue has substantially less overhead. Is that so?
A parallel_for will work from an enqueued task, just like it works from a spawned task or a tbb_thread. The only difference between a spawned and an enqueued task is how it is picked up for execution. Once it is running, there is no difference.
Yeah, circular dependencies are a red herring. One had better just design around them.
But one day one discovers such a funny thing as system-induced deadlocks. Suppose you have N threads and N tasks that each wait for the completion of task X. There are no circular dependencies here. However, if one day the system happens to schedule the N waiting tasks simultaneously... oops, task X will never run and the system is completely deadlocked.
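The scenario can be modelled without any real threads. The sketch below is a deliberately simplified, hypothetical model (the names Kind and x_gets_to_run are invented here): a fixed pool takes tasks from a FIFO queue, each waiter occupies its worker until X has run, and the function reports whether X ever gets a worker.

```cpp
#include <cstddef>
#include <vector>

// Two kinds of task in this toy model: a waiter that blocks its worker
// until X has completed, and X itself.
enum class Kind { Waiter, X };

// Returns true if X eventually obtains a worker from a pool of `pool_size`
// threads when tasks are dequeued in the given FIFO order.
inline bool x_gets_to_run(const std::vector<Kind>& fifo, std::size_t pool_size) {
    std::size_t blocked = 0;  // workers currently stuck inside a waiter
    for (Kind k : fifo) {
        if (blocked == pool_size) {
            return false;  // every worker is blocked, nothing else is dequeued
        }
        if (k == Kind::X) {
            return true;   // X runs, after which all waiters are released
        }
        ++blocked;         // a waiter takes a worker and blocks on X
    }
    return false;          // X was never submitted
}
```

With a pool of two workers, the order {Waiter, Waiter, X} returns false (the deadlock described above), while {Waiter, X, Waiter} returns true. Growing the pool to three workers makes the first order succeed as well, which is effectively what one thread per task achieves.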
If one uses honest OS-level threads, such a thing is impossible. But any system that limits the number of threads (physical concurrency) is susceptible to system-induced deadlocks. So one must be cautious when replacing threads with any form of thread pool. That's the reason behind the increased number of threads in the Win32 thread pool API.
I guess Windows UMS will handle a fair amount of blocking perfectly well, along with episodic page faults.
I'm not sure how to enable UMS support, but I see an RML_USE_WCRM define that is responsible for that.
If the N tasks were spinning on a shared variable (without a means for task pre-emption), then this would be a programming error. Deadlocks are always a programming error. The system is not (and should not be) responsible for deadlocks.