Was this done intentionally? Or is it just that the code hasn't been updated to use the lower-level primitives?
I guess part of my confusion is that I think of TBB as "stacking blocks," in which TBB provides wrappers for the lowest-level primitives (atomic operations and threads), a layer of synchronization (locks and such), mid-level primitives such as containers and the task scheduler, and then very high-level operations (such as parallel_for). I found that concurrent_queue "skips a layer."
Optimization is the root of all ugliness. We were going after maximum performance, and so tailored the concurrent_queue implementation to the synchronization primitives specific to each platform. In particular, we wanted:
- No locking when no blocking is required, and atomic operations kept to a minimum, because they are slow.
- Get the thread off the OS's run queue when blocking is required. The TBB 2.0 concurrent_queue sometimes busy-waited, which was a problem for some users.
Credit goes to our developer Wooyoung Kim for coming up with an implementation that met the desired properties above. It is complex. Indeed, as with some of our other complex algorithms, Spin was used for checking during development.
It would be nice to have a layer in TBB for doing this sort of thing (i.e., properties 1 and 2 above) so we do not have to write so much platform-specific code. My preference, if we can figure out how to support it across all platforms, is to provide futex-like functionality in TBB.
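No such layer exists in TBB, so purely as a rough illustration, here is a minimal Linux-only sketch (the names are mine, not TBB's) of what futex-like functionality buys you: the fast path takes no lock and does only plain loads (property 1), and the slow path asks the kernel to deschedule the thread instead of spinning (property 2).

```cpp
#include <atomic>
#include <climits>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Block in the kernel while *addr still equals `expected`; return at once
// otherwise. Spurious returns are possible, so callers must recheck.
static void futex_wait(std::atomic<int>* addr, int expected) {
    syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, nullptr, nullptr, 0);
}

// Wake every thread blocked on addr.
static void futex_wake_all(std::atomic<int>* addr) {
    syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, INT_MAX, nullptr, nullptr, 0);
}

// Wait until `flag` no longer holds `old_value`.
void wait_until_changed(std::atomic<int>& flag, int old_value) {
    // Property 1: the fast path is lock-free and cheap.
    for (int spins = 0; spins < 100; ++spins)
        if (flag.load(std::memory_order_acquire) != old_value) return;
    // Property 2: get off the OS run queue instead of busy-waiting.
    while (flag.load(std::memory_order_acquire) == old_value)
        futex_wait(&flag, old_value);
}
```

A writer would store the new value and then call futex_wake_all; the point of the layer is that all of the above is the only platform-specific code one would have to port.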
It seems that most of the OS-specific code is for the signal/wait condition variables and such (to avoid the busy waiting you mentioned). For task-based parallel code, it seems that such condition variables would not be that helpful. Is that the case? Or were these users doing something other than task-based parallelism?
Which brings up another question: do all the worker threads continue to busy-wait when no parallel tasks are running (say, during a sequential region of code)? It seems like that would be a bigger worry.
P.S. I'm glad to see that Spin was used for checking the queue. I've actually done some work on model checking lock-free data structures myself.
Yes, condition variables are not good for task-based parallelism, which is why they are currently missing from TBB. The TBB containers, however, are intended to work for both task-based and plain-old-threading parallelism.
The concurrent_queue class is particularly problematic in task-based parallelism for several reasons:
- Blocking: consumers have to wait for producers, so the underlying thread may be tied up doing nothing for a while.
- In task-based parallelism, parallelism is always optional, never mandatory, so the programmer has to be careful not to depend on parallelism for progress (e.g., the producer of an item must get a chance to run before its consumer blocks waiting for it; see the sketch after this list).
- [joke] A queue is a data structure that lets your data get optimally cold (evicted from cache) before you touch it again.
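Here is a minimal sketch of that progress hazard, written with today's names (the blocking pop now lives on tbb::concurrent_bounded_queue, and tbb::parallel_invoke stands in for explicit task spawning) rather than the interfaces of that era:

```cpp
#include <tbb/concurrent_queue.h>    // tbb::concurrent_bounded_queue
#include <tbb/parallel_invoke.h>

void progress_hazard() {
    tbb::concurrent_bounded_queue<int> q;
    tbb::parallel_invoke(
        [&] {               // consumer task
            int item;
            q.pop(item);    // blocks until an item arrives; if only one
                            // thread is available and this task is
                            // scheduled first, the producer never runs
        },
        [&] {               // producer task
            q.push(42);
        });
}
```

With two or more threads this usually completes, which is precisely the trap: correctness comes to depend on parallelism that the scheduler is free not to provide.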
Users not doing task-based parallelism find concurrent_queue useful. Likewise for the mutexes and atomic operations.
We would have loved to have your checker a few years ago, when we were debugging our queuing_rw_mutex on Itanium and optimizing our use of fences.
Our handling of idle worker threads has been evolving with various releases. Here's a summary of the evolution:
1. [Original unreleased prototype of TBB] Just spin.
2. [TBB 1.0] Spin if any worker thread is busy; sleep if in a serial section.
3. [Current] Sleep if work cannot be found; wake up if any task is spawned.
What caused us to switch from (2) to (3) was programs that lacked parallel slack, e.g., 2-way parallelism on an 8-core machine. (3) is a little hyperactive in that spawning a single task wakes up all workers, but it seems to work well in practice: assuming there is parallel slack, once the number of spawned tasks transitions from 0 to 1, more tasks come quickly and fully subscribe the machine. If there is no parallel slack, the extra workers soon go back to sleep. The reason we do not do more precise sleeping/waking is that we have not found a scalable algorithm for it in a task-stealing environment; e.g., keeping a centralized counter of the number of ready tasks would be a bad bottleneck on stock hardware.
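To make the trade-off concrete, here is a rough sketch of policy (3) in C++11 terms; the names and structure are hypothetical, not TBB's scheduler internals. Spawns bump an epoch, and an idle worker sleeps until the epoch moves past the value it saw when its last round of stealing came up empty. There is no count of ready tasks, so no centralized counter to contend on.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

class SpawnGate {
    std::mutex m;
    std::condition_variable cv;
    std::atomic<unsigned long> epoch{0};
public:
    unsigned long current_epoch() const {
        return epoch.load(std::memory_order_acquire);
    }
    // Called on every spawn. A production version would skip the lock and
    // notify entirely when no worker is actually asleep.
    void on_spawn() {
        epoch.fetch_add(1, std::memory_order_release);
        std::lock_guard<std::mutex> lock(m);
        cv.notify_all();    // hyperactive: one spawn wakes all workers
    }
    // Called by a worker whose steal attempts, begun at epoch `seen`, all
    // failed. Returns immediately if a spawn has happened since.
    void sleep_until_spawn(unsigned long seen) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return current_epoch() != seen; });
    }
};
```

Because the worker records the epoch before its final scan for work, a spawn that races with the scan makes the wait predicate true immediately, so no wakeup is lost.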
However, I'm not sure I agree with the implication that explicit locks and task-based parallelism don't mix. Sure, sometimes tasks are strictly data-parallel and work only on private data or on their own partition of the data.
Yet I'm seeing uses of parallelism in which the tasks access shared read/write data structures. One example is a dynamic-programming sort of problem in which expensive computations are cached in a shared data structure to prevent redundant computation.
In such a context, the mixing of locks and tasks is fine. The blocking nature of locks is unlikely to be a problem, as no task holds a lock for very long.
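A minimal sketch of that memoization pattern (the problem and names here are hypothetical, not from the discussion): tasks briefly lock a shared cache so an expensive result is computed only once, and the computation itself happens outside any lock.

```cpp
#include <map>
#include <tbb/spin_mutex.h>

std::map<int, double> cache;      // shared among tasks
tbb::spin_mutex cache_mutex;

double expensive(int key);        // the costly computation being cached

double memoized(int key) {
    {   // short critical section: just a lookup
        tbb::spin_mutex::scoped_lock lock(cache_mutex);
        auto it = cache.find(key);
        if (it != cache.end()) return it->second;
    }
    double value = expensive(key);  // compute outside the lock
    tbb::spin_mutex::scoped_lock lock(cache_mutex);
    cache.emplace(key, value);      // if another task raced us here, the
                                    // first insertion wins; both computed
                                    // the same value, so either is fine
    return value;
}
```

Both critical sections are a single map operation, which is exactly the "no task holds a lock for very long" condition above.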
Briefly held locks are certainly okay for task-based programming. That's what we designed the TBB mutexes for.
Condition variables seem to be inherently a problem in task-based parallelism, particularly if the parallelism is optional, not mandatory: if a task is waiting for a condition to occur and there is no parallelism, then there is no task running to set the condition.