Optimization is the root of all ugliness. We were going after maximum performance, and so tailored the concurrent_queue implementation to the synchronization primitives specific to each platform. In particular, we wanted:
Credit goes to our developer Wooyoung Kim for coming up with an implementation that met the above desired properties. It is complex. Indeed, as with some of our other complex algorithms, we used the Spin model checker to check it during development.
It would be nice to have a layer in TBB for doing this sort of thing (i.e., properties 1 and 2 above) so that we do not have to write so much platform-specific code. My preference, if we can figure out how to support it across all platforms, is to provide futex-like functionality in TBB.
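A minimal sketch of what "futex-like functionality" could look like as an interface: a word that a thread can sleep on until its value changes, plus operations to wake sleepers. The names FutexWord, wait, wake_one, and wake_all are hypothetical and not part of TBB. For portability the sketch emulates the semantics with std::mutex and std::condition_variable, so it lacks the user-space fast path that makes a real futex cheap; on Linux the native mechanism is the futex system call, and C++20 later added similar operations as std::atomic::wait, notify_one, and notify_all.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical futex-like word, emulated with a mutex and condition variable.
// It mimics the interface only, not the performance of a real futex.
class FutexWord {
public:
    explicit FutexWord(int initial = 0) : value_(initial) {}

    int load() const { return value_.load(std::memory_order_acquire); }

    // Publish a new value. Taking the mutex makes the check-then-sleep in
    // wait() atomic with respect to this store.
    void store(int v) {
        std::lock_guard<std::mutex> lock(m_);
        value_.store(v, std::memory_order_release);
    }

    // Sleep while the word still equals `expected` (like FUTEX_WAIT, but this
    // emulation returns only after the value has actually changed).
    void wait(int expected) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return value_.load(std::memory_order_relaxed) != expected; });
    }

    void wake_one() { cv_.notify_one(); }
    void wake_all() { cv_.notify_all(); }

private:
    std::atomic<int> value_;
    std::mutex m_;
    std::condition_variable cv_;
};

// Usage: one thread sleeps until another publishes a new value.
int demo_handoff() {
    FutexWord word(0);
    int seen = -1;
    std::thread waiter([&] {
        word.wait(0);          // sleeps until the word is no longer 0
        seen = word.load();
    });
    word.store(42);
    word.wake_one();
    waiter.join();             // join makes `seen` safely visible here
    return seen;
}
```

Callers of a real futex must re-check the value in a loop, because FUTEX_WAIT can return spuriously; here the predicate form of condition_variable::wait hides that loop.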
Yes, condition variables are not good for task-based parallelism, which is why they are currently missing from TBB. The TBB containers, however, are intended to work for both task-based and plain-old-threading parallelism.
Class concurrent_queue is particularly problematic in task-based parallelism for several reasons:
Users not using task-based parallelism find concurrent_queue useful. Likewise for the mutexes and atomic operations.
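With plain threads, a common way to use a queue is with a blocking pop: a consumer thread simply sleeps until a producer pushes. The sketch below shows that pattern with a hypothetical BlockingQueue built on a mutex and a condition variable; it is the straightforward approach, not the platform-tuned implementation of concurrent_queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical blocking queue for plain threads (not TBB's implementation).
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push_back(std::move(item));
        }
        not_empty_.notify_one();   // wake one sleeping consumer, if any
    }

    // Blocks the calling thread until an item is available.
    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [&] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

private:
    std::mutex m_;
    std::condition_variable not_empty_;
    std::deque<T> items_;
};

// Producer/consumer on two plain threads; -1 is a sentinel meaning "done".
long sum_via_queue(int n) {
    BlockingQueue<int> q;
    long sum = 0;
    std::thread consumer([&] {
        for (int v = q.pop(); v != -1; v = q.pop()) sum += v;
    });
    for (int i = 1; i <= n; ++i) q.push(i);
    q.push(-1);
    consumer.join();
    return sum;
}
```

Under task-based execution the same pattern is risky: if the consumer is a task and the producer is a task that has not started yet, pop() may never return.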
We would have loved to have your checker a few years ago, when we were debugging our queuing_rw_mutex on Itanium and optimizing our use of fences.
Our handling of idle worker threads has been evolving with various releases. Here's a summary of the evolution:
What caused us to switch from (2) to (3) was programs that lacked parallel slack, e.g., programs with 2-way parallelism on an 8-core machine. (3) is a little hyperactive, in that spawning a single task wakes up all workers, but it seems to work well in practice. Assuming there is parallel slack, once the number of spawned tasks transitions from 0 to 1, more tasks quickly follow and fully subscribe the machine. If there is not parallel slack, the extra workers soon go back to sleep. We do not do more precise sleeping and waking because we have not found a scalable algorithm for doing so in a task-stealing environment. For example, keeping a centralized counter of the number of ready tasks would be a bad bottleneck on stock hardware.
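A minimal sketch of the wake-up rule in (3), under simplifying assumptions: a spawn that takes the pool from zero to one pending task wakes every sleeping worker, and workers that find nothing to do go back to sleep. WakeAllPool is a hypothetical name. To stay short it uses a single mutex-protected queue instead of per-thread deques with stealing, so it has exactly the kind of centralized bottleneck described above; it illustrates only the wake-up policy, not TBB's scheduler.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical pool illustrating "wake all workers on the 0 -> 1 transition".
class WakeAllPool {
public:
    explicit WakeAllPool(unsigned n_workers) {
        for (unsigned i = 0; i < n_workers; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    // Drains any remaining tasks, then joins the workers.
    ~WakeAllPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            shutdown_ = true;
        }
        wake_.notify_all();
        for (auto& t : workers_) t.join();
    }

    void spawn(std::function<void()> task) {
        bool was_empty;
        {
            std::lock_guard<std::mutex> lock(m_);
            was_empty = tasks_.empty();
            tasks_.push_back(std::move(task));
        }
        // Only the empty -> non-empty transition wakes anyone, and it wakes
        // every sleeper; workers that lose the race simply sleep again.
        if (was_empty) wake_.notify_all();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                // Sleep only while there is no work and no shutdown request.
                wake_.wait(lock, [this] { return shutdown_ || !tasks_.empty(); });
                if (tasks_.empty()) return;   // shutdown and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();                           // run outside the lock
        }
    }

    std::mutex m_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> tasks_;
    bool shutdown_ = false;
    std::vector<std::thread> workers_;        // declared last: started after the rest is initialized
};

// Usage: spawn n trivial tasks and count how many ran.
int run_tasks(int n) {
    std::atomic<int> done{0};
    {
        WakeAllPool pool(4);
        for (int i = 0; i < n; ++i) pool.spawn([&] { done.fetch_add(1); });
    }   // destructor drains the queue and joins the workers
    return done.load();
}
```

No wake-up is lost in this sketch: a worker goes to sleep only after seeing an empty queue while holding the lock, and the next spawn onto an empty queue wakes it.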
Shortly-held locks are certainly okay for task-based programming. That's what we designed the TBB mutexes for.
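Short locks are safe because a thread holding one does not wait for another task to run (as long as the critical section itself never waits on other tasks), so it makes progress even with no parallelism at all. A minimal test-and-test-and-set spin lock illustrates the kind of primitive meant; SpinLock is a hypothetical class, a simplified illustration rather than TBB's actual spin_mutex implementation.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical test-and-test-and-set spin lock for very short critical sections.
class SpinLock {
public:
    void lock() {
        for (;;) {
            // Try to take the lock; acquire ordering pairs with unlock()'s release.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // While held, spin on a read-only load so waiters do not keep
            // writing the shared cache line, and yield to be polite.
            while (locked_.load(std::memory_order_relaxed)) std::this_thread::yield();
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Several threads increment a shared counter under the lock.
long count_with_spinlock(int threads, int iters) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<SpinLock> guard(lock);  // critical section: one increment
                ++counter;
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```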
Condition variables seem to be inherently a problem in task-based parallelism, particularly if the parallelism is optional, not mandatory. If a task is waiting on a condition to occur, and there is no parallelism, then there is no task to set the condition.
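The argument can be demonstrated with a hypothetical scheduler that has no parallelism and simply runs tasks one after another on the calling thread, which a task scheduler may do when parallelism is optional. If the waiting task happens to run before the task that sets the condition, the condition can never be satisfied. The names run_serially, Flag, and waiter_sees_condition are illustrative; the wait uses a 100 ms timeout only so the demonstration terminates instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical scheduler with no parallelism: runs tasks in order on this thread.
void run_serially(const std::vector<std::function<void()>>& tasks) {
    for (const auto& t : tasks) t();
}

struct Flag {
    std::mutex m;
    std::condition_variable cv;
    bool set = false;
};

// Returns true if the waiting task observed the condition before giving up.
bool waiter_sees_condition(bool setter_runs_first) {
    Flag flag;
    bool observed = false;
    std::function<void()> setter = [&] {
        { std::lock_guard<std::mutex> lock(flag.m); flag.set = true; }
        flag.cv.notify_one();
    };
    std::function<void()> waiter = [&] {
        std::unique_lock<std::mutex> lock(flag.m);
        // A real program would wait forever here; the timeout lets the demo end.
        observed = flag.cv.wait_for(lock, std::chrono::milliseconds(100),
                                    [&] { return flag.set; });
    };
    if (setter_runs_first) run_serially({setter, waiter});
    else                   run_serially({waiter, setter});
    return observed;
}
```

With the setter first the waiter succeeds immediately; with the waiter first there is no other thread to run the setter, so the wait can only time out.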