I think the world is heading towards this but isn't quite there yet. I've seen academic research and heard of plans from Microsoft to make the HW thread pool a system-allocated resource. In such an environment load balance from the available HW threads could be shared across processes as well as through hierarchies using different threaded libraries (e.g., sharing a common set of threads between TBB and OpenMP) within a process. But it's not quite there yet. For the present it remains a resource issue that is left to the applications to solve.
"Throttle-down" is the issue at the moment. Creation of the TBB thread pool is still pretty "static" in that you can change the size of the thread pool, but only by throwing away the current one and creating a new one. We've actually provided sample code using a dynamic task_scheduler_init object so that it could be easily thrown away and recreated to give an application the ability to set thread pool size. As currently realized, however, it would be up to the application (or cooperating applications) to recognize the oversubscription and back off on their resource use.
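To make the "throw away and recreate" idea concrete, here is a minimal sketch, not the sample code mentioned above. `ResizableScheduler` and `CountingInit` are hypothetical names invented for illustration. The assumption is that, with a classic TBB installation, you would instantiate the template with `tbb::task_scheduler_init`, whose constructor takes the requested number of threads; the stand-in type is there only so the sketch compiles without TBB.

```cpp
#include <memory>

// Hypothetical wrapper (not part of TBB): keeps the scheduler-init object on
// the heap so it can be destroyed and rebuilt with a different thread count.
template <typename SchedulerInit>
class ResizableScheduler {
public:
    explicit ResizableScheduler(int threads) { resize(threads); }

    void resize(int threads) {
        init_.reset();                            // destroy the old object first
        init_.reset(new SchedulerInit(threads));  // then build the replacement
        threads_ = threads;
    }

    int threads() const { return threads_; }

private:
    std::unique_ptr<SchedulerInit> init_;
    int threads_ = 0;
};

// Stand-in for tbb::task_scheduler_init (assumption: same constructor shape).
// It only counts how many init objects are alive at any moment.
int g_live_inits = 0;
struct CountingInit {
    explicit CountingInit(int) { ++g_live_inits; }
    ~CountingInit() { --g_live_inits; }
};
```

The old object is released before the new one is constructed so that the two never overlap. Calling `resize(2)` on a `ResizableScheduler<CountingInit>` built with 8 leaves exactly one live init object, now sized at 2.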
But evolution to a common thread pool has already started, beginning at the leaves, of course: Intel provides an OpenMP runtime library, libiomp5md.dll, that uses an API shared with implementations of OpenMP by the Microsoft and GNU compilers, meaning that whatever mix of compilers you use with OpenMP, their runtime needs can all be handled by a single library. The challenges to come are much larger than this of course, but they are recognized as the way forward.
I think Jim's code still needs a little work before it works, but if you can detect and autoswitch between TBB and non-TBB versions of the code, then you could switch between TBB versions of the code that create smaller thread pools. As Anton and I suggested, applications can control the size of their TBB thread pools by providing a numerical argument to the first task_scheduler_init object they create. I could conceive of a behavior where the application checks on startup for any other copies running and then chooses either the default number of HW threads (maybe 8) or some smaller number (1 or 2).
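The startup check described here could be sketched roughly as follows. `choose_pool_size` is a hypothetical helper, and how an application counts other running copies (a lock file, a named mutex, shared memory) is deliberately left out; the count is simply passed in. The result is the number you would hand to the first task_scheduler_init object.

```cpp
#include <algorithm>

// Hypothetical helper: pick the thread pool size once, at startup.
//   hw_threads   - default number of HW threads (e.g. 8)
//   other_copies - how many other copies of the application were detected
//                  (detection mechanism is an assumption, not shown)
//   reduced_size - the "smaller number" to fall back to (1 or 2 in the text)
int choose_pool_size(int hw_threads, int other_copies, int reduced_size = 2) {
    if (other_copies <= 0) {
        return std::max(1, hw_threads);  // no competition: take the default
    }
    // Competition detected: take the smaller pool, never more than the
    // machine has and never fewer than one thread.
    return std::max(1, std::min(reduced_size, hw_threads));
}
```

For example, `choose_pool_size(8, 0)` yields 8, while `choose_pool_size(8, 1)` yields 2.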
Ignoring all the races between processes that ultimately such a scheme may have to deal with, there's still a basic chicken-or-egg problem here: I start up the first copy--no competition--8 threads; I start a second and it takes one or two and so on, but my machine is still oversubscribed until the first copy finishes. This calls for a more dynamic method, where the original copy can detect the arrival of subsequent copies of the application and somehow dial back its thread pool by some means like Jim was trying to achieve with his code sample. TBB's non-preemptive scheduler means each worker is (relatively) autonomous, so there might be some wait involved as the terminating thread pool finishes up its assigned tasks. I think some workable scheme might be derived from this, although I think you could spend a lot of time getting the heuristics right for how big the replacement thread pool should be.
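As one possible starting point for that heuristic, here is a sketch of an even-split rule. It is only an assumption about what "dialing back" could mean; `fair_share` and `oversubscribed` are hypothetical names, and the sketch ignores the inter-process races mentioned above.

```cpp
#include <algorithm>

// True when the software threads requested across all copies exceed the
// hardware threads available, e.g. 8 + 2 threads on an 8-thread machine.
bool oversubscribed(int total_software_threads, int hw_threads) {
    return total_software_threads > hw_threads;
}

// Hypothetical even-split heuristic: each running copy gets an equal share
// of the HW threads, but never fewer than one.
int fair_share(int hw_threads, int running_copies) {
    int copies = std::max(1, running_copies);
    return std::max(1, hw_threads / copies);
}
```

With 8 HW threads, the first copy would keep 8 while alone and rebuild its pool at 4 once a second copy shows up. A real scheme would likely want some damping so that pools are not torn down and recreated every time a short-lived copy comes and goes.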