Hi, we're using TBB in a fairly simple setup - just parallel_xxx stuff (no complicated heterogeneous graph-based calculations).
So basically we just need TBB to create exactly as many threads as there are cores, and to bind each of them to a specific core (via CPU affinity).
To do this, at application startup I run a parallel_for over a dummy array and wait for TBB to create as many threads as we need, setting the CPU affinity of each one in the observer. That way all the threads are already set up when the actual work arrives, and no time is spent creating threads during it.
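For reference, the affinity call made from the observer looks roughly like the sketch below. This is a minimal illustration, not our exact code: it assumes Linux (glibc), and the helper names pin_current_thread_to_core, first_allowed_core and is_pinned_to_core are made up for this post. In the real setup the pinning call would sit inside the on_scheduler_entry override of a tbb::task_scheduler_observer, which runs on the thread that is joining the scheduler.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // glibc exposes cpu_set_t and the CPU_* macros under this
#endif
#include <sched.h>

// Pin the calling thread to a single core (pid 0 means "the calling thread").
// Returns true on success, false if the core is out of range or not allowed.
bool pin_current_thread_to_core(int core) {
    if (core < 0 || core >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Lowest-numbered core the calling thread is currently allowed to run on,
// or -1 if the affinity mask cannot be read.
int first_allowed_core() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int c = 0; c < CPU_SETSIZE; ++c)
        if (CPU_ISSET(c, &set)) return c;
    return -1;
}

// True if the calling thread's affinity mask contains exactly this one core.
bool is_pinned_to_core(int core) {
    if (core < 0 || core >= CPU_SETSIZE) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return false;
    return CPU_COUNT(&set) == 1 && CPU_ISSET(core, &set);
}
```

The observer only has to pick a core number for each thread that enters and pass it to pin_current_thread_to_core; how that number is chosen is exactly what breaks once TBB creates more threads than cores.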
However, this method stopped working when we upgraded from 4.1 to 4.3: TBB didn't create more than 4 threads during the initial warm-up parallel run. (Any pointers on how to solve this and force TBB to create the requested number of worker threads would be much appreciated.)
But that wasn't the biggest problem. The biggest problem was that TBB started creating more worker threads than requested in tbb_init (and even more than the number of cores we actually have). That is a disaster for this setup. I understand the justification is that TBB will never _run_ more threads at the same time than requested, so I supposedly shouldn't worry about how many get created. But binding threads to cores no longer makes any sense, because there is no way to know which threads will actually run in a given parallel_xxx call. There is simply no meaningful way to bind threads to cores and guarantee equal core load: even if I bind them cyclically, I may end up with some cores running several threads and other cores running none.
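To make the imbalance concrete, here is a small stand-alone sketch (no TBB involved, and the function names cyclic_core and core_load are invented for this illustration). It assumes threads are bound cyclically in creation order, thread i to core i % N, and counts how many of the threads that happen to be running land on each core.

```cpp
#include <cassert>
#include <vector>

// Core assigned to the i-th created thread under cyclic (round-robin) binding.
int cyclic_core(int thread_index, int num_cores) {
    assert(num_cores > 0 && thread_index >= 0);
    return thread_index % num_cores;
}

// For the set of threads that are actually running, count how many of them
// are bound to each core.
std::vector<int> core_load(const std::vector<int>& running_threads,
                           int num_cores) {
    std::vector<int> load(num_cores, 0);
    for (int t : running_threads)
        ++load[cyclic_core(t, num_cores)];
    return load;
}
```

With 4 cores and 6 created threads (0 to 5), suppose a given parallel call happens to run threads 0, 1, 4 and 5. Threads 0 and 4 are both bound to core 0, threads 1 and 5 to core 1, so core_load gives 2, 2, 0, 0: two cores are oversubscribed and two sit idle, even though only 4 threads run at once.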
So the question is: how do I force TBB to start exactly N worker threads, bound to specific cores (by the observer), and no more? Preferably at startup, not during the actual workload.
P.S. Currently I've rolled back from 4.3 to 4.1.