We're running into an unexpected issue where TBB spawns more threads than we've requested, which causes a problem in our case. This is with version 4.1.3; we haven't tried any of the more recent versions.
Here is a slightly simplified description of what we're observing. (The actual details are a bit more complex, but I'm unsure whether they're relevant right now; I can expand if needed.)
An example callstack when this happens looks like:
pthread_create, FP=7fff0dbf1ab0
rml::internal::thread_monitor::launch, FP=7fff0dbf1b60
tbb::internal::rml::private_worker::wake_or_launch, FP=7fff0dbf1b60
tbb::internal::rml::private_server::wake_some, FP=7fff0dbf1b60
tbb::internal::arena::advertise_new_work<(tbb::internal::arena::new_work_type)0>, FP=7fff0dbf1e50
tbb::internal::generic_scheduler::local_spawn, FP=7fff0dbf1e50
tbb::interface5::internal::task_base::spawn, FP=7fff0dbf1e90
...py1<openvdb::v4_0_3::tree::InternalNode<openvdb::v4_0_3::tree::LeafNode<float,3>,4> >,const tbb::auto_partitioner>::offer_work, FP=7fff0dbf1ee0
...ee::InternalNode<openvdb::v4_0_3::tree::LeafNode<float,3>,4> >,const tbb::auto_partitioner>,tbb::blocked_range<unsigned int> >, FP=7fff0dbf2040
...yCopy1<openvdb::v4_0_3::tree::InternalNode<openvdb::v4_0_3::tree::LeafNode<float,3>,4> >,const tbb::auto_partitioner>::execute, FP=7fff0dbf2060
tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all, FP=7fff0dbf20f0
tbb::internal::arena::process, FP=7fff0dbf2130
tbb::internal::market::process, FP=7fff0dbf2180
tbb::internal::rml::private_worker::run, FP=7fff0dbf21c0
tbb::internal::rml::private_worker::thread_routine, FP=7fff0dbf21d0
start_thread, FP=7fff0dbf2310
__clone, FP=7fff0dbf2318
Does it sound plausible that TBB would spawn this extra thread, making the total more than we've requested?
It doesn't happen consistently; in fact it's rare. But any insights into why it happens would be very helpful.
Actually, it looks like this may also be an issue with more recent versions: it happens more frequently with version 4.4.6 than with 4.1.3. Any thoughts or insights would be much appreciated.
It is a known issue; see the related question (https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/618706). The short answer is that TBB does not guarantee an exact number of worker threads. To work around the issue, you can try using observers (https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/618706#comment-1864976).
That clarifies everything. I see now that we were making an invalid assumption in our code about the lifetime of threads in the thread pool. The observer mechanism should also let us fix the code in an elegant fashion. Thanks so much for the insights.
No matter what I do, I can't seem to get TBB to call task_scheduler_observer::on_scheduler_exit(). I do see on_scheduler_entry() being called as expected.
If I'm understanding correctly, in the case where TBB is about to switch one of the worker threads to a new one, I should first see a call to on_scheduler_exit() (which will allow me to free up resources associated with the old worker thread), followed by a call to on_scheduler_entry() for the new worker thread. In my testing, I do see on_scheduler_entry() getting called on the new thread, but never see any calls to on_scheduler_exit() beforehand.
I'm wondering if I'm misunderstanding how this mechanism is supposed to work...