We're running into an unexpected issue where TBB spawns more threads than we requested, which causes a problem in our case. This is with version 4.1.3; we haven't tried any of the more recent versions.
Here is a slightly simplified description of what we're observing (the actual details are a bit more complex, but I'm unsure if the additional details are relevant right now; I can expand if needed).
- We're running on a machine with 28 cores, and we specify that number to tbb::task_scheduler_init at the start of the program (it matches what tbb::task_scheduler_init::default_num_threads returns).
- When we run the first parallel_for, we see tbb threads start to spawn. The number stabilizes at 27 additional tbb threads. Together with the main thread, that makes 28 threads in total, which is what we requested. The main thread shares in the parallel_for workload.
- It's non-deterministic, but during some later invocation of parallel_for (after things have been running smoothly for a while), we occasionally observe one of the tbb threads spawn an additional tbb thread, for a total of 28 worker threads in the pool. This is unexpected, and we crash because our application hasn't set up the relevant thread-local state for the new thread.
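For reference, the setup described above looks roughly like this (a simplified sketch, not our actual code; the range and loop body are placeholders):

```cpp
#include <tbb/task_scheduler_init.h>
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>

int main() {
    // Request the default number of threads explicitly (28 on our machine).
    // This covers the main thread plus 27 workers.
    tbb::task_scheduler_init init(tbb::task_scheduler_init::default_num_threads());

    // A representative parallel_for; the main thread also participates in the work.
    tbb::parallel_for(tbb::blocked_range<unsigned int>(0u, 100000u),
        [](const tbb::blocked_range<unsigned int>& r) {
            for (unsigned int i = r.begin(); i != r.end(); ++i) {
                // per-element work goes here
            }
        });
    return 0;
}
```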
An example callstack when this happens looks like:
pthread_create, FP=7fff0dbf1ab0
rml::internal::thread_monitor::launch, FP=7fff0dbf1b60
tbb::internal::rml::private_worker::wake_or_launch, FP=7fff0dbf1b60
tbb::internal::rml::private_server::wake_some, FP=7fff0dbf1b60
tbb::internal::arena::advertise_new_work<(tbb::internal::arena::new_work_type)0>, FP=7fff0dbf1e50
tbb::internal::generic_scheduler::local_spawn, FP=7fff0dbf1e50
tbb::interface5::internal::task_base::spawn, FP=7fff0dbf1e90
...py1<openvdb::v4_0_3::tree::InternalNode<openvdb::v4_0_3::tree::LeafNode<float,3>,4> >,const tbb::auto_partitioner>::offer_work, FP=7fff0dbf1ee0
...ee::InternalNode<openvdb::v4_0_3::tree::LeafNode<float,3>,4> >,const tbb::auto_partitioner>,tbb::blocked_range<unsigned int> >, FP=7fff0dbf2040
...yCopy1<openvdb::v4_0_3::tree::InternalNode<openvdb::v4_0_3::tree::LeafNode<float,3>,4> >,const tbb::auto_partitioner>::execute, FP=7fff0dbf2060
tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all, FP=7fff0dbf20f0
tbb::internal::arena::process, FP=7fff0dbf2130
tbb::internal::market::process, FP=7fff0dbf2180
tbb::internal::rml::private_worker::run, FP=7fff0dbf21c0
tbb::internal::rml::private_worker::thread_routine, FP=7fff0dbf21d0
start_thread, FP=7fff0dbf2310
__clone, FP=7fff0dbf2318
Does it sound plausible that TBB would be spawning this extra thread (making the total more than we've requested)?
- Under what circumstances would this happen?
- Is there a way we can prevent it?
As mentioned, it doesn't happen consistently (in fact it's rare), but any insight into why it happens would be very helpful.
Actually, it looks like this may also be an issue with more recent versions: it happens more frequently with version 4.4.6 than with 4.1.3. Any thoughts or insights would be much appreciated.
It is a known issue; see the related question (https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/618706). The short answer is that TBB does not guarantee the exact number of worker threads. To work around the issue, you can try using observers (https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/618706#comment-1864976).
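In case it helps, a minimal sketch of the observer approach (the class name and the state-management comments are placeholders, not anyone's actual code): derive from tbb::task_scheduler_observer, override the entry/exit callbacks, and call observe(true) to activate it.

```cpp
#include <tbb/task_scheduler_observer.h>

// Sketch: set up / tear down per-thread application state as threads
// join and leave the TBB scheduler.
class ThreadStateObserver : public tbb::task_scheduler_observer {
public:
    ThreadStateObserver() {
        observe(true);   // start receiving callbacks
    }
    ~ThreadStateObserver() {
        observe(false);  // stop receiving callbacks before destruction
    }
    void on_scheduler_entry(bool is_worker) {
        // Runs on the thread that is joining the scheduler:
        // allocate your thread-local state here.
    }
    void on_scheduler_exit(bool is_worker) {
        // Runs on the thread that is leaving:
        // release your thread-local state here.
    }
};
```

The observer instance must outlive the threads it tracks, so it is typically created near the start of main, after the task_scheduler_init.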
That clarifies everything. I see now that we were making an invalid assumption in our code about the lifetime of threads in the thread pool. The observer mechanism should also allow us to fix the code in an elegant fashion. Thanks so much for the insights.
No matter what I do, I can't seem to get TBB to call task_scheduler_observer::on_scheduler_exit(). I do see on_scheduler_entry() being called as expected.
If I'm understanding correctly, in the case where TBB is about to switch one of the worker threads to a new one, I should first see a call to on_scheduler_exit() (which will allow me to free up resources associated with the old worker thread), followed by a call to on_scheduler_entry() for the new worker thread. In my testing, I do see on_scheduler_entry() getting called on the new thread, but never see any calls to on_scheduler_exit() beforehand.
I'm wondering if I'm misunderstanding how this mechanism is supposed to work...