I have a process running at 100% CPU on 24 cores. Dumping the process reveals that 23 TBB threads are spinning inside arena_in_need:
    arena* arena_in_need () {
        spin_mutex::scoped_lock lock(my_arenas_list_mutex);
        return arena_in_need(my_arenas, my_next_arena);
    }
Typical callstack:
    1 tbb.dll!__TBB_machine_cmpswp1()
    2 tbb.dll!tbb::internal::market::arena_in_need()
    3 tbb.dll!tbb::internal::market::process(rml::job & j={...})
    4 tbb.dll!tbb::internal::rml::private_worker::run()
    5 tbb.dll!tbb::internal::rml::private_worker::thread_routine(void * arg=0x000000001f1be7d0)
    6 tbb.dll!_callthreadstartex()
    7 tbb.dll!_threadstartex(void * ptd=0x0000000000000000)
No other thread is running TBB code or tasks.
It may be relevant that we use task priorities, which we set through the optional context parameter.
Any ideas as to why it does this, and any suggestions on how to fix it?
NOTE: We are using TBB 4.0 U2 (OSS278 of last December).
5 Replies
Yes, it could be related to task priorities.
Could you provide a reproducer, or sketch the structure of your code here?
Do you use task::enqueue (with priorities)?
When you say 23 threads are busy in arena_in_need(), what is the 24th thread doing? Is it a master thread? How did it submit the work to TBB?
I spent a few hours trying to create a test case that reproduces this problem, without success.
We are running a mix of parallel_for_each, parallel_for, parallel_sort and pipelines, either with high priority set on the task_group_context or in normal mode with the default context.
As for the 24th thread: I might be mistaken, but doesn't TBB create only P-1 worker threads, where P is the number of logical processors? As I mentioned in my original post, no code was running TBB algorithms at the time of the core dump.
Looking at the code, could it be that the list of arenas gets very long, so that iterating through it in arena_in_need takes longer than expected while the spin mutex is held? I currently only have a minidump of the process that had the problem, so I can't check whether that is the case.
Is it possible that too many threads fighting for that mutex would make them spin uselessly?
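To make the question concrete, here is a minimal sketch of the kind of contention I have in mind. It is not TBB's code: SpinLock, ContentionResult and run_contention are hypothetical names, and the lock is a plain test-and-set loop built on std::atomic_flag rather than tbb::spin_mutex. Several threads repeatedly take the lock and walk a shared list while holding it, loosely mimicking workers scanning an arena list; the function reports how many lock attempts failed and had to spin.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <list>
#include <thread>
#include <vector>

// Hypothetical stand-in for a spin mutex (NOT tbb::spin_mutex).
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    // Busy-waits until the flag is acquired; returns the number of failed attempts.
    std::size_t lock() {
        std::size_t spins = 0;
        while (flag_.test_and_set(std::memory_order_acquire)) {
            ++spins;  // each failed attempt burns CPU without doing useful work
        }
        return spins;
    }
    void unlock() { flag_.clear(std::memory_order_release); }
};

struct ContentionResult {
    long long protected_total;  // sum accumulated under the lock
    long long failed_spins;     // failed lock attempts across all threads
};

// Each thread takes the lock `rounds` times and walks the whole list while holding it.
ContentionResult run_contention(unsigned n_threads, std::size_t list_len, int rounds) {
    assert(n_threads > 0 && rounds >= 0);
    SpinLock lock;
    std::list<int> shared_list(list_len, 1);  // stands in for a long arena list
    long long protected_total = 0;            // only touched while the lock is held
    std::atomic<long long> failed_spins(0);

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < n_threads; ++t) {
        threads.emplace_back([&]() {
            long long my_spins = 0;
            for (int r = 0; r < rounds; ++r) {
                my_spins += static_cast<long long>(lock.lock());
                for (int v : shared_list) protected_total += v;  // long critical section
                lock.unlock();
            }
            failed_spins += my_spins;
        });
    }
    for (std::thread& th : threads) th.join();
    return ContentionResult{protected_total, failed_spins.load()};
}
```

Calling run_contention(4, 1000, 200) and then again with a longer list or more threads should show failed_spins growing as the critical section gets longer or the number of waiters increases, although the exact counts vary from run to run and machine to machine. This only illustrates the hypothesis; it does not show that the arena list in our process was actually long.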
I now have a full user dump of a process showing the issue.
I could certainly run some commands in WinDbg and send you the results if that would help diagnose the issue.
Many thanks