Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

TBB 4.1: Lockup on closing

P_A__Jimenez
Beginner
1,820 Views

I recently updated my development environment from Intel C++ Studio XE 2013 Beta to the release version. This new version comes with TBB 4.1, and now the application I've been working on for over a year locks up when closing. Such behavior did not occur with previous versions of TBB, up to the version released along with the Beta (initial release, as I did not update it afterwards while still in Beta).

It may be worth noting that what makes use of TBB is a COM DLL. When DllMain is called with DLL_PROCESS_ATTACH, a static pointer to a task_scheduler_init is initialized. When DllMain is called with DLL_PROCESS_DETACH, the pointer is deleted. The deletion is where things start to go wrong now.
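For readers who want to picture the lifetime pattern described above, here is a minimal sketch. It is not the poster's code: `SchedulerInitStandIn`, `Reason` and `dll_main_sketch` are hypothetical names, and the stand-in type replaces `tbb::task_scheduler_init` so the sketch builds without TBB or the Windows headers. It only shows the create-on-attach, delete-on-detach shape.

```cpp
#include <cassert>

// Hypothetical stand-in for tbb::task_scheduler_init, so the sketch builds
// without TBB. It only counts live instances.
struct SchedulerInitStandIn {
    static int live;
    SchedulerInitStandIn()  { ++live; }
    ~SchedulerInitStandIn() { --live; }
};
int SchedulerInitStandIn::live = 0;

// Static pointer owned by the DLL, as described in the post.
static SchedulerInitStandIn* g_scheduler = nullptr;

// Hypothetical reason codes mirroring DLL_PROCESS_ATTACH / DLL_PROCESS_DETACH.
enum class Reason { ProcessAttach, ProcessDetach };

// Mirrors the shape of DllMain: create on attach, delete on detach.
bool dll_main_sketch(Reason reason) {
    switch (reason) {
    case Reason::ProcessAttach:
        g_scheduler = new SchedulerInitStandIn();
        break;
    case Reason::ProcessDetach:
        // In the real DLL, this delete is where the hang was observed.
        delete g_scheduler;
        g_scheduler = nullptr;
        break;
    }
    return true;
}
```

In the real DLL the two cases would sit inside `DllMain`, and the pointer would hold a `tbb::task_scheduler_init`.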

Here's the call stack up to the locking place:

> tbb_debug.dll!tbb::internal::binary_semaphore::V() Line 91 + 0x11 bytes C++
tbb_debug.dll!rml::internal::thread_monitor::notify() Line 186 C++
tbb_debug.dll!tbb::internal::rml::private_worker::start_shutdown() Line 241 + 0xb bytes C++
tbb_debug.dll!tbb::internal::rml::private_server::request_close_connection(bool __formal) Line 186 + 0x11 bytes C++
tbb_debug.dll!tbb::internal::market::release() Line 139 C++
tbb_debug.dll!tbb::internal::market::try_destroy_arena(tbb::internal::market * m, tbb::internal::arena * a, unsigned int aba_epoch, bool master) Line 212 C++
tbb_debug.dll!tbb::internal::arena::on_thread_leaving<1>() Line 343 + 0x13 bytes C++
tbb_debug.dll!tbb::internal::generic_scheduler::cleanup_master() Line 1109 C++
tbb_debug.dll!tbb::internal::governor::terminate_scheduler(tbb::internal::generic_scheduler * s) Line 167 C++
tbb_debug.dll!tbb::task_scheduler_init::terminate() Line 308 + 0x9 bytes C++
dllname.dll!tbb::task_scheduler_init::~task_scheduler_init() Line 111 C++
dllname.dll!tbb::task_scheduler_init::`scalar deleting destructor'() + 0x16 bytes C++

What's also worth noting is that the issue does not seem to occur if no TBB function in particular is called (like parallel_for).

I hope you can shed some light on this serious issue.

Thanks.

16 Replies
P_A__Jimenez
Beginner
TBB 4.0 Update 5 does not have this issue.
Alexandr_K_Intel1
Could you please report the stacks of the other threads at the time of the lockup?
P_A__Jimenez
Beginner
No threads were available other than the Main Thread, which I find rather strange. I would expect a bunch of working threads to be reported, but they are not there. Same story when debugging with WinDbg. I wonder which change from 4.0 Update 5 to 4.1 may have caused this new behavior.

The only extra information I can provide are the topmost calls of the callstack when the process is already locked up (these come from WinDbg):

ntdll!NtReleaseKeyedEvent+0x15
ntdll!RtlDeleteTimer+0x2ed

I also tried compiling the DLL with TBB 4.1, but using the TBB 4.0 Update 5 DLL (tbb_debug.dll). It is even worse as the application hangs during normal use. It was a long shot, and it had all chances of failing anyway.

I also checked if there were any other threads running when releasing the DLL while using TBB 4.0 Update 5, and it's only the Main Thread running (as with 4.1). No lockups at all.

Any other things I may check?
Anton_M_Intel
Employee
I wonder what is in between NtReleaseKeyedEvent() and binary_semaphore::V(). I bet there is ReleaseSRWLockExclusive(). If so, we have a theory about what changed and what happens in TBB 4.1. Please provide the version of your OS and the full call stack from WinDbg. We need to know what caused the call to NtReleaseKeyedEvent, which is known to block in the absence of a corresponding wait function.
P_A__Jimenez
Beginner
Here's the call stack, excluding wrongly resolved stack frames from the Delphi executable:

WARNING: Stack unwind information not available. Following frames may be wrong.
ntdll!NtReleaseKeyedEvent+0x15
ntdll!RtlDeleteTimer+0x2ed
tbb_debug!tbb::internal::binary_semaphore::V+0x11
tbb_debug!rml::internal::thread_monitor::notify+0x40
tbb_debug!tbb::internal::rml::private_worker::start_shutdown+0x6f
tbb_debug!tbb::internal::rml::private_server::request_close_connection+0x37
tbb_debug!tbb::internal::market::release+0x92
tbb_debug!tbb::internal::market::try_destroy_arena+0xaa
tbb_debug!tbb::internal::arena::on_thread_leaving<1>+0x6c
tbb_debug!tbb::internal::generic_scheduler::cleanup_master+0x261
tbb_debug!tbb::internal::governor::terminate_scheduler+0x54
tbb_debug!tbb::task_scheduler_init::terminate+0xac
DllName_66ed0000!tbb::task_scheduler_init::~task_scheduler_init+0x1e
DllName_66ed0000!tbb::task_scheduler_init::`scalar deleting destructor'+0x16
DllName_66ed0000!DllMain+0x9f
DllName_66ed0000!__DllMainCRTStartup+0xcd
DllName_66ed0000!_DllMainCRTStartup+0x21
ntdll!RtlQueryEnvironmentVariable+0x241
ntdll!LdrShutdownProcess+0x141
ntdll!RtlExitUserProcess+0x74
kernel32!ExitProcess+0x15
DelphiExe!_enc$textbss$begin+0x6967

Following the disassembly at the point where the binary semaphore is signaled (V), it calls:

_RtlReleaseSRWLockExclusive (that's what __TBB_release_binsem is resolved to)
+_RtlpWakeSRWLock
++_NtReleaseKeyedEvent
+++_NtSetInformationThread

I cannot continue debugging once it reaches _NtSetInformationThread, and if I pause the process then it says it's stuck at _NtReleaseKeyedEvent. I fear that may not be enough information to solve the issue, though.
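The stack above runs through ExitProcess and LdrShutdownProcess, which means DllMain is being called for DLL_PROCESS_DETACH during process termination, when Windows has already terminated the other threads of the process. That matches the observation that only the Main Thread is visible. On Windows, DllMain's third argument (`lpvReserved`) is non-NULL in that case and NULL when the DLL is unloaded through FreeLibrary. One common defensive pattern, sketched below, is to skip the explicit teardown when the process is exiting. This is not the fix Intel shipped, and whether it would have avoided this particular hang is untested. `SchedulerStandIn`, `on_process_attach` and `on_process_detach` are hypothetical names, and the stand-in type replaces `tbb::task_scheduler_init` so the sketch builds without TBB or Windows.

```cpp
#include <cassert>

// Hypothetical stand-in for tbb::task_scheduler_init (counts live instances).
struct SchedulerStandIn {
    static int live;
    SchedulerStandIn()  { ++live; }
    ~SchedulerStandIn() { --live; }
};
int SchedulerStandIn::live = 0;

static SchedulerStandIn* g_sched = nullptr;

// Plays the role of the DLL_PROCESS_ATTACH branch of DllMain.
void on_process_attach() {
    g_sched = new SchedulerStandIn();
}

// Plays the role of the DLL_PROCESS_DETACH branch of DllMain.
// `reserved` stands for DllMain's lpvReserved argument:
// null for a FreeLibrary unload, non-null when the process is terminating.
void on_process_detach(const void* reserved) {
    if (reserved != nullptr) {
        // Process is exiting: the other threads are already gone, so skip
        // the teardown and let the OS reclaim the resources.
        return;
    }
    delete g_sched;
    g_sched = nullptr;
}
```

The trade-off is that the scheduler object is deliberately leaked at process exit, which is harmless there because the address space is about to be destroyed.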
Anton_M_Intel
Employee
Another question, does it deadlock in debug configuration only?
P_A__Jimenez
Beginner
Anton Malakhov (Intel) wrote:

Another question, does it deadlock in debug configuration only?

I just tested, and it deadlocks in release mode as well.
Anton_M_Intel
Employee
Hi, we cannot reliably observe the problem in our tests, so we would appreciate your help in making a reproducer. Meanwhile, we will disable the usage of SRWLocks in the next Update 1 release. We hope it will help until the root cause is identified and fixed.
P_A__Jimenez
Beginner
Hi.
Anton Malakhov (Intel) wrote:
we cannot reliably observe the problem in our tests, thus would appreciate your help in making a reproducer.
I have added an issue report at the Premier Support website with the same title as this topic, in which there is a way to reproduce it. I would rather not put the link information to download the application here in the forums.
Anton Malakhov (Intel) wrote:
Meanwhile, we will disable the usage of SRWLocks in the next Update 1 release. We hope it will help until the root cause is identified and fixed.
I will check it once the new version is released. Regards, Paúl.
Wooyoung_K_Intel
Employee
Hi,
binksoftware wrote:

I also tried compiling the DLL with TBB 4.1, but using the TBB 4.0 Update 5 DLL (tbb_debug.dll). It is even worse as the application hangs during normal use. It was a long shot, and it had all chances of failing anyway.

I tried to reproduce the hang you mentioned here by compiling the TBB tests with 4.1 headers and running them with the 4.0U5 DLL. Unfortunately, I was not able to reproduce any hang. Could you give us a bit more information regarding the nature of your application? In particular, we would like to know what TBB constructs the application uses. That will help us understand the cause of the problem: whether it is due to the API changes from 4.0U5 to 4.1, or due to a bug in the TBB release, in which case the TBB tests need to be extended. Thank you very much
P_A__Jimenez
Beginner
Hi.
Wooyoung Kim (Intel) wrote:
I tried to reproduce the hang you mentioned here by compiling the TBB tests with 4.1 headers and running them with 4.0U5 DLL.
Unfortunately, I was not able to reproduce any hang. Could you give us a bit more information regarding the nature of your application?
In particular, we would like to know what TBB constructs the application uses. That will help us understand the cause of the problem: whether the problem is due to the API changes from 4.0U5 to 4.1 or it is due to a bug in the TBB release and the TBB tests need to be extended.

Thank you very much

The program uses tbb::atomic (mostly for reference counting), tbb::spin_mutex for (occasional) locking and the parallel_for and parallel_do template functions. The program also uses the scalable_malloc and scalable_free functions for memory allocation of objects and containers. I don't recall using anything else. Regards, Paúl.
jimdempseyatthecove
Honored Contributor III
Are you explicitly deleting the task_scheduler_init object (prior to termination/exit of all tasks)? IOW you have some (exception) condition (or convergence) and you wish to stop execution and choose to do so via "delete" on the task_scheduler_init object.

Jim Dempsey
P_A__Jimenez
Beginner
jimdempseyatthecove wrote:

Are you explicitly deleting the task_scheduler_init object (prior to termination/exit of all tasks)?
IOW you have some (exception) condition (or convergence) and you wish to stop execution and choose to do so via "delete" on the task_scheduler_init object

Jim Dempsey

The task_scheduler_init object is explicitly deleted, but only when unloading the COM DLL. No TBB operations are being performed at this point, and all objects that were created using the DLL should have been released. IOW the COM DLL is only released when the program is closed, and only then. Regards, Paúl.
Wooyoung_K_Intel
Employee
Thanks a lot!! Does your reproducer show this kind of issue, too? I.e., if I run it with the 4.0U5 DLL, does it hang in the middle of execution? I have looked at the change log and the diffs for those TBB constructs between 4.0U5 and 4.1, but it is not obvious which changes might have caused the issue. If your previous reproducer does not have the issue, would you mind giving us another small reproducer?
P_A__Jimenez
Beginner
Wooyoung Kim (Intel) wrote:

Thanks a lot!!
Does your reproducer show this kind of issue, too? I.e., if I run it with 4.0U5 DLL, does it hang in the middle of execution?
I have looked at the change log and the diffs for those TBB constructs between 4.0U5 and 4.1, but it does not seem obvious
what changes might have caused the issue. If your previous reproducer does not have the issue, would you mind giving us another small reproducer?

The reproducer I provided in Premier Support is the only reproducer I have (which happens to be the full application). That one was compiled with TBB 4.1, and by just replacing the DLL with that of TBB 4.0U5 I had the early hang. I don't really think swapping DLLs is the way to go, though. Regards, Paúl.
P_A__Jimenez
Beginner

In case someone else is interested, Intel reverted the code that introduced the issue. It is available in TBB 4.1 Update 1. It's not part of the release notes, though.

Thanks again for the fix.
