I have an application which uses OpenMP. It is compiled (using Intel Composer XE 2017 Update 5) for SUSE but runs on Ubuntu Linux with real-time extensions. Everything works fine, but when the program exits from main it hangs and never returns. A GDB call stack of the application is below. As can be seen, OpenMP appears to be waiting on some lock and hangs there forever. Is there a specific reason for the hang, or some way to bypass it?
I do not see this problem when running on non-real-time systems. The one difference between the real-time and non-real-time OpenMP setup is that in the real-time case I set KMP_BLOCKTIME to 0 before the first call into OpenMP; in the non-real-time case it is set to 100. I am not sure whether that has anything to do with the hang, though.
#0 0x00007f36c078382c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f36c077f17c in _L_lock_982 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f36c077efcb in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f36c0c64241 in __kmp_resume_64 ()
#4 0x00007f36c0c5ae34 in __kmp_release_64 ()
#5 0x00007f36c0c31b43 in __kmp_internal_end_library ()
#6 0x00007f36c4b2cf67 in ?? () from /lib64/ld-linux-x86-64.so.2
#7 0x00007f36c03e9121 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x00007f36c03e91a5 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#9 0x00007f36c03ceeac in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x0000000000408be9 in _start () at ../sysdeps/x86_64/elf/start.S:113
Parallel regions have an implicit barrier at the end of the region. As a test to verify that all threads of the last region have completed, insert a printf or other output statement after that region.
Note, should your application spawn one or more additional non-OpenMP threads, and should any of those threads use OpenMP, then each such thread will instantiate its own OpenMP thread pool. IOW, in effect the application will appear to have multiple main threads (from the OpenMP perspective). Should one of those "main" threads perform an exit (or return from main(...)), the other "main" threads (non-OpenMP threads containing OpenMP parallel regions) must first shut down in an orderly manner (and exit any currently running OpenMP parallel region).
Please verify an orderly shutdown, should this be the case.
Thanks for the notes. I managed to figure out the issue: from within the application, the OpenMP threads were being cancelled explicitly via a call to pthread_cancel. That, in combination with KMP_BLOCKTIME being set to 0, caused the hang.
When I commented out the code that calls pthread_cancel, the process exits properly.
The question I have is: when KMP_BLOCKTIME is left at its default value, the pthread_cancel works and no hang is seen. It is only when it is set to 0 that this issue occurs.
Is there any way to explicitly inform the OpenMP runtime to stop the OpenMP threads?
We have a scenario where a shared library that uses OpenMP (on Linux) gets loaded dynamically by an application. The application itself does not depend on OpenMP. When the library is unloaded by the application, we sometimes see crashes. After some investigation, the hack of killing the OpenMP threads using pthread_cancel solved the crashes, and everything worked fine until, more recently, we started setting KMP_BLOCKTIME to 0. This, in combination with the pthread_cancel, causes a hang.
Why are you issuing pthread_cancel (presumably to a thread that may have spawned an OpenMP thread pool)?
Note, if your application spawns a pthread, that pthread calls the shared library, returns, the thread terminates,...
and then that activity repeats, then OpenMP, depending upon the implementation, may instantiate an additional OpenMP thread pool for each newly created thread (and repeat creating pools each time you do this). IOW a memory- and resource- (handle-) eating activity.
The better method is to create the worker pthread once, then reuse it on subsequent times it is needed.
In our case, OpenMP threads are spawned off only in the main thread. There are no nested OpenMP thread pools. The main thread spawns off other pthreads as well, but those pthreads do not create any new OpenMP pools.
I am not issuing pthread_cancel on a normal pthread. I am issuing it on an OpenMP thread, something like the code below.
#pragma omp parallel num_threads(4)
{
    int i = omp_get_thread_num();
    if (i > 0)
        A[i] = pthread_self();   /* save each worker's handle; pthread_cancel(A[i]) is issued later */
}
I don't know whether we can call pthread_cancel on OpenMP threads. Is there any way to destroy the OpenMP thread pool manually?
In general, you cannot cancel a thread that you (your design) did not specifically create (you can, but the behavior is undefined by the OpenMP specification). Any given OpenMP implementation may, or may not, have an undocumented API to perform an orderly shutdown.
You can slightly change the __kmpc_end() function: replace the call to __kmp_internal_end_thread(-1) with a call to __kmp_internal_end_library(-1), then compile the library. After that you may set the KMP_IGNORE_MPPEND environment variable to 0 and call __kmpc_end(NULL) anywhere in your application to completely shut down the OpenMP runtime library. All threads will be destroyed.
Alternatively, you can link with the static OpenMP runtime (e.g. use the -qopenmp-link=static compiler option on Linux) and similarly call __kmpc_end(NULL) in serial code of your application; there is no need to change library code in this case. Of course, KMP_IGNORE_MPPEND should still be set to 0.
Please note, the above is not portable and may not be available in the current version (18.nn) of the Intel OpenMP runtime.
Why do you find it necessary to kill the OpenMP spawned threads?
IOW, after KMP_BLOCKTIME expires following the last parallel region executed, the threads will suspend (presumably on a pthread condition variable) and should consume no CPU resources, only some virtual memory in your process's virtual address space.
If you are running in 32-bit mode you might arguably have a case for this, but not on x64.
Thank you. Let me try the static lib approach and see how that goes.
The reason why we are cancelling the OpenMP threads is a crash we were facing when unloading a library built with OpenMP. The application that loads this library is itself *not* built with OpenMP. Before unloading the OpenMP-built library, we were cancelling the OpenMP threads and then unloading the library; this prevents the crash. I am not sure whether the exact reason for the crash has something to do with OpenMP, but after some investigation, cancelling the threads makes the crash disappear.
However, this causes the other problem discussed above: the hang when KMP_BLOCKTIME is set to 0.
The (a) problem with directly cancelling the OpenMP thread pool threads is that instantiating the OpenMP pool may (may stressed) create one or more additional watchdog/monitor threads which will not be cancelled by your code. If the shared library is not written to take this (undue external interference) into consideration, the unloading of the shared library may occur while OpenMP watchdog/monitor threads are actively running in the code space of the library. These threads may continue to run without a problem... up until the point where that memory gets reused, overwritten, or (re/un)mapped in the virtual address space.
If OpenMP instantiates watchdog/monitor thread(s), and if the back-door shutdown functions are not available, then you will have to write code to determine the thread handles of those watchdog/monitor thread(s). Terminating these threads, possibly in a specific order and potentially at a specific point in their execution (e.g. while sleeping), may correct the crashing. An example of a bad time to terminate would be if the cancelled watchdog/monitor thread were in the middle of a malloc or free. Note, in any event, memory allocated by the OpenMP library will not have been returned; IOW you will have a memory leak. This shouldn't be problematic if the OpenMP library is loaded only once or a few times: (load, use, cancel threads) x n, with n a small number.
If you cannot assure correctness, then I suggest you explore having your process create a new process that:
- gets input data via command line, pipe, shared memory, or file,
- runs the parallel code,
- sends results back via pipe, shared memory, or file.
You still haven't answered why you find it necessary to unload the OpenMP library.
The only potential reason (on x64) would be if you need to run different versions of the library.
Thanks for the explanation Jim.
A third party application loads our library dynamically and at the end of their application they unload it (probably as a best practice). If they do not unload, then there is no crash as expected. I don't think they use different versions of the library (at the moment) but that is a possibility if they want to support different versions of our library with the interface remaining the same. I am not sure of their design reasons.
I found this:
This potentially can be used if you are careful.
Presumably you have your own library that calls upon the OpenMP library. Your library is either static or dynamic.
If static, a static data area can contain an indication of whether the OpenMP library is loaded. On init, load it if not already loaded. On fini, clean up everything except unloading the OpenMP library, and keep the static data that indicates the library is loaded.
If dynamic, then you have to figure out how to do the equivalent of the code in the static library.
Note, the 3rd-party application would issue a load on your dynamic library without knowledge that your library loaded the OpenMP shared library. When the 3rd-party application issues an unload of your library, the fini code remembers that OpenMP is loaded.
*** caution. In order for this to work properly, the 3rd party application must have the same thread performing calls to your library. If not, then you will exhibit memory leaks as each different thread instantiates an OpenMP thread pool.