OpenMP hangs on pthread_cond_wait

Javier_T_ · ‎07-21-2015

Hello,

we are trying to speed up a parallel program using the Intel Compiler and the OpenMP library.

We have observed that the program hangs after running ok for 3-4 days, in one of the parallel loops. The binary keeps running but stays in a permanent waiting state. Here is the gdb stack trace:

#0 0x00007fefbc1bf705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007fefbc98e9ce in __kmp_suspend_template (th_gtid=, flag=) at ../../src/z_Linux_util.c:1819
#2 __kmp_suspend_64 (th_gtid=-1145176444, flag=0x80) at ../../src/z_Linux_util.c:1874
#3 0x00007fefbc92fe08 in suspend (this=, th_gtid=) at ../../src/kmp_wait_release.h:405
#4 __kmp_wait_template (this_thr=, flag=, final_spin=, itt_sync_obj=) at ../../src/kmp_wait_release.h:224
#5 wait (this=, this_thr=, final_spin=, itt_sync_obj=) at ../../src/kmp_wait_release.h:414
#6 __kmp_hyper_barrier_gather (bt=3149790852, this_thr=0x80, gtid=1, tid=-1, reduce=0x7fefbbbdfe00, itt_sync_obj=0x0) at ../../src/kmp_barrier.cpp:510
#7 0x00007fefbc9331c3 in __kmp_join_barrier (gtid=-1145176444) at ../../src/kmp_barrier.cpp:1364
#8 0x00007fefbc959ee2 in __kmp_internal_join (id=0x7fefbbbdfe84, gtid=128, team=0x1) at ../../src/kmp_runtime.c:7142
#9 0x00007fefbc9609a4 in __kmp_join_call (loc=0x7fefbbbdfe84, gtid=128, exit_teams=1) at ../../src/kmp_runtime.c:2322
#10 0x00007fefbc9345bd in __kmpc_fork_call (loc=0x7fefbbbdfe84, argc=128, microtask=0x1) at ../../src/kmp_csupport.c:326
#11 0x000000000045f4fb in CovarianceMatrixCxx::kalman (......) at G2/CovarianceMatrixCxx.cpp:295

So far, we have observed this problem using Scientific Linux 7 (glibc 2.17), but not using Scientific Linux 6 (glibc 2.12), when using "icc -openmp". The behaviour is the same when using "icc -fopenmp".

When we use the GNU C++ Compiler with OpenMP (g++ -fopenmp), the program runs fine in both SL6 and SL7.

In summary, this looks like a problem when combining the Intel Compiler and SL7, when the OpenMP library is used. This is the ICC version we are using:

> icc --version

icc (ICC) 15.0.1 20141023

Unfortunately, the program is quite complex and we have not been able to generate a simplified version of the problem that can be easily reproduced. Below is a snapshot of the loop that is hanging.

Are you aware of any kind of problem similar to this one?

Any help is appreciated,

Javier

#pragma omp parallel for private(i,j,k)
for(i=0; i
{
v_temp = 0.0;

for(k=0; k
j = index_OfNonZero;

if(j >= i) {
v_temp += v_Matrix*v_KalmanVec.d_A;
} else {
v_temp += v_Matrix*v_KalmanVec.d_A;
}

}//for k
}//for i

Martyn_C_Intel · ‎09-09-2015

Hi,

Are you able to tell us how many times your parallel region has been called, when the application hangs?

We have recently heard of an issue that sounds somewhat similar to this one. In a small test case to reproduce the issue, it went away when using the new version 16.0 compiler on Red Hat EL 7. It does look rather like an interaction between the OpenMP run-time library and the environment, most likely glibc, but the behavior is inconsistent and it's not yet clear where the problem originates. Investigations are ongoing.

The version 16.0 compiler is part of Intel Parallel Studio XE 2016, available for download from https://registrationcenter.intel.com with current support; you might like to give it a try.

Javier_T_ · ‎09-09-2015

Hello, thanks for the feedback.

The parallel loop is in a function that is called more than 250 000 times before it hangs, but it is not easy to reproduce as we have to wait for a long time before it happens.
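Where an exact count is wanted rather than an estimate, one option is to keep a counter next to the parallel region. The following is only a minimal sketch: `g_region_entries`, `sum_of_squares` and `region_entries` are hypothetical names, and the loop is a stand-in, not the Kalman loop from the original post.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical process-wide count of how often the parallel region was entered.
// 64 bits wide, so it cannot wrap in any realistic run.
static std::atomic<std::uint64_t> g_region_entries{0};

// Hypothetical stand-in for the function that contains the parallel loop.
double sum_of_squares(const double* x, int n) {
    // Count the entry before forking, so the value is current if the join hangs.
    g_region_entries.fetch_add(1, std::memory_order_relaxed);

    double total = 0.0;
    // If OpenMP is not enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += x[i] * x[i];
    }
    return total;
}

// Read the current count, e.g. for logging or from a debugger.
std::uint64_t region_entries() {
    return g_region_entries.load(std::memory_order_relaxed);
}
```

When the process hangs, the counter can be read by attaching a debugger to it, or it can be logged periodically during the run.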

jimdempseyatthecove · ‎09-10-2015

Martyn,

Several years ago, while I was writing my QuickThread threading toolkit, I ran across a similar anomaly where pthread_cond_wait would occasionally not resume when a different thread issued pthread_cond_signal. After extensive investigation of this issue with reproducers, this led me to assume (conclude) that my version of the pthread library had a race condition between pthread_cond_wait and pthread_cond_signal. My recourse was to use pthread_cond_timedwait, then test other atomic counters managed by my wrapper functions. As to whether this (assumed) race condition still exists, I cannot say; however, it would not hurt to use a timed wait.
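A minimal sketch of a timed-wait wrapper of this general kind is shown below. It is not the QuickThread code: the `Event` type, `event_set`, `event_wait` and the 100 ms polling interval are hypothetical, and a plain flag guarded by the mutex stands in for the atomic counters mentioned above.

```cpp
#include <cassert>
#include <pthread.h>
#include <time.h>

// Hypothetical event object: `signaled` holds the real state; the condition
// variable is only a hint to wake up and re-check it.
struct Event {
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
    bool signaled = false;
};

// Set the flag under the mutex, then wake one waiter.
void event_set(Event& e) {
    pthread_mutex_lock(&e.mutex);
    e.signaled = true;
    pthread_cond_signal(&e.cond);
    pthread_mutex_unlock(&e.mutex);
}

// Wait until the event is set. Rather than blocking indefinitely, wake up
// every `poll_ms` milliseconds and re-test the flag, so that a missed wakeup
// costs at most one polling interval instead of a permanent hang.
void event_wait(Event& e, long poll_ms = 100) {
    pthread_mutex_lock(&e.mutex);
    while (!e.signaled) {
        timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);  // default condvar clock
        deadline.tv_sec  += poll_ms / 1000;
        deadline.tv_nsec += (poll_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {     // normalize nanoseconds
            deadline.tv_sec  += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        // Returns 0 or ETIMEDOUT; in both cases the loop re-checks the flag.
        pthread_cond_timedwait(&e.cond, &e.mutex, &deadline);
    }
    e.signaled = false;  // auto-reset for the next wait
    pthread_mutex_unlock(&e.mutex);
}
```

The cost of this design is a periodic wakeup while idle; the polling interval trades that overhead against how long a missed signal can go unnoticed.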

Jim Dempsey

Martyn_C_Intel · ‎09-11-2015

This seems more likely caused by use of a 32 bit variable where 64 bits are needed if a parallel region is called very many times.

One such case was found and fixed in the 16.0 compiler/RTL, but it's not excluded that there could be more.

jimdempseyatthecove · ‎09-12-2015

In looking at Javier's sample code, it would appear that the issue would be either inside your libiomp5... ???!!!

*** in looking at Javier's stack dump, it appears that he is not using libiomp5.so

Javier, what are you linking in for the OpenMP support?

Jim Dempsey

Javier_T_ · ‎09-14-2015

Hi, thanks for feedback.

We are linking with -fopenmp and -openmp, same (bad) result with both. That's why I think there is something glibc-related.

Martyn, could you be a bit more specific about the 32 vs 64 bit variables? We just use integer and float variables in those loops.

Regards,
Javier

jimdempseyatthecove · ‎09-14-2015

Javier,

It is not the linker switch (-openmp), it is what library is being linked in (this may be a path search or environment issue). The reason I have for this suspicion is your stack dump contains:

...in __kmp_join_barrier (gtid=-1145176444) at ../../src/kmp_barrier.cpp:1364

The above indicates that the "kmp" library used was compiled with debug information including source file name and line number. As far as I know, the Intel distributed libraries are not built with debug information. Therefore, whatever you are linking is likely not the Intel .so or .a library.

Martyn, can you comment on my suspicion?

Jim Dempsey

AGG1 · ‎09-14-2015

Hi Javier,

We are having similar issues. We have a Fortran code using OpenMP threads (using Intel Fortran 15). We decided to try out Intel Fortran 16 compiler and this seems to have resolved the issue (we're still testing but I'm optimistic). Were you able to fix yours as well?

Thanks.

Vladimir_P_1234567890 · ‎09-15-2015

jimdempseyatthecove wrote:

...in __kmp_join_barrier (gtid=-1145176444) at ../../src/kmp_barrier.cpp:1364

The above indicates that the "kmp" library used was compiled with debug information including source file name and line number. As far as I know, the Intel distributed libraries are not built with debug information. Therefore, whatever you are linking is likely not the Intel .so or .a library.

The debug info is located in the libiomp5.dbg file, which sits next to the libiomp5.so file. So if you run your program in the compiler environment, the OpenMP debug info is available. When you ship libiomp5.so to your customers, the debug info is not available.

--Vladimir

Vladimir_P_1234567890 · ‎09-15-2015

Hello all,

Could you please provide a bit more info: the version-release numbers (libc/kernel version) and what the other threads are doing?

This issue might be related to kernel or pthread bugs reported against RHEL, for example

https://bugzilla.redhat.com/show_bug.cgi?id=750419

Thank you,
--Vladimir

AGG1 · ‎09-15-2015

Hi Vladimir,

I'm running on CentOS 6.6 2.6.32-504.1.3.el6.x86_64. The libc version is libc-2.12.so. In our case the process is stuck on line: !$omp parallel do schedule(static,1), just before spawning off threads.

I made sure the executable was pointing to the correct libiomp5.so. It gets stuck after going through the same code about 18 million times. But it gets stuck on the same line every time I run the code. When we switched over to Intel Fortran 16 the issue went away.

The issue also happens on Windows for us. We're still in the process of testing with Intel Fortran 16 on Windows and will update this thread when I know more.

Thanks.

AGG1 · ‎09-15-2015

Hi Vladimir,

I saw a similar issue being discussed here, where you say you guys are working on a fix. Is this fix present in Intel Fortran 16?

https://software.intel.com/en-us/forums/intel-c-compiler/topic/560358

Thanks!

AGG1 · ‎09-15-2015

We are getting stuck on Windows at the same place even with Intel Fortran 16.

Martyn_C_Intel · ‎09-15-2015

Hi,

I had some difficulty in posting this reply yesterday, to Jim's post of Monday morning.

I agree with Jim. -fopenmp is just a synonym for -openmp or -qopenmp, provided for compatibility with gcc syntax. There should be no difference in functionality. You can see the version number of the OpenMP run-time library that is initialized by setting the KMP_VERSION environment variable to yes before running your program.

When I mentioned 32 and 64 bit variables, I was referring to variables inside the OpenMP run-time library, not your code. One instance where a 32 bit variable needed to be 64 bits was found in the OpenMP library shipped with the version 15 compiler. This has been fixed in the 16.0 compiler.
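The thread does not say which run-time variable was affected, so the following is only a generic illustration of this class of bug, not code from the OpenMP library. The names `released` and `through_32_bits` are hypothetical. It shows how a 64-bit value that passes through a 32-bit variable loses its upper bits, so that a waiter comparing against the full 64-bit value is never released.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical completion check: a waiter is released once the shared
// counter has reached the value it is waiting for.
bool released(std::uint64_t counter, std::uint64_t target) {
    return counter >= target;
}

// Hypothetical buggy path: somewhere along the way the counter is stored
// in a 32-bit variable, which silently discards the upper 32 bits.
std::uint64_t through_32_bits(std::uint64_t counter) {
    return static_cast<std::uint32_t>(counter);
}
```

For a target of 2^32 + 5, the full 64-bit counter satisfies the check, while the truncated value is 5 and the check fails however long the waiter sleeps. A bug of this shape only appears once the value grows past what the narrower variable can hold, which would be consistent with a hang that shows up only after many calls.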

I am finding that the version 16 compiler resolves the problem seen with 15.0, and the test cases now run correctly, on every Linux platform that I have tried but one. On this one platform, the problem remains even when the statically linked executable and all the major GNU shared libraries (libm, libc, libpthread, libgcc_s, libdl) are copied over from a system on which the executable runs correctly. I’m at a loss to know what could be different here. The kernel is about all that is left: it is 2.6.32-220.el6.x86_64 on the system that fails, 2.6.32-358.el6.x86_64 on the very similar one that works.

Javier, please can you try with the version 16 compiler? I think there is a high likelihood that will solve your problem, as it has for other users.

Up to now, I have not tested on Windows, but I will do so. I see there's a thread in the Windows forum.

Martyn_C_Intel · ‎09-15-2015

I reran similar tests on my Windows 7 laptop. I was able to reproduce the hang with the version 15 compiler, but not with 16.0. This was the case for both the C and Fortran test cases.

The developers confirmed that the fix that was made to the OpenMP RTL in 16.0 was made for both Linux and Windows. Incidentally, this fix should also be present in the 15.0 update compiler that will be posted very soon.

AGG, please can you tell us more about the environment in which you see a problem with the 16.0 compiler? E.g. exact version of Windows; static or dynamic linking (I think OpenMP is always linked dynamically on Windows), the OpenMP RTL version as printed out when you set KMP_VERSION=yes, what sort of system you are running on (sockets, cores, threads, microarchitecture, 64 bit)? Better still if you can provide a test case to reproduce the problem.

Michael_Barkhudarov · ‎09-15-2015

Martyn, glad you were able to reproduce the problem. Unfortunately, I cannot easily create a test case for you since the FORTRAN code is massive. Could I try your test case instead?

The 15.0.4 compiler did not work on either Linux (see AGG's post above for the flavor) or Windows.

Setting KMP_VERSION=yes produces the expected result on the Linux machine, where version 16 works, but does not give any additional output on my Windows machine. We are linking dynamically using the /MD and /Qopenmp options. When I type 'which libiomp5md.dll' it gives

C:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2016.0.110/windows/redist/intel64/compiler/libiomp5md.dll

The Windows machine is running 64-bit version of Windows 7 Professional SP 1. OpenMP linked dynamically. The version of libiomp5md.dll is 20150609 (get it by running filever on it), same as on the Linux machine where it works.

The processor is i7-4930 3.4 GHz, one socket, 6 real cores, hyperthreaded (my problem fails both on 12 and 4 threads; have not tried other combinations). Let me know if you need more info.

I also attach an image from the dependency walker to show the dynamic libs the executable (hydr3d.exe) depends on, including the openmp library.

Vladimir_P_1234567890 · ‎09-15-2015

AGG wrote:

I saw a similar issue being discussed here, where you say you guys are working on a fix. Is this fix present in Intel Fortran 16?

https://software.intel.com/en-us/forums/intel-c-compiler/topic/560358

That was a stable library issue; it has been fixed in version 16.0 of the compilers.

In this case we see a floating issue that depends on machine configuration and OS version.

--Vladimir

Javier_T_ · ‎09-16-2015

All, thank you for the feedback.

We have tried icc 16.0 but we are having numerical problems with this compiler version, using the same compilation options as with 15.0. We are investigating this new problem, but I cannot say if it solves the hanging issue yet...

Regards,

Javier

Vladimir_P_1234567890 · ‎09-24-2015

The problem was triaged and partially fixed in 15.0 update 5 and 16.0 compilers. We are working on the full fix.

If the code still hangs please use the environment variable:

KMP_BLOCKTIME=infinite

--Vladimir

Martyn_C_Intel · ‎12-01-2015

An additional instance of incorrect handling of a 64 bit variable has been identified in the run-time library. This has been fixed in the latest version of the compiler, 16.0 update 1 for Linux (16.0.1.150). This compiler is available for download from the Intel Registration Center as part of Intel Parallel Studio XE 2016 update 1, posted 12 Nov 2015. I have confirmed that this compiler resolves the problem with my test case running on RHEL 6.5 with glibc 2.12. The 16.0 update 1 compiler for Windows contains the equivalent fix for a similar problem on Windows.

We believe this resolves the reported problems, but please let us know if you see any further issues.