Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Problem with OpenMP Locks

Farooqi__Muhammad_Nu
1,840 Views
I encounter a segmentation fault when I compile the following code with the Intel compiler.
The code works fine with the GNU compiler. What could be the possible reason for this error?
The error occurs at locking or unlocking, but mostly at unlocking the OpenMP lock.
 
#include <omp.h>
#include <iostream>

int main (int argc, char* argv[])
{
  int x, y, z;
  x = 0;  y = 0;  z = 0;
  double N = 60.0;

  omp_lock_t lck;
  omp_init_lock (&lck);

#pragma omp parallel shared(lck) default(shared)
  {
    double stime = omp_get_wtime();
    double etime = omp_get_wtime();

    int tid = omp_get_thread_num();
    
    while((etime - stime) < N)
      {
	if(tid == 2)
	  {
	    omp_set_lock (&lck);
	    y++;
	    omp_unset_lock (&lck);
	  }
	
	if(tid == 0)
	  {	
	    omp_test_lock (&lck);
	    z--;
	    omp_unset_lock (&lck);	    
	  }
	etime = omp_get_wtime();
      }
  }
  omp_destroy_lock (&lck);
  return 0;
}

 

18 Replies
Viet_H_Intel
Moderator

 

Thanks for reporting this issue. I was able to reproduce it with ICC 18.0.1; this seems to be a known issue and has already been addressed, so in the next 18.0 update you won't encounter it anymore.

with 18.0.1:

$ icpc t2.cpp  -qopenmp
vahoang@orcsle147:/tmp$ ./a.out

Segmentation fault (core dumped)

With our internal compiler testing:

$  icpc t2.cpp -c -qopenmp
$ ./a.out
$

 

Regards,

Viet 

Farooqi__Muhammad_Nu

Thank you for your reply.

I am working on a project and am stuck on this error. I posted this as sample code; our project uses this locking strategy. Is there any workaround until the update arrives?

What is the expected date for the next update?

 
jimdempseyatthecove
Honored Contributor III

Does your real code do anything different from the above sample code?

IOW, is it doing something that could be better performed with an

#pragma omp atomic
   ++y;

Jim Dempsey

Farooqi__Muhammad_Nu

No, my real code is about 4K LOC, and inside a locked region it performs operations spanning 10~20 LOC. A simple atomic operation is not possible.

Vladimir_P_1234567890

Do I understand correctly that the lock set by the thread with tid=2 is unlocked by the thread with tid=0, and that a second unlock is then done by the thread with tid=2 without locking?

Isn't this call non-conforming by the spec?

Chapter 3.3.5:

A program that accesses a lock that is not in the locked state or that is not owned by the task that
contains the call through either routine is non-conforming.

--Vladimir

Farooqi__Muhammad_Nu

Yes, in the sample code it is non-conforming, but in my real code I check whether the lock is obtained using the test, as follows:

            if(omp_test_lock (&lck) !=0 )
              {
                z--;
                omp_unset_lock (&lck);
              }

In my real code, I don't encounter this error with a low number of processes. But when I run on, say, 1024 KNL nodes with 4 MPI processes on each node, some of the processes face the same error that I encounter in this sample code.

Viet_H_Intel
Moderator

 

We looked at your sample code a bit more, and it looks like the problem is in the code itself.

If omp_test_lock() returns false (that is, the lock could not be acquired because it was owned by another thread), then the result of the following omp_unset_lock() is undefined (this caused the crash with icpc).

while((etime - stime) < N)
  {
    if(tid == 2)
      {
        omp_set_lock (&lck);
        y++;
        omp_unset_lock (&lck);
      }

    if(tid == 0)
      {
        omp_test_lock (&lck);
        z--;
        omp_unset_lock (&lck);
      }
    etime = omp_get_wtime();
  }

A simple fix like the following would resolve the issue.

$ diff t2.cpp t2.cpp.orig
31,36c31,33
<             int res = omp_test_lock (&lck);
<             if(res)
<           {
<                 z--;
<                 omp_unset_lock (&lck);
<           }
---
>             omp_test_lock (&lck);
>             z--;
>             omp_unset_lock (&lck);

Olga_M_Intel
Employee

Farooqi, Muhammad Nufail wrote:

In my real code, I don't encounter this error with a low number of processes. But when I run on, say, 1024 KNL nodes with 4 MPI processes on each node, some of the processes face the same error that I encounter in this sample code.

If the real code is correct, it can't be "the same error" as the one from the incorrect code. So, the problem seems to be different.

Possibly there are not enough resources. Do you have a large enough stack size, for example?

jimdempseyatthecove
Honored Contributor III

Viet,

Your concern was answered by Farooqi's post #7 (the unset is issued only in the true branch of the test).

Farooqi should have presented a correct code sketch in post #1 as his reproducer.

Perhaps Farooqi can construct and post a revised reproducer.

Jim Dempsey

0 Kudos
Farooqi__Muhammad_Nu
1,840 Views

Jim,

The correct sketch code does not produce the error. But the original code does produce an error when executed on 1024 KNL nodes (4096 MPI processes). The traceback indicates a problem in the OpenMP test or unset routines.

The code runs fine when executed with a smaller number of processes. It also runs fine on any number of nodes when compiled with the GNU or Cray compilers.

Is there any way to debug and locate the error inside libiomp?

I also wonder why the incorrect code in post #1 does not produce an error with the GNU compiler or with the newer Intel compiler (post #2).

A backtrace (reading from the bottom upwards) from the original code is as follows:

 8: /lib64/libc.so.6(+0x34950) [0x2aaaad865950]
    ??
    ??:0
 9: /opt/intel/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64/libiomp5.so(+0x7f71c) [0x2aaaad2f571c]
    ??
    ??:0
10: /opt/intel/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64/libiomp5.so(__kmpc_test_lock+0x7c) [0x2aaaad2d42dc]
    ??
    ??:0
11: /global/u1/m/mfarooqi/castro_runs_fix/strong_scaling_knl64/1024_script/./Castro3d.knl.MPI.OMP.ex.1t.68t() [0x4116f1]
    Perilla::serviceLocalRequests(RegionGraph*, int)
    /global/u1/m/mfarooqi/amrex_perilla_castro_knl/perilla_cpp/Perilla.cpp:917
12: /global/u1/m/mfarooqi/castro_runs_fix/strong_scaling_knl64/1024_script/./Castro3d.knl.MPI.OMP.ex.1t.68t() [0x412099]
    Perilla::serviceMultipleGraphCommDynamic(std::vector<RegionGraph*, std::allocator<RegionGraph*> >, bool, int)
    /global/u1/m/mfarooqi/amrex_perilla_castro_knl/perilla_cpp/Perilla.cpp:1343
13: /global/u1/m/mfarooqi/castro_runs_fix/strong_scaling_knl64/1024_script/./Castro3d.knl.MPI.OMP.ex.1t.68t() [0x5cc6d6]
    amrex::Amr::coarseTimeStep(double)
    /global/u1/m/mfarooqi/amrex_perilla_castro_knl/amrex/Src/Amr/AMReX_Amr.cpp:2045
14: /opt/intel/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64/libiomp5.so(__kmp_invoke_microtask+0x93) [0x2aaaad32bac3]
    ??
    ??:0
15: /opt/intel/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64/libiomp5.so(+0x84257) [0x2aaaad2fa257]
    ??
    ??:0
16: /opt/intel/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64/libiomp5.so(+0x838d5) [0x2aaaad2f98d5]
    ??
    ??:0
17: /opt/intel/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64/libiomp5.so(+0xb5fa4) [0x2aaaad32bfa4]
    ??
    ??:0
18: /lib64/libpthread.so.0(+0x8744) [0x2aaaacd64744]
    ??
    ??:0
19: /lib64/libc.so.6(clone+0x6d) [0x2aaaad91aaad]
    ??
    ??:0

 

jimdempseyatthecove
Honored Contributor III

>>But the original code does produce an error when executed with 1024 KNL nodes (4096 MPI processes).

The OpenMP omp_lock scheme is for intra-process (multi-threaded within a single process) coding. It is not designed for inter-process use (a multi-process application, regardless of threading via OpenMP, another thread model, or no threading at all).

MPI, on the other hand, has a different synchronization construct for use with Remote Memory Access (RMA). In the MPI 3.1 specification, Section 11.5, you will find information about inter-process locking.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

Additional information.

It is not unusual for an application, in particular an MPI application, to have multiple threads within a process. As a subset of this, it is also not unusual for the process to have a mix of non-OpenMP and OpenMP threads. Be aware that if you instantiate non-OpenMP threads in your process, AND more than one of these threads (including the main thread) enters an OpenMP parallel region, then each such occurrence instantiates a separate OpenMP thread pool. IOW, each pool will have thread IDs in the range 0:nThreads-1. If so, a non-lock-owning thread with the same ID as the lock owner in a different OpenMP thread pool may inadvertently issue a non-conforming lock/unlock operation.

While the (not quite correct) code shown in post #1 does not illustrate this potential problem, your actual code may have this issue.

Jim Dempsey

Olga_M_Intel
Employee

Farooqi, Muhammad Nufail wrote:

Is there any way to debug and locate the error inside libiomp?

To debug the OpenMP runtime you can take LLVM OpenMP from http://openmp.llvm.org/ and build it in debug mode. Then link it with your application and you will probably get more information from the backtrace. You can also use the internal tracing feature in the debug OpenMP library. Tracing is enabled by setting the KMP_{A,B,C,D,E,F}_DEBUG=<n> environment variables. If you wish to try it, I can give you more info about tracing.

Farooqi, Muhammad Nufail wrote:

I also wonder why the incorrect code in post #1 does not produce an error with the GNU compiler or with the newer Intel compiler (post #2).

Do you mean that you see the crash with some old ICC and don't see it with the newer version?

Which versions are they (old and new)? Actually, it is better to provide the OpenMP library version that is used.

To get this info, please set KMP_VERSION=1 before running the app, and provide the output here. It will be something like the following:

Intel(R) OMP Copyright (C) 1997-2017, Intel Corporation. All Rights Reserved.
Intel(R) OMP version: 5.0.20170829
Intel(R) OMP library type: performance
Intel(R) OMP link type: dynamic
Intel(R) OMP build time: 2017-09-09 15:52:48 UTC
Intel(R) OMP build compiler: Intel(R) C++ Compiler 16.0
Intel(R) OMP alternative compiler support: yes
Intel(R) OMP API version: 5.0 (201611)
Intel(R) OMP dynamic error checking: no
Intel(R) OMP thread affinity support: not used
 

Also, please set one more environment variable, KMP_SETTINGS=1, and provide the output here.

Thank you.

 

Farooqi__Muhammad_Nu

To debug the OpenMP runtime you can take LLVM OpenMP from http://openmp.llvm.org/ and build it in debug mode. Then link it with your application and you will probably get more information from the backtrace. You can also use the internal tracing feature in the debug OpenMP library. Tracing is enabled by setting the KMP_{A,B,C,D,E,F}_DEBUG=<n> environment variables. If you wish to try it, I can give you more info about tracing.

I tried to link LLVM OpenMP, but the compiler does not override Intel's default OpenMP library. Can you tell me the compiler option to use LLVM OpenMP instead of Intel's default libiomp?

Yes, I want to try tracing. Detailed info about it would be nice.

Do you mean that you see the crash with some old ICC and don't see it with the newer version?

Which versions are they (old and new)? Actually, it is better to provide the OpenMP library version that is used.

I was referring to the reply from Viet in post #2. The code produces the error with the latest released Intel compiler, but not with the internal (not yet released) compiler test.

Olga_M_Intel
Employee

To use a different libiomp at runtime, set LD_LIBRARY_PATH=<path to the new libiomp> before running the app.

You can check that the correct libiomp is used by setting KMP_VERSION=1 (as I mentioned above).

So, for example, I see the following for my test:

1. Check the version of libiomp the app was initially linked against

$ ldd ./test.exe
        linux-vdso.so.1 =>  (0x00007fff67346000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fd964120000)
        libiomp5.so => /opt/intel/compilers_and_libraries_2017.1.132/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64/libiomp5.so (0x00007fd963d7c000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd963b66000)
 (...)
$ KMP_VERSION=1 ./test.exe
Intel(R) OMP Copyright (C) 1997-2017, Intel Corporation. All Rights Reserved.
Intel(R) OMP version: 5.0.20170308
Intel(R) OMP library type: performance
Intel(R) OMP link type: dynamic
Intel(R) OMP build time: 2017-03-08 20:42:20 UTC
Intel(R) OMP build compiler: Intel C++ Compiler 15.0
Intel(R) OMP alternative compiler support: yes
Intel(R) OMP API version: 4.5 (201511)
Intel(R) OMP dynamic error checking: no
Intel(R) OMP thread affinity support: not used

- that is, this is the libiomp from the Intel Compiler 17.0.4

2. Use LD_LIBRARY_PATH

 $ LD_LIBRARY_PATH=/home/LLVM-openmp/build.clang/runtime/src:$LD_LIBRARY_PATH KMP_VERSION=1 ldd  ./test.exe
        linux-vdso.so.1 =>  (0x00007fff775fe000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f4c46bd8000)
        libiomp5.so => /home/LLVM-openmp/build.clang/runtime/src/libiomp5.so (0x00007f4c46926000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f4c46710000)
 (...)

 $ LD_LIBRARY_PATH=/home/LLVM-openmp/build.clang/runtime/src:$LD_LIBRARY_PATH KMP_VERSION=1  ./test.exe
Intel(R) OMP version: 5.0.20171019
Intel(R) OMP library type: performance
Intel(R) OMP link type: dynamic
Intel(R) OMP build time: no_timestamp
Intel(R) OMP build compiler: Clang 6.0

Intel(R) OMP alternative compiler support: yes
Intel(R) OMP API version: 5.0 (201611)
Intel(R) OMP dynamic error checking: no
Intel(R) OMP thread affinity support: not used

- this is the library built with clang from llvm openmp.
 

Olga_M_Intel
Employee

Farooqi, Muhammad Nufail wrote:

I was referring to the reply from Viet in post #2. The code produces the error with the latest released Intel compiler, but not with the internal (not yet released) compiler test.

That reply was sent before it was recognized that your reproducer code was incorrect. So, the statement in that reply does not hold unless you can verify and confirm that your application crashes with one ICC version and works well with another.

For debugging purposes, it would still be better to have a small reproducer that can be run on one node.

Farooqi__Muhammad_Nu

My problem in the original code is solved by using the latest OpenMP library from LLVM.

I hope the issue will be fixed in the Intel compiler's next update as well.

Thank you all for your suggestions.

Olga_M_Intel
Employee

Farooqi, Muhammad Nufail wrote:

My problem in the original code is solved by using the latest OpenMP library from LLVM.

I hope the issue will be fixed in the Intel compiler's next update as well.

Thank you all for your suggestions.

Thanks for checking it with LLVM OpenMP!

This probably relates to the issue https://bugs.llvm.org/show_bug.cgi?id=34050, which was partially fixed by https://reviews.llvm.org/rL317115

 
