#include <omp.h>
#include <iostream>

int main (int argc, char* argv[])
{
    int x, y, z;
    x = 0; y = 0; z = 0;
    double N = 60.0;
    omp_lock_t lck;
    omp_init_lock (&lck);
    #pragma omp parallel shared(lck) default(shared)
    {
        double stime = omp_get_wtime();
        double etime = omp_get_wtime();
        int tid = omp_get_thread_num();
        while((etime - stime) < N)
        {
            if(tid == 2)
            {
                omp_set_lock (&lck);
                y++;
                omp_unset_lock (&lck);
            }
            if(tid == 0)
            {
                omp_test_lock (&lck);
                z--;
                omp_unset_lock (&lck);
            }
            etime = omp_get_wtime();
        }
    }
    omp_destroy_lock (&lck);
    return 0;
}
- Tags:
- C/C++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization
Thanks for reporting this issue. I was able to reproduce it with ICC 18.0.1; this appears to be a known issue that has already been addressed. Hence, in the next 18.0 update you won't encounter this issue anymore.
with 18.0.1:
$ icpc t2.cpp -qopenmp
vahoang@orcsle147:/tmp$ ./a.out
Segmentation fault (core dumped)
With our internal compiler testing:
$ icpc t2.cpp -c -qopenmp
$ ./a.out
$
Regards,
Viet
Thank you for your reply.
I am working on a project and am stuck on this error. I posted this as sample code; our project uses this locking strategy. Is there any workaround until the update arrives?
What is the expected date for the next update?
Does your real code do anything different from the sample code above?
IOW, is it doing something that could be performed better with an
#pragma omp atomic
++y;
Jim Dempsey
No, my real code is about 4K LOC, and inside a locked region it performs operations spanning 10 to 20 LOC. A simple atomic operation is not possible.
Do I understand correctly that the lock set by the thread with tid=2 is being unlocked by the thread with tid=0, and that the thread with tid=2 then performs a second unlock without locking?
Isn't this call non-conforming per the spec?
chapter 3.3.5
A program that accesses a lock that is not in the locked state or that is not owned by the task that
contains the call through either routine is non-conforming.
--Vladimir
Yes, in the sample code it is non-conforming, but in my real code I check whether the lock was obtained using the test call, as follows:
if(omp_test_lock (&lck) != 0)
{
z--;
omp_unset_lock (&lck);
}
In my real code, I don't encounter this error with a low number of processes. But when I run on something like 1024 KNL nodes with 4 MPI processes on each node, some of the processes hit the same error that I encounter in this sample code.
We looked at your sample code a bit more, and it looks like the problem is in the code itself.
If omp_test_lock() returns false (that is, the lock could not be acquired because it was owned by another thread), then the result of the following omp_unset_lock() is undefined (this caused the crash with icpc).
while((etime - stime) < N)
{
if(tid == 2)
{
omp_set_lock (&lck);
y++;
omp_unset_lock (&lck);
}
if(tid == 0)
{
omp_test_lock (&lck);
z--;
omp_unset_lock (&lck);
}
etime = omp_get_wtime();
}
A simple fix like the following would resolve the issue.
$ diff t2.cpp t2.cpp.orig
31,36c31,33
< int res = omp_test_lock (&lck);
< if(res)
< {
< z--;
< omp_unset_lock (&lck);
< }
---
> omp_test_lock (&lck);
> z--;
> omp_unset_lock (&lck);
Farooqi, Muhammad Nufail wrote:
In my real code, I don't encounter this error for low number of processes. But when I run on like 1024 KNL nodes with 4 MPI processes in each node some of the processes faces the same error that i encounter in this sample code.
If the real code is correct, it can't be "the same error" as the one produced by the incorrect code. So the problem seems to be different.
Perhaps there are not enough resources. Is your stack size large enough, for example?
Viet,
Your concern was answered by Farooqi's post #7 (the unset is issued inside the true branch of the test).
Farooqi should have presented a correct sketch of the code in post #1 as his reproducer.
Perhaps Farooqi can construct and post a revised reproducer.
Jim Dempsey
Jim,
The correct sketch code does not produce the error, but the original code does produce an error when executed with 1024 KNL nodes (4096 MPI processes). The traceback indicates a problem with the OpenMP test or unset routines.
The code runs fine when executed with a smaller number of processes. The code also runs fine on any number of nodes if compiled with the GNU or Cray compilers.
Is there any way to debug and locate an error inside libiomp?
I also wonder why the incorrect code in post #1 does not produce an error with GNU or the newer Intel compiler (post #2).
A backtrace (trace starting at bottom and going upwards) from the original code is as follows:
8: /lib64/libc.so.6(+0x34950) [0x2aaaad865950] ?? ??:0
9: /opt/intel/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64/libiomp5.so(+0x7f71c) [0x2aaaad2f571c] ?? ??:0
10: /opt/intel/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64/libiomp5.so(__kmpc_test_lock+0x7c) [0x2aaaad2d42dc] ?? ??:0
11: /global/u1/m/mfarooqi/castro_runs_fix/strong_scaling_knl64/1024_script/./Castro3d.knl.MPI.OMP.ex.1t.68t() [0x4116f1] Perilla::serviceLocalRequests(RegionGraph*, int) /global/u1/m/mfarooqi/amrex_perilla_castro_knl/perilla_cpp/Perilla.cpp:917
12: /global/u1/m/mfarooqi/castro_runs_fix/strong_scaling_knl64/1024_script/./Castro3d.knl.MPI.OMP.ex.1t.68t() [0x412099] Perilla::serviceMultipleGraphCommDynamic(std::vector<RegionGraph*, std::allocator<RegionGraph*> >, bool, int) /global/u1/m/mfarooqi/amrex_perilla_castro_knl/perilla_cpp/Perilla.cpp:1343
13: /global/u1/m/mfarooqi/castro_runs_fix/strong_scaling_knl64/1024_script/./Castro3d.knl.MPI.OMP.ex.1t.68t() [0x5cc6d6] amrex::Amr::coarseTimeStep(double) /global/u1/m/mfarooqi/amrex_perilla_castro_knl/amrex/Src/Amr/AMReX_Amr.cpp:2045
14: /opt/intel/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64/libiomp5.so(__kmp_invoke_microtask+0x93) [0x2aaaad32bac3] ?? ??:0
15: /opt/intel/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64/libiomp5.so(+0x84257) [0x2aaaad2fa257] ?? ??:0
16: /opt/intel/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64/libiomp5.so(+0x838d5) [0x2aaaad2f98d5] ?? ??:0
17: /opt/intel/compilers_and_libraries_2018.0.128/linux/compiler/lib/intel64/libiomp5.so(+0xb5fa4) [0x2aaaad32bfa4] ?? ??:0
18: /lib64/libpthread.so.0(+0x8744) [0x2aaaacd64744] ?? ??:0
19: /lib64/libc.so.6(clone+0x6d) [0x2aaaad91aaad] ?? ??:0
>>But the original code do produce an error when executed with 1024 KNL nodes (4096 MPI processes).
The OpenMP omp_lock scheme is for intra-process (multi-threaded within a single process) coding. It is not designed for inter-process use (a multi-process application, regardless of threading via OpenMP, another thread model, or no threading at all).
MPI, on the other hand, has a different synchronization construct for use with Remote Memory Access (RMA). In the MPI 3.1 specification, section 11.5, you will find information about inter-process locking.
Jim Dempsey
Additional information.
It is not unusual for an application, in particular an MPI application, to have multiple threads within a process. As a subset of this, it is also not unusual for the process to have a mix of non-OpenMP and OpenMP threads. Be aware that if you instantiate non-OpenMP threads in your process, .AND. more than one of these threads (including the main thread) enters an OpenMP parallel region, then each such occurrence instantiates a separate OpenMP thread pool. IOW, each pool will have thread IDs in the range 0:nThreads-1. If that is the case, then a thread that does not own the lock, but has the same ID as the lock-owning thread in a different OpenMP thread pool, may inadvertently issue a non-conforming lock/unlock operation.
While the (not quite correct) code shown in post #1 does not illustrate this potential problem, your actual code may have this issue.
Jim Dempsey
Farooqi, Muhammad Nufail wrote:
Is there any way to debug and locate error inside libiomp?
To debug the OpenMP runtime you can take LLVM OpenMP from http://openmp.llvm.org/ and build it in debug mode. Then link it with your application and you will probably get more information from the backtrace. You can also use the internal tracing feature in the debug OpenMP library. Tracing is enabled by setting the KMP_{A,B,C,D,E,F}_DEBUG=<n> environment variable. If you wish to try it, I can give you more info about tracing.
Farooqi, Muhammad Nufail wrote:
I also wonder why the incorrect code at post # 1 do not produce an error with gnu and newer intel compiler(post#2).
Do you mean that you see the crash with an older ICC and don't see it with the newer version?
Which versions are they (old and new)? Actually, it is better to provide the version of the OpenMP library that is used.
To get this info, please set KMP_VERSION=1 before running the app and provide the output here. It should look something like the following:
Intel(R) OMP Copyright (C) 1997-2017, Intel Corporation. All Rights Reserved.
Intel(R) OMP version: 5.0.20170829
Intel(R) OMP library type: performance
Intel(R) OMP link type: dynamic
Intel(R) OMP build time: 2017-09-09 15:52:48 UTC
Intel(R) OMP build compiler: Intel(R) C++ Compiler 16.0
Intel(R) OMP alternative compiler support: yes
Intel(R) OMP API version: 5.0 (201611)
Intel(R) OMP dynamic error checking: no
Intel(R) OMP thread affinity support: not used
Also, please set one more environment variable, KMP_SETTINGS=1, and provide the output here.
Thank you.
To debug the OpenMP runtime you can take LLVM OpenMP from http://openmp.llvm.org/ and build it in debug mode. Then link it with your application and you will probably get more information from the backtrace. You can also use the internal tracing feature in the debug OpenMP library. Tracing is enabled by setting the KMP_{A,B,C,D,E,F}_DEBUG=<n> environment variable. If you wish to try it, I can give you more info about tracing.
I tried to link the LLVM OpenMP, but the compiler does not override Intel's default OpenMP library. Can you tell me the compiler option to use the LLVM OpenMP instead of Intel's default libiomp?
Yes, I want to try tracing; detailed info about it would be nice.
Do you mean that you see the crash with an older ICC and don't see it with the newer version?
Which versions are they (old and new)? Actually, it is better to provide the version of the OpenMP library that is used.
I was referring to the reply from Viet in post #2. The code produces the error with the latest released Intel compiler but not with the internal (not yet released) compiler test.
To use a different libiomp at runtime, set LD_LIBRARY_PATH=<path to the new libiomp> before running the app.
You can check that the correct libiomp is used by setting KMP_VERSION=1 (as I mentioned above).
So, for example, I see the following for my test:
1. Check the version of libiomp the app was originally linked against:
$ ldd ./test.exe
linux-vdso.so.1 => (0x00007fff67346000)
libm.so.6 => /lib64/libm.so.6 (0x00007fd964120000)
libiomp5.so => /opt/intel/compilers_and_libraries_2017.1.132/compilers_and_libraries_2017.4.196/linux/compiler/lib/intel64/libiomp5.so (0x00007fd963d7c000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd963b66000)
(...)
$ KMP_VERSION=1 ./test.exe
Intel(R) OMP Copyright (C) 1997-2017, Intel Corporation. All Rights Reserved.
Intel(R) OMP version: 5.0.20170308
Intel(R) OMP library type: performance
Intel(R) OMP link type: dynamic
Intel(R) OMP build time: 2017-03-08 20:42:20 UTC
Intel(R) OMP build compiler: Intel C++ Compiler 15.0
Intel(R) OMP alternative compiler support: yes
Intel(R) OMP API version: 4.5 (201511)
Intel(R) OMP dynamic error checking: no
Intel(R) OMP thread affinity support: not used
- so this is the libiomp from Intel compiler 17.0.4.
2. Use LD_LIBRARY_PATH
$ LD_LIBRARY_PATH=/home/LLVM-openmp/build.clang/runtime/src:$LD_LIBRARY_PATH KMP_VERSION=1 ldd ./test.exe
linux-vdso.so.1 => (0x00007fff775fe000)
libm.so.6 => /lib64/libm.so.6 (0x00007f4c46bd8000)
libiomp5.so => /home/LLVM-openmp/build.clang/runtime/src/libiomp5.so (0x00007f4c46926000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f4c46710000)
(...)
$ LD_LIBRARY_PATH=/home/LLVM-openmp/build.clang/runtime/src:$LD_LIBRARY_PATH KMP_VERSION=1 ./test.exe
Intel(R) OMP version: 5.0.20171019
Intel(R) OMP library type: performance
Intel(R) OMP link type: dynamic
Intel(R) OMP build time: no_timestamp
Intel(R) OMP build compiler: Clang 6.0
Intel(R) OMP alternative compiler support: yes
Intel(R) OMP API version: 5.0 (201611)
Intel(R) OMP dynamic error checking: no
Intel(R) OMP thread affinity support: not used
- this is the library built with Clang from LLVM OpenMP.
Farooqi, Muhammad Nufail wrote:
I was referring the reply from Viet in post # 2. The code produce error with latest released intel compiler but not with the internal compiler (not released yet) test.
That reply was sent before it was recognized that your reproducer code was incorrect. So the statement in that reply does not hold unless you can verify and confirm that your application crashes with one ICC version and works well with another.
For debugging purposes, it would still be better to have a small reproducer that can be run on one node.
My problem in the original code is solved by using the latest OpenMP library from LLVM.
I hope the issue will be fixed in the Intel compiler's next update as well.
Thank you all for your suggestions.
Farooqi, Muhammad Nufail wrote:
My problem in the original code is solved by using the latest OpenMP library from LLVM.
I hope the issue will be fixed in the Intel compiler's next update as well.
Thank you all for your suggestions.
Thanks for checking it with LLVM OpenMP!
This probably relates to the issue https://bugs.llvm.org/show_bug.cgi?id=34050 that was partially fixed with https://reviews.llvm.org/rL317115