Intel® oneAPI Base Toolkit

seg fault in oneapi 2023.0.0 libiomp5.so

may_ka
Beginner

Hi.

 

I am experiencing a segfault when using MKL through oneAPI 2023.0.0, which seems to boil down to the libiomp5.so that ships with 2023.0.0.

 

I can use libiomp5.so from 2022.2.1 and libgomp.so without any issues.

 

The operating system is Linux.

 

The call stack using gdb is:

 

Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x1555411ac740 (LWP 92944)]
[New Thread 0x155540daa7c0 (LWP 92945)]
[New Thread 0x1555409a8840 (LWP 92946)]
[New Thread 0x1555405a68c0 (LWP 92947)]
[New Thread 0x1555401a4940 (LWP 92948)]
[New Thread 0x15553fda29c0 (LWP 92949)]
[New Thread 0x15553f9a0a40 (LWP 92950)]

Thread 1 "program" received signal SIGSEGV, Segmentation fault.
0x00001555550fc3cb in std::__atomic_base<int>::fetch_add (this=<optimized out>, __i=<optimized out>, __m=<optimized out>) at /usr/include/c++/4.8.2/bits/atomic_base.h:614
614	/usr/include/c++/4.8.2/bits/atomic_base.h: Directory not empty.
#1  _INTERNAL99002fb4::__kmp_node_ref (node=<optimized out>) at ../../src/kmp_taskdeps.cpp:80
80	../../src/kmp_taskdeps.cpp: Directory not empty.
#2  _INTERNAL99002fb4::__kmp_dephash_find (thread=<optimized out>, hash=<optimized out>, addr=<optimized out>)
    at ../../src/kmp_taskdeps.cpp:210
210	in ../../src/kmp_taskdeps.cpp
#3  _INTERNAL99002fb4::__kmp_process_deps<true> (gtid=1, node=0x5555ce36c050, hash=0x4f8, dep_barrier=128, 
    ndeps=104880, dep_list=0xc, task=0x5555d866f640) at ../../src/kmp_taskdeps.cpp:401
401	in ../../src/kmp_taskdeps.cpp
#4  0x00001555550fdd1f in __kmpc_omp_task_with_deps (loc_ref=0x1, gtid=-835272624, new_task=0x4f8, ndeps=-834433152, 
    dep_list=0x199b0, ndeps_noalias=12, noalias_dep_list=0x0) at ../../src/kmp_taskdeps.cpp:566
566	in ../../src/kmp_taskdeps.cpp
#5  0x0000555556ff3e6b in mkl_lapack_dtrtri ()
#6  0x0000155555160053 in __kmp_invoke_microtask ()
   from /opt/intel/oneapi/compiler/2023.0.0/linux/compiler/lib/intel64_lin/libiomp5.so
#7  0x00001555550ce2f3 in __kmp_invoke_task_func (gtid=1) at ../../src/kmp_runtime.cpp:7845
7845	../../src/kmp_runtime.cpp: Directory not empty.
#8  0x00001555550cf578 in __kmp_fork_call (loc=0x1, gtid=-835272624, call_context=(unknown: 0x4f8), argc=-834433152, 
    microtask=0x199b0, invoker=0xc, ap=0x7fffffffae10) at ../../src/kmp_runtime.cpp:2508
2508	in ../../src/kmp_runtime.cpp
#9  0x0000155555088223 in __kmpc_fork_call (loc=0x1, argc=-835272624, microtask=0x4f8)
    at ../../src/kmp_csupport.cpp:350
350	../../src/kmp_csupport.cpp: Directory not empty.
#10 0x0000555556ff2d5b in mkl_lapack_dtrtri ()
#11 0x00005555560a8f3d in mkl_lapack.dtrtri_64_ ()
#12 0x00005555560a4c50 in LAPACKE_dtrtri_work ()
#13 0x00005555558cab7b in arr_d_2_sq_sy_tri<double>::inverse (this=this@entry=0x7fffffffb4d8, 
    uplo=uplo@entry=matmul_uplo::upper, diag=diag@entry=matmul_diag::no_unit)
    at /home/user/baselib/src/array_l_d_2_sq_sy.cpp:203
203	      stat=intel_mkl::LAPACKE_dtrtri(LAPACK_COL_MAJOR,uplo_l,diag_l,n,this->data(),lda);

 

The compiler line is:

iclang++ -march=native -std=c++20 -fPIE -std=gnu++20 -ferror-limit=4 -D_GLIBCXX_DEBUG_PEDANTIC -g -O3   -c src/main.cpp -o main.o -I /opt/intel/oneapi/mkl/2023.0.0/include

where "iclang++" is a symlink to the Intel clang++.

 

The link line is:

g++ -static-libstdc++ -L /opt/intel/oneapi/compiler/2023.0.0/linux/compiler/lib/intel64_lin -o program main.o -Wl,-Bstatic -Wl,--start-group /opt/intel/oneapi/mkl/2023.0.0/lib/intel64/libmkl_intel_ilp64.a /opt/intel/oneapi/mkl/2023.0.0/lib/intel64/libmkl_core.a /opt/intel/oneapi/mkl/2023.0.0/lib/intel64/libmkl_intel_thread.a -Wl,--end-group -l stdc++ -Wl,-Bdynamic -liomp5 -lpthread -lm -ldl

 

The environment settings are:

 

OMP_PLACES=cores
OMP_PROC_BIND=true
OMP_DYNAMIC=FALSE
OMP_MAX_ACTIVE_LEVELS=2147483647
OMP_NUM_THREADS=8

The output of ldd is:

 

linux-vdso.so.1 (0x00007ffeb2745000)
        libiomp5.so => /opt/intel/oneapi/compiler/2023.0.0/linux/compiler/lib/intel64_lin/libiomp5.so (0x0000148da9800000)
        libm.so.6 => /usr/lib/libm.so.6 (0x0000148db3729000)
        libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x0000148db3709000)
        libc.so.6 => /usr/lib/libc.so.6 (0x0000148da9619000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x0000148db3860000)
        librt.so.1 => /usr/lib/librt.so.1 (0x0000148db3702000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x0000148db36fd000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x0000148db36f8000)

 

The libc version is:

 

ldd --version
ldd (GNU libc) 2.37
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

 

From the gdb output, the problem appears to lie deep inside libiomp5 (kmp). Since the application itself is not threaded, the fault can only occur in the OpenMP runtime as called from oneAPI MKL. Further, when linking the sequential oneAPI MKL, the problem disappears.

 

The application where this occurs is fairly complex. So far I have not managed to build a small reproducer; the problem simply did not show up.

 

Any ideas would be much appreciated.

3 Replies
ShanmukhS_Intel
Moderator

Hi Karl,

 

Thanks for posting in Intel Communities.

 

Since you are linking against the ILP64 libraries, we recommend verifying that your application is linked consistently with the integer format used in your code. The Link Line Advisor at the link below has more details.

 

https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html

 

In addition, could you please provide a sample reproducer? It would help us reproduce the issue in our environment and assist you accordingly.
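
A reproducer would presumably centre on the LAPACKE_dtrtri call in frame #13 of the trace, which inverts an upper, non-unit triangular matrix in column-major layout. As a hedged starting point, the sketch below is a plain C++ reference for that computation whose result could be compared against the MKL output. It is not MKL code, and the name `invert_upper_col_major` and its signature are hypothetical. The 64-bit index type is chosen here only to mirror the ILP64 link line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference inverse of an upper, non-unit triangular matrix stored in
// column-major order with leading dimension lda. Plain C++, not MKL code.
// Hypothetical helper: the name and signature are illustrative only.
std::vector<double> invert_upper_col_major(const std::vector<double>& a,
                                           std::int64_t n, std::int64_t lda) {
    assert(n >= 0 && lda >= n);
    assert(a.size() >= static_cast<std::size_t>(lda * n));
    std::vector<double> inv(a.size(), 0.0);
    for (std::int64_t j = 0; j < n; ++j) {
        // Diagonal entry: U(j,j) * X(j,j) = 1.
        inv[j + j * lda] = 1.0 / a[j + j * lda];
        // Back substitution up column j: sum_k U(i,k) * X(k,j) = 0 for i < j.
        for (std::int64_t i = j - 1; i >= 0; --i) {
            double s = 0.0;
            for (std::int64_t k = i + 1; k <= j; ++k) {
                s += a[i + k * lda] * inv[k + j * lda];
            }
            inv[i + j * lda] = -s / a[i + i * lda];
        }
    }
    return inv;
}
```

For the 2x2 matrix with columns (2, 0) and (1, 4), the function returns columns (0.5, 0) and (-0.125, 0.25). With MKL available, the corresponding call would be LAPACKE_dtrtri(LAPACK_COL_MAJOR, 'U', 'N', n, data, lda), which overwrites the input in place.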

 

Best Regards,

Shanmukh.SS

 

 

ShanmukhS_Intel
Moderator

Hi Karl,

 

We see that you are linking against the ILP64 libraries. Could you please confirm that all integer input parameters used in your code are 64-bit integers? This helps us understand whether the problem lies in the code.

 

You need to choose either the LP64 interface or the ILP64 interface. The difference between them is the integer type length: the ILP64 interface uses a 64-bit integer type, while LP64 uses a 32-bit integer type.
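
To make the distinction concrete, here is a minimal sketch, assuming the usual MKL convention that defining `MKL_ILP64` at compile time switches `MKL_INT` to a 64-bit type. The names `sketch_int`, `expected_int_bytes` and `int_width_matches` are hypothetical and exist only so the block compiles without the MKL headers. Note that the compile line in the original post does not show `-DMKL_ILP64`; whether the macro is defined elsewhere in the sources cannot be determined from this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for MKL_INT so this sketch compiles without MKL.
// Assumption: as in MKL, defining MKL_ILP64 at compile time selects the
// 64-bit integer type; otherwise a 32-bit int is used (LP64).
#ifdef MKL_ILP64
using sketch_int = std::int64_t;
#else
using sketch_int = int;
#endif

// Integer width in bytes that the linked interface library expects.
constexpr std::size_t expected_int_bytes(bool linked_ilp64) {
    return linked_ilp64 ? 8 : 4;
}

// True when the integer type the sources were compiled with has the width
// expected by the interface library on the link line.
constexpr bool int_width_matches(bool linked_ilp64) {
    return sizeof(sketch_int) == expected_int_bytes(linked_ilp64);
}
```

Compiled without `-DMKL_ILP64` on a platform where `int` is 32 bits, `int_width_matches(true)` is false. That is the kind of mismatch being asked about: linking libmkl_intel_ilp64.a while the code passes 32-bit integers.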

 

Best Regards,

Shanmukh.SS

 

ShanmukhS_Intel
Moderator

Hi Karl,


We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Best Regards,

Shanmukh.SS

