Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

UCX segmentation fault in libmkl_avx2.so.1

bxmbxm
Beginner

Hello, I have successfully compiled my application, but it sometimes crashes with the following output:

Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))

==== backtrace (tid: 40561) ====
0 /usr/lib64/libucs.so.0(ucs_handle_error+0x104) [0x7fa02aade4d4]
1 /usr/lib64/libucs.so.0(+0x1e8fc) [0x7fa02aade8fc]
2 /usr/lib64/libucs.so.0(+0x1eab2) [0x7fa02aadeab2]
3 /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_avx2.so.1(mkl_blas_avx2_xzdotc+0x2a3) [0x7fa02780d9a3]
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
prep-mpi.x 0000000000A997CA Unknown Unknown Unknown
libpthread-2.26.s 00007FA02D8F12D0 Unknown Unknown Unknown
libmkl_avx2.so.1 00007FA02780D9A3 mkl_blas_avx2_xzd Unknown Unknown

I have MLNX_OFED_LINUX-5.0-2.1.8.0 (because of the FDR cards) and the following Intel oneAPI components:

C compiler 2021.2.0.118

Fortran compiler 2021.2.0.136

MPI 2021.2.0.215

MKL 2021.2.0.296

ucx_info -v
# UCT version=1.8.0 revision c0a9704
# configured with: --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --disable-optimizations --disable-logging --disable-debug --disable-assertions --enable-mt --disable-params-check --enable-cma --without-cuda --without-gdrcopy --with-verbs --without-cm --with-knem --with-rdmacm --without-rocm --without-xpmem --without-ugni --without-java --disable-numa

Where could the problem be?

ShivaniK_Intel
Moderator

Hi,


Thanks for reaching out to us.


Could you please provide the hardware details of the system on which you are running the application?


If possible, could you also provide the exact link to the application?


Thanks & Regards

Shivani


bxmbxm
Beginner

Thanks for the reply.

83:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3] MCX353A-FCBT

Supermicro X10DAI, 2x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz, 256 GB RAM

So far it has failed in two applications:

TurboRVB (private code): https://people.sissa.it/~sorella/TurboRVB_Manual/build/html/index.html

and CP2K.

I have also tried a newer version of UCX:

# UCT version=1.11.0 revision 6031c98
# configured with: --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --disable-optimizations --disable-logging --disable-debug --disable-assertions --enable-mt --disable-params-check --without-java --enable-cma --without-cuda --without-gdrcopy --with-verbs --with-knem --with-rdmacm --without-rocm --without-xpmem --without-fuse3 --without-ugni --disable-numa

ucx_info -d output for the new version:

#
# Memory domain: posix
# Component: posix
# allocate: unlimited
# remote key: 24 bytes
# rkey_ptr is supported
#
# Transport: posix
# Device: memory
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 12179.00 MB/sec
# latency: 80 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 100
# am_bcopy: <= 8256
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 8 bytes
# error handling: ep_check
#
#
# Memory domain: sysv
# Component: sysv
# allocate: unlimited
# remote key: 12 bytes
# rkey_ptr is supported
#
# Transport: sysv
# Device: memory
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 12179.00 MB/sec
# latency: 80 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 100
# am_bcopy: <= 8256
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 8 bytes
# error handling: ep_check
#
#
# Memory domain: self
# Component: self
# register: unlimited, cost: 0 nsec
# remote key: 0 bytes
#
# Transport: self
# Device: memory0
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 6911.00 MB/sec
# latency: 0 nsec
# overhead: 10 nsec
# put_short: <= 4294967295
# put_bcopy: unlimited
# get_bcopy: unlimited
# am_short: <= 8K
# am_bcopy: <= 8K
# domain: cpu
# atomic_add: 32, 64 bit
# atomic_and: 32, 64 bit
# atomic_or: 32, 64 bit
# atomic_xor: 32, 64 bit
# atomic_fadd: 32, 64 bit
# atomic_fand: 32, 64 bit
# atomic_for: 32, 64 bit
# atomic_fxor: 32, 64 bit
# atomic_swap: 32, 64 bit
# atomic_cswap: 32, 64 bit
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 0 bytes
# iface address: 8 bytes
# error handling: ep_check
#
#
# Memory domain: tcp
# Component: tcp
# register: unlimited, cost: 0 nsec
# remote key: 0 bytes
#
# Transport: tcp
# Device: eth1
# System device: <unknown>
#
# capabilities:
# bandwidth: 113.16/ppn + 0.00 MB/sec
# latency: 5776 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 0
# device num paths: 1
# max eps: 256
# device address: 6 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
#
# Transport: tcp
# Device: ib0
# System device: <unknown>
#
# capabilities:
# bandwidth: 6239.81/ppn + 0.00 MB/sec
# latency: 5210 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 1
# device num paths: 1
# max eps: 256
# device address: 6 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
#
# Transport: tcp
# Device: lo
# System device: <unknown>
#
# capabilities:
# bandwidth: 11.91/ppn + 0.00 MB/sec
# latency: 10960 nsec
# overhead: 50000 nsec
# put_zcopy: <= 18446744073709551590, up to 6 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 0
# am_short: <= 8K
# am_bcopy: <= 8K
# am_zcopy: <= 64K, up to 6 iov
# am_opt_zcopy_align: <= 1
# am_align_mtu: <= 0
# am header: <= 8037
# connection: to ep, to iface
# device priority: 1
# device num paths: 1
# max eps: 256
# device address: 18 bytes
# iface address: 2 bytes
# ep address: 10 bytes
# error handling: peer failure, ep_check, keepalive
#
#
# Connection manager: tcp
# max_conn_priv: 2064 bytes
#
# Memory domain: mlx4_0
# Component: ib
# register: unlimited, cost: 180 nsec
# remote key: 8 bytes
# local memory handle is required for zcopy
#
# Transport: rc_verbs
# Device: mlx4_0:1
# System device: 0000:83:00.0 (0)
#
# capabilities:
# bandwidth: 6433.22/ppn + 0.00 MB/sec
# latency: 900 + 1.000 * N nsec
# overhead: 75 nsec
# put_short: <= 88
# put_bcopy: <= 8256
# put_zcopy: <= 1G, up to 6 iov
# put_opt_zcopy_align: <= 512
# put_align_mtu: <= 2K
# get_bcopy: <= 8256
# get_zcopy: 65..1G, up to 6 iov
# get_opt_zcopy_align: <= 512
# get_align_mtu: <= 2K
# am_short: <= 87
# am_bcopy: <= 8255
# am_zcopy: <= 8255, up to 5 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 2K
# am header: <= 127
# domain: device
# atomic_add: 64 bit
# atomic_fadd: 64 bit
# atomic_cswap: 64 bit
# connection: to ep
# device priority: 10
# device num paths: 1
# max eps: 256
# device address: 4 bytes
# ep address: 4 bytes
# error handling: peer failure, ep_check
#
#
# Transport: ud_verbs
# Device: mlx4_0:1
# System device: 0000:83:00.0 (0)
#
# capabilities:
# bandwidth: 6433.22/ppn + 0.00 MB/sec
# latency: 930 nsec
# overhead: 105 nsec
# am_short: <= 172
# am_bcopy: <= 4088
# am_zcopy: <= 4088, up to 8 iov
# am_opt_zcopy_align: <= 512
# am_align_mtu: <= 4K
# am header: <= 3952
# connection: to ep, to iface
# device priority: 10
# device num paths: 1
# max eps: inf
# device address: 4 bytes
# iface address: 3 bytes
# ep address: 6 bytes
# error handling: peer failure, ep_check
#
#
# Connection manager: rdmacm
# max_conn_priv: 54 bytes
#
# Memory domain: cma
# Component: cma
# register: unlimited, cost: 9 nsec
#
# Transport: cma
# Device: memory
# System device: <unknown>
#
# capabilities:
# bandwidth: 0.00/ppn + 11145.00 MB/sec
# latency: 80 nsec
# overhead: 400 nsec
# put_zcopy: unlimited, up to 16 iov
# put_opt_zcopy_align: <= 1
# put_align_mtu: <= 1
# get_zcopy: unlimited, up to 16 iov
# get_opt_zcopy_align: <= 1
# get_align_mtu: <= 1
# connection: to iface
# device priority: 0
# device num paths: 1
# max eps: inf
# device address: 8 bytes
# iface address: 4 bytes
# error handling: peer failure, ep_check
#

but I got the same segmentation fault.
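The ucx_info -d listing above is long; what matters for this setup is which transport/device pairs UCX detected (here rc_verbs and ud_verbs on mlx4_0:1, tcp on eth1, ib0 and lo, plus the posix, sysv, self and cma memory transports). A minimal Python sketch that reduces saved ucx_info -d text to those pairs, assuming the "# Transport:" / "# Device:" line layout shown above; list_transports is a hypothetical helper, not part of UCX:

```python
import re
from collections import namedtuple

# One (transport, device) pair as printed by `ucx_info -d`.
TransportDevice = namedtuple("TransportDevice", ["transport", "device"])

TRANSPORT_LINE = re.compile(r"#\s*Transport:\s*(\S+)")
# "# System device:" lines do not match, because "Device:" must follow "#" directly.
DEVICE_LINE = re.compile(r"#\s*Device:\s*(\S+)")


def list_transports(ucx_info_output):
    """Return every (transport, device) pair found in `ucx_info -d` text."""
    pairs = []
    transport = None
    for line in ucx_info_output.splitlines():
        m = TRANSPORT_LINE.match(line)
        if m:
            transport = m.group(1)
            continue
        m = DEVICE_LINE.match(line)
        if m and transport is not None:
            pairs.append(TransportDevice(transport, m.group(1)))
            transport = None  # each Transport: header is followed by one Device:
    return pairs
```

Feeding it the listing above would give one entry per "# Transport:" block, which makes it easy to compare what two UCX builds detect on the same node.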

bxmbxm
Beginner

I'm unable to reply: I have sent my reply 10 times and it still does not show up here.

bxmbxm
Beginner

83:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3] mcx353a-fcbt

X10DAI, 2x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz, 256 GB RAM

Codes: TurboRVB, https://people.sissa.it/~sorella/TurboRVB_Manual/build/html/index.html (private)

cp2k https://www.cp2k.org

I have also tried a newer version of UCX, but it crashes the same way. I also tried MKL_VERBOSE=1, but I am unable to figure out where the problem could be.

ucx_info -v

[1625600433.356047] [localhost:47742:0] debug.c:1199 UCX DEBUG using signal stack 0x7f1572f67000 size 141824
[1625600433.379989] [localhost:47742:0] init.c:114 UCX DEBUG /home/bxm/Downloads/ucx-rpms/install/lib64/libucs.so.0 loaded at 0x7f1572670000
[1625600433.380038] [localhost:47742:0] init.c:115 UCX DEBUG cmd line: ucx_info -v
[1625600433.380059] [localhost:47742:0] module.c:69 UCX DEBUG ucs library path: /home/bxm/Downloads/ucx-rpms/install/lib64/libucs.so.0
[1625600433.380072] [localhost:47742:0] module.c:251 UCX DEBUG loading modules for ucs
# UCT version=1.11.0 revision 6031c98
# configured with: --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --disable-optimizations --disable-logging --disable-debug --disable-assertions --enable-mt --disable-params-check --without-java --enable-cma --without-cuda --without-gdrcopy --with-verbs --with-knem --with-rdmacm --without-rocm --without-xpmem --without-fuse3 --without-ugni --disable-numa

ShivaniK_Intel
Moderator


Hi,


Please answer the questions below.


1. Could you please provide the command line you have been using?


2. Could you also provide the output of the commands below?


   which mpirun

   ldd <executable file>


3. Could you please set I_MPI_DEBUG=10 and provide a complete error log?


4. Are you able to run benchmarks or other applications in this environment, or do they fail with the same issue?


5. Do you have exclusive access to all the nodes?
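For question 2, the useful part of the ldd output is which copy of each MPI, libfabric, UCX and MKL library the loader actually resolves. That is relevant in this thread: the ucx_info -v debug output loads libucs.so.0 from /home/bxm/Downloads/ucx-rpms/install/lib64, while the crash backtrace shows /usr/lib64/libucs.so.0. A minimal Python sketch that filters saved ldd output; resolve_libraries and its prefix list are illustrative assumptions. UCX is probably loaded at run time through the libfabric mlx provider (via dlopen), so it may not appear in ldd of the executable itself; ldd can be run on the provider library as well.

```python
import re

# ldd prints resolved dependencies as "name => path (0xaddress)" or "name => not found".
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(not found|\S+)")


def resolve_libraries(ldd_output, prefixes=("libmpi", "libfabric", "libucs", "libucp", "libmkl")):
    """Map each library whose name starts with one of `prefixes` to the path ldd chose.

    Libraries that ldd could not resolve map to None.
    """
    resolved = {}
    for line in ldd_output.splitlines():
        m = LDD_LINE.match(line)
        if not m:
            continue  # e.g. linux-vdso.so.1 or the dynamic loader line (no "=>")
        name, target = m.groups()
        if name.startswith(prefixes):
            resolved[name] = None if target == "not found" else target
    return resolved
```

Comparing the result on each node (or against the paths in a crash backtrace) shows quickly whether two different UCX or MKL installations are being mixed.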


Thanks & Regards

Shivani


bxmbxm
Beginner

Thanks for the reply.

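I think I have found the source of the problems; I was too quick to post this here. First, I no longer think it is a UCX bug, as I originally wrote. The first three frames of the backtrace belong to UCX's own error handler (ucs_handle_error in libucs.so.0), and the crash itself is in frame 3, mkl_blas_avx2_xzdotc. So the backtrace only shows that the UCX library is loaded, and it is loaded because libmlx-fi.so, the libfabric provider for my Mellanox card, uses it.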
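The reasoning above can be made mechanical: frames that belong to UCX's error handler (libucs.so) only report the signal, so the first frame outside libucs is where the fault actually happened. A minimal Python sketch, assuming the frame format of the backtrace in the first post; faulting_frame is a hypothetical helper:

```python
import re

# Matches frames such as
# "3 /opt/.../libmkl_avx2.so.1(mkl_blas_avx2_xzdotc+0x2a3) [0x7fa02780d9a3]"
FRAME = re.compile(r"^\s*(\d+)\s+(\S+?)\((\S*?)\+0x[0-9a-fA-F]+\)\s+\[0x[0-9a-fA-F]+\]")


def faulting_frame(backtrace, handler_lib="libucs.so"):
    """Return (frame number, library name, symbol) of the first frame outside handler_lib."""
    for line in backtrace.splitlines():
        m = FRAME.match(line)
        if not m:
            continue  # header lines such as "==== backtrace (tid: ...) ===="
        number, path, symbol = m.groups()
        library = path.rsplit("/", 1)[-1]
        if handler_lib not in library:
            return int(number), library, symbol or "<unknown>"
    return None  # every frame was inside the handler library
```

Applied to the backtrace in the first post, this returns (3, "libmkl_avx2.so.1", "mkl_blas_avx2_xzdotc"), consistent with UCX only reporting the crash.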

In CP2K, I found that after increasing the stack size the SIGSEGV disappeared and the code works well.
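A process may raise its own soft stack limit up to the hard limit, and child processes inherit it. In practice this is usually done with `ulimit -s` in the shell or job script before mpirun; the Python sketch below shows the same RLIMIT_STACK mechanism through the standard resource module (Unix only; raise_stack_limit is a hypothetical helper). It covers only the main thread's stack: OpenMP worker threads get their stack size separately (OMP_STACKSIZE), and ranks started on other nodes may not inherit the limit of the launching shell.

```python
import resource

INF = resource.RLIM_INFINITY


def raise_stack_limit(target_bytes=None):
    """Raise the soft RLIMIT_STACK toward target_bytes (a positive byte count).

    Defaults to the hard limit. The soft limit is never lowered and never set
    above the hard limit, since an unprivileged process may only move it
    within [0, hard]. Returns the resulting (soft, hard) pair.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if target_bytes is None or (hard != INF and target_bytes > hard):
        new_soft = hard
    else:
        new_soft = target_bytes
    # Only change anything when the new value is strictly larger than the current one.
    grows = soft != INF and (new_soft == INF or new_soft > soft)
    if grows:
        resource.setrlimit(resource.RLIMIT_STACK, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_STACK)
```

A small launcher could call raise_stack_limit() and then start the solver with subprocess, so the solver inherits the larger limit.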

For TurboRVB, after compiling with debugging enabled and without optimizations, I was able to run some test calculations without the SIGSEGV, but some checks return infinite values. So it looks like a bug in the code, such as a bad pointer or misaligned memory in the arguments passed to MKL. I think we can close this thread as solved.
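The crashing frame, mkl_blas_avx2_xzdotc, appears to be MKL's AVX2 kernel for the BLAS routine ZDOTC, the conjugated dot product of two double-complex vectors. BLAS routines trust their arguments: if n, incx or incy describe more elements than the arrays actually hold, or a pointer is invalid, the kernel reads out of bounds, which can end in a SIGSEGV inside MKL. A reference Python sketch of the ZDOTC semantics with the bounds check that BLAS does not perform (positive increments only, a simplification of this sketch; this is an illustration, not MKL's implementation):

```python
def zdotc(n, x, incx, y, incy):
    """Reference ZDOTC: sum of conj(x[i*incx]) * y[i*incy] for i in range(n).

    Raises IndexError where a Fortran/MKL call would silently read past the
    end of an array instead.
    """
    if n <= 0:
        return 0j  # BLAS returns zero for n <= 0
    if incx <= 0 or incy <= 0:
        raise ValueError("this sketch only handles positive increments")
    # Last element touched is at offset (n - 1) * inc; it must exist.
    if (n - 1) * incx >= len(x) or (n - 1) * incy >= len(y):
        raise IndexError("n and increments describe more elements than the arrays hold")
    return sum(complex(x[i * incx]).conjugate() * y[i * incy] for i in range(n))
```

For contiguous vectors (incx = incy = 1) the result matches numpy.vdot(x, y), which also conjugates its first argument.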

ShivaniK_Intel
Moderator

Hi,

 

Glad to know that your issue is resolved. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.

 

Thanks & Regards

Shivani

 
