<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: UCX segmentation fault in libmkl_avx2.so.1 in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1296376#M8550</link>
    <description>&lt;P&gt;I'm unable to reply. I have sent it 10 times and still nothing here.&lt;/P&gt;</description>
    <pubDate>Tue, 06 Jul 2021 19:35:45 GMT</pubDate>
    <dc:creator>bxmbxm</dc:creator>
    <dc:date>2021-07-06T19:35:45Z</dc:date>
    <item>
      <title>UCX segmentation fault in libmkl_avx2.so.1</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1295427#M8533</link>
      <description>&lt;P&gt;Hello, I have successfully compiled my application, but it sometimes crashes and produces:&lt;/P&gt;
&lt;P&gt;Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))&lt;/P&gt;
&lt;P&gt;==== backtrace (tid: 40561) ====&lt;BR /&gt;0 /usr/lib64/libucs.so.0(ucs_handle_error+0x104) [0x7fa02aade4d4]&lt;BR /&gt;1 /usr/lib64/libucs.so.0(+0x1e8fc) [0x7fa02aade8fc]&lt;BR /&gt;2 /usr/lib64/libucs.so.0(+0x1eab2) [0x7fa02aadeab2]&lt;BR /&gt;3 /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_avx2.so.1(mkl_blas_avx2_xzdotc+0x2a3) [0x7fa02780d9a3]&lt;BR /&gt;=================================&lt;BR /&gt;forrtl: severe (174): SIGSEGV, segmentation fault occurred&lt;BR /&gt;Image PC Routine Line Source &lt;BR /&gt;prep-mpi.x 0000000000A997CA Unknown Unknown Unknown&lt;BR /&gt;libpthread-2.26.s 00007FA02D8F12D0 Unknown Unknown Unknown&lt;BR /&gt;libmkl_avx2.so.1 00007FA02780D9A3 mkl_blas_avx2_xzd Unknown Unknown&lt;/P&gt;
&lt;P&gt;I have MLNX_OFED_LINUX-5.0-2.1.8.0 (due to FDR cards) and Intel oneAPI:&lt;/P&gt;
&lt;P&gt;c 2021.2.0.118&lt;/P&gt;
&lt;P&gt;fortran 2021.2.0.136&lt;/P&gt;
&lt;P&gt;mpi 2021.2.0.215&lt;/P&gt;
&lt;P&gt;mkl 2021.2.0.296&lt;/P&gt;
&lt;P&gt;ucx_info -v&lt;BR /&gt;# UCT version=1.8.0 revision c0a9704&lt;BR /&gt;# configured with: --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --disable-optimizations --disable-logging --disable-debug --disable-assertions --enable-mt --disable-params-check --enable-cma --without-cuda --without-gdrcopy --with-verbs --without-cm --with-knem --with-rdmacm --without-rocm --without-xpmem --without-ugni --without-java --disable-numa&lt;/P&gt;
&lt;P&gt;Where could the problem be?&lt;/P&gt;</description>
      <pubDate>Fri, 02 Jul 2021 13:00:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1295427#M8533</guid>
      <dc:creator>bxmbxm</dc:creator>
      <dc:date>2021-07-02T13:00:14Z</dc:date>
    </item>
    <item>
      <title>Re:UCX segmentation fault in libmkl_avx2.so.1</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1295986#M8539</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for reaching out to us.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Could you please provide the hardware details of the system on which you are running the application?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;If possible, could you also provide the exact link to the application?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards&lt;/P&gt;&lt;P&gt;Shivani&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 05 Jul 2021 12:12:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1295986#M8539</guid>
      <dc:creator>ShivaniK_Intel</dc:creator>
      <dc:date>2021-07-05T12:12:09Z</dc:date>
    </item>
    <item>
      <title>Re: Re:UCX segmentation fault in libmkl_avx2.so.1</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1296251#M8545</link>
      <description>&lt;P&gt;Thanks for the reply.&lt;/P&gt;
&lt;P&gt;83:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3] MCX353A-FCBT&lt;/P&gt;
&lt;P&gt;Supermicro X10DAI 2xIntel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz 256GB&lt;/P&gt;
&lt;P&gt;So far it has failed in two codes:&lt;/P&gt;
&lt;P&gt;TurboRVB (private): &lt;A href="https://people.sissa.it/~sorella/TurboRVB_Manual/build/html/index.html" target="_blank"&gt;https://people.sissa.it/~sorella/TurboRVB_Manual/build/html/index.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;and cp2k.&lt;/P&gt;
&lt;P&gt;I have also tried a newer version of UCX:&lt;/P&gt;
&lt;P&gt;# UCT version=1.11.0 revision 6031c98&lt;BR /&gt;# configured with: --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --disable-optimizations --disable-logging --disable-debug --disable-assertions --enable-mt --disable-params-check --without-java --enable-cma --without-cuda --without-gdrcopy --with-verbs --with-knem --with-rdmacm --without-rocm --without-xpmem --without-fuse3 --without-ugni --disable-numa&lt;/P&gt;
&lt;P&gt;Output of ucx_info -d with the new version:&lt;/P&gt;
&lt;P&gt;#&lt;BR /&gt;# Memory domain: posix&lt;BR /&gt;# Component: posix&lt;BR /&gt;# allocate: unlimited&lt;BR /&gt;# remote key: 24 bytes&lt;BR /&gt;# rkey_ptr is supported&lt;BR /&gt;#&lt;BR /&gt;# Transport: posix&lt;BR /&gt;# Device: memory&lt;BR /&gt;# System device: &amp;lt;unknown&amp;gt;&lt;BR /&gt;#&lt;BR /&gt;# capabilities:&lt;BR /&gt;# bandwidth: 0.00/ppn + 12179.00 MB/sec&lt;BR /&gt;# latency: 80 nsec&lt;BR /&gt;# overhead: 10 nsec&lt;BR /&gt;# put_short: &amp;lt;= 4294967295&lt;BR /&gt;# put_bcopy: unlimited&lt;BR /&gt;# get_bcopy: unlimited&lt;BR /&gt;# am_short: &amp;lt;= 100&lt;BR /&gt;# am_bcopy: &amp;lt;= 8256&lt;BR /&gt;# domain: cpu&lt;BR /&gt;# atomic_add: 32, 64 bit&lt;BR /&gt;# atomic_and: 32, 64 bit&lt;BR /&gt;# atomic_or: 32, 64 bit&lt;BR /&gt;# atomic_xor: 32, 64 bit&lt;BR /&gt;# atomic_fadd: 32, 64 bit&lt;BR /&gt;# atomic_fand: 32, 64 bit&lt;BR /&gt;# atomic_for: 32, 64 bit&lt;BR /&gt;# atomic_fxor: 32, 64 bit&lt;BR /&gt;# atomic_swap: 32, 64 bit&lt;BR /&gt;# atomic_cswap: 32, 64 bit&lt;BR /&gt;# connection: to iface&lt;BR /&gt;# device priority: 0&lt;BR /&gt;# device num paths: 1&lt;BR /&gt;# max eps: inf&lt;BR /&gt;# device address: 8 bytes&lt;BR /&gt;# iface address: 8 bytes&lt;BR /&gt;# error handling: ep_check&lt;BR /&gt;#&lt;BR /&gt;#&lt;BR /&gt;# Memory domain: sysv&lt;BR /&gt;# Component: sysv&lt;BR /&gt;# allocate: unlimited&lt;BR /&gt;# remote key: 12 bytes&lt;BR /&gt;# rkey_ptr is supported&lt;BR /&gt;#&lt;BR /&gt;# Transport: sysv&lt;BR /&gt;# Device: memory&lt;BR /&gt;# System device: &amp;lt;unknown&amp;gt;&lt;BR /&gt;#&lt;BR /&gt;# capabilities:&lt;BR /&gt;# bandwidth: 0.00/ppn + 12179.00 MB/sec&lt;BR /&gt;# latency: 80 nsec&lt;BR /&gt;# overhead: 10 nsec&lt;BR /&gt;# put_short: &amp;lt;= 4294967295&lt;BR /&gt;# put_bcopy: unlimited&lt;BR /&gt;# get_bcopy: unlimited&lt;BR /&gt;# am_short: &amp;lt;= 100&lt;BR /&gt;# am_bcopy: &amp;lt;= 8256&lt;BR /&gt;# domain: cpu&lt;BR /&gt;# atomic_add: 32, 64 bit&lt;BR /&gt;# 
atomic_and: 32, 64 bit&lt;BR /&gt;# atomic_or: 32, 64 bit&lt;BR /&gt;# atomic_xor: 32, 64 bit&lt;BR /&gt;# atomic_fadd: 32, 64 bit&lt;BR /&gt;# atomic_fand: 32, 64 bit&lt;BR /&gt;# atomic_for: 32, 64 bit&lt;BR /&gt;# atomic_fxor: 32, 64 bit&lt;BR /&gt;# atomic_swap: 32, 64 bit&lt;BR /&gt;# atomic_cswap: 32, 64 bit&lt;BR /&gt;# connection: to iface&lt;BR /&gt;# device priority: 0&lt;BR /&gt;# device num paths: 1&lt;BR /&gt;# max eps: inf&lt;BR /&gt;# device address: 8 bytes&lt;BR /&gt;# iface address: 8 bytes&lt;BR /&gt;# error handling: ep_check&lt;BR /&gt;#&lt;BR /&gt;#&lt;BR /&gt;# Memory domain: self&lt;BR /&gt;# Component: self&lt;BR /&gt;# register: unlimited, cost: 0 nsec&lt;BR /&gt;# remote key: 0 bytes&lt;BR /&gt;#&lt;BR /&gt;# Transport: self&lt;BR /&gt;# Device: memory0&lt;BR /&gt;# System device: &amp;lt;unknown&amp;gt;&lt;BR /&gt;#&lt;BR /&gt;# capabilities:&lt;BR /&gt;# bandwidth: 0.00/ppn + 6911.00 MB/sec&lt;BR /&gt;# latency: 0 nsec&lt;BR /&gt;# overhead: 10 nsec&lt;BR /&gt;# put_short: &amp;lt;= 4294967295&lt;BR /&gt;# put_bcopy: unlimited&lt;BR /&gt;# get_bcopy: unlimited&lt;BR /&gt;# am_short: &amp;lt;= 8K&lt;BR /&gt;# am_bcopy: &amp;lt;= 8K&lt;BR /&gt;# domain: cpu&lt;BR /&gt;# atomic_add: 32, 64 bit&lt;BR /&gt;# atomic_and: 32, 64 bit&lt;BR /&gt;# atomic_or: 32, 64 bit&lt;BR /&gt;# atomic_xor: 32, 64 bit&lt;BR /&gt;# atomic_fadd: 32, 64 bit&lt;BR /&gt;# atomic_fand: 32, 64 bit&lt;BR /&gt;# atomic_for: 32, 64 bit&lt;BR /&gt;# atomic_fxor: 32, 64 bit&lt;BR /&gt;# atomic_swap: 32, 64 bit&lt;BR /&gt;# atomic_cswap: 32, 64 bit&lt;BR /&gt;# connection: to iface&lt;BR /&gt;# device priority: 0&lt;BR /&gt;# device num paths: 1&lt;BR /&gt;# max eps: inf&lt;BR /&gt;# device address: 0 bytes&lt;BR /&gt;# iface address: 8 bytes&lt;BR /&gt;# error handling: ep_check&lt;BR /&gt;#&lt;BR /&gt;#&lt;BR /&gt;# Memory domain: tcp&lt;BR /&gt;# Component: tcp&lt;BR /&gt;# register: unlimited, cost: 0 nsec&lt;BR /&gt;# remote key: 0 bytes&lt;BR /&gt;#&lt;BR /&gt;# 
Transport: tcp&lt;BR /&gt;# Device: eth1&lt;BR /&gt;# System device: &amp;lt;unknown&amp;gt;&lt;BR /&gt;#&lt;BR /&gt;# capabilities:&lt;BR /&gt;# bandwidth: 113.16/ppn + 0.00 MB/sec&lt;BR /&gt;# latency: 5776 nsec&lt;BR /&gt;# overhead: 50000 nsec&lt;BR /&gt;# put_zcopy: &amp;lt;= 18446744073709551590, up to 6 iov&lt;BR /&gt;# put_opt_zcopy_align: &amp;lt;= 1&lt;BR /&gt;# put_align_mtu: &amp;lt;= 0&lt;BR /&gt;# am_short: &amp;lt;= 8K&lt;BR /&gt;# am_bcopy: &amp;lt;= 8K&lt;BR /&gt;# am_zcopy: &amp;lt;= 64K, up to 6 iov&lt;BR /&gt;# am_opt_zcopy_align: &amp;lt;= 1&lt;BR /&gt;# am_align_mtu: &amp;lt;= 0&lt;BR /&gt;# am header: &amp;lt;= 8037&lt;BR /&gt;# connection: to ep, to iface&lt;BR /&gt;# device priority: 0&lt;BR /&gt;# device num paths: 1&lt;BR /&gt;# max eps: 256&lt;BR /&gt;# device address: 6 bytes&lt;BR /&gt;# iface address: 2 bytes&lt;BR /&gt;# ep address: 10 bytes&lt;BR /&gt;# error handling: peer failure, ep_check, keepalive&lt;BR /&gt;#&lt;BR /&gt;# Transport: tcp&lt;BR /&gt;# Device: ib0&lt;BR /&gt;# System device: &amp;lt;unknown&amp;gt;&lt;BR /&gt;#&lt;BR /&gt;# capabilities:&lt;BR /&gt;# bandwidth: 6239.81/ppn + 0.00 MB/sec&lt;BR /&gt;# latency: 5210 nsec&lt;BR /&gt;# overhead: 50000 nsec&lt;BR /&gt;# put_zcopy: &amp;lt;= 18446744073709551590, up to 6 iov&lt;BR /&gt;# put_opt_zcopy_align: &amp;lt;= 1&lt;BR /&gt;# put_align_mtu: &amp;lt;= 0&lt;BR /&gt;# am_short: &amp;lt;= 8K&lt;BR /&gt;# am_bcopy: &amp;lt;= 8K&lt;BR /&gt;# am_zcopy: &amp;lt;= 64K, up to 6 iov&lt;BR /&gt;# am_opt_zcopy_align: &amp;lt;= 1&lt;BR /&gt;# am_align_mtu: &amp;lt;= 0&lt;BR /&gt;# am header: &amp;lt;= 8037&lt;BR /&gt;# connection: to ep, to iface&lt;BR /&gt;# device priority: 1&lt;BR /&gt;# device num paths: 1&lt;BR /&gt;# max eps: 256&lt;BR /&gt;# device address: 6 bytes&lt;BR /&gt;# iface address: 2 bytes&lt;BR /&gt;# ep address: 10 bytes&lt;BR /&gt;# error handling: peer failure, ep_check, keepalive&lt;BR /&gt;#&lt;BR /&gt;# Transport: tcp&lt;BR /&gt;# Device: lo&lt;BR 
/&gt;# System device: &amp;lt;unknown&amp;gt;&lt;BR /&gt;#&lt;BR /&gt;# capabilities:&lt;BR /&gt;# bandwidth: 11.91/ppn + 0.00 MB/sec&lt;BR /&gt;# latency: 10960 nsec&lt;BR /&gt;# overhead: 50000 nsec&lt;BR /&gt;# put_zcopy: &amp;lt;= 18446744073709551590, up to 6 iov&lt;BR /&gt;# put_opt_zcopy_align: &amp;lt;= 1&lt;BR /&gt;# put_align_mtu: &amp;lt;= 0&lt;BR /&gt;# am_short: &amp;lt;= 8K&lt;BR /&gt;# am_bcopy: &amp;lt;= 8K&lt;BR /&gt;# am_zcopy: &amp;lt;= 64K, up to 6 iov&lt;BR /&gt;# am_opt_zcopy_align: &amp;lt;= 1&lt;BR /&gt;# am_align_mtu: &amp;lt;= 0&lt;BR /&gt;# am header: &amp;lt;= 8037&lt;BR /&gt;# connection: to ep, to iface&lt;BR /&gt;# device priority: 1&lt;BR /&gt;# device num paths: 1&lt;BR /&gt;# max eps: 256&lt;BR /&gt;# device address: 18 bytes&lt;BR /&gt;# iface address: 2 bytes&lt;BR /&gt;# ep address: 10 bytes&lt;BR /&gt;# error handling: peer failure, ep_check, keepalive&lt;BR /&gt;#&lt;BR /&gt;#&lt;BR /&gt;# Connection manager: tcp&lt;BR /&gt;# max_conn_priv: 2064 bytes&lt;BR /&gt;#&lt;BR /&gt;# Memory domain: mlx4_0&lt;BR /&gt;# Component: ib&lt;BR /&gt;# register: unlimited, cost: 180 nsec&lt;BR /&gt;# remote key: 8 bytes&lt;BR /&gt;# local memory handle is required for zcopy&lt;BR /&gt;#&lt;BR /&gt;# Transport: rc_verbs&lt;BR /&gt;# Device: mlx4_0:1&lt;BR /&gt;# System device: 0000:83:00.0 (0)&lt;BR /&gt;#&lt;BR /&gt;# capabilities:&lt;BR /&gt;# bandwidth: 6433.22/ppn + 0.00 MB/sec&lt;BR /&gt;# latency: 900 + 1.000 * N nsec&lt;BR /&gt;# overhead: 75 nsec&lt;BR /&gt;# put_short: &amp;lt;= 88&lt;BR /&gt;# put_bcopy: &amp;lt;= 8256&lt;BR /&gt;# put_zcopy: &amp;lt;= 1G, up to 6 iov&lt;BR /&gt;# put_opt_zcopy_align: &amp;lt;= 512&lt;BR /&gt;# put_align_mtu: &amp;lt;= 2K&lt;BR /&gt;# get_bcopy: &amp;lt;= 8256&lt;BR /&gt;# get_zcopy: 65..1G, up to 6 iov&lt;BR /&gt;# get_opt_zcopy_align: &amp;lt;= 512&lt;BR /&gt;# get_align_mtu: &amp;lt;= 2K&lt;BR /&gt;# am_short: &amp;lt;= 87&lt;BR /&gt;# am_bcopy: &amp;lt;= 8255&lt;BR /&gt;# am_zcopy: &amp;lt;= 
8255, up to 5 iov&lt;BR /&gt;# am_opt_zcopy_align: &amp;lt;= 512&lt;BR /&gt;# am_align_mtu: &amp;lt;= 2K&lt;BR /&gt;# am header: &amp;lt;= 127&lt;BR /&gt;# domain: device&lt;BR /&gt;# atomic_add: 64 bit&lt;BR /&gt;# atomic_fadd: 64 bit&lt;BR /&gt;# atomic_cswap: 64 bit&lt;BR /&gt;# connection: to ep&lt;BR /&gt;# device priority: 10&lt;BR /&gt;# device num paths: 1&lt;BR /&gt;# max eps: 256&lt;BR /&gt;# device address: 4 bytes&lt;BR /&gt;# ep address: 4 bytes&lt;BR /&gt;# error handling: peer failure, ep_check&lt;BR /&gt;#&lt;BR /&gt;#&lt;BR /&gt;# Transport: ud_verbs&lt;BR /&gt;# Device: mlx4_0:1&lt;BR /&gt;# System device: 0000:83:00.0 (0)&lt;BR /&gt;#&lt;BR /&gt;# capabilities:&lt;BR /&gt;# bandwidth: 6433.22/ppn + 0.00 MB/sec&lt;BR /&gt;# latency: 930 nsec&lt;BR /&gt;# overhead: 105 nsec&lt;BR /&gt;# am_short: &amp;lt;= 172&lt;BR /&gt;# am_bcopy: &amp;lt;= 4088&lt;BR /&gt;# am_zcopy: &amp;lt;= 4088, up to 8 iov&lt;BR /&gt;# am_opt_zcopy_align: &amp;lt;= 512&lt;BR /&gt;# am_align_mtu: &amp;lt;= 4K&lt;BR /&gt;# am header: &amp;lt;= 3952&lt;BR /&gt;# connection: to ep, to iface&lt;BR /&gt;# device priority: 10&lt;BR /&gt;# device num paths: 1&lt;BR /&gt;# max eps: inf&lt;BR /&gt;# device address: 4 bytes&lt;BR /&gt;# iface address: 3 bytes&lt;BR /&gt;# ep address: 6 bytes&lt;BR /&gt;# error handling: peer failure, ep_check&lt;BR /&gt;#&lt;BR /&gt;#&lt;BR /&gt;# Connection manager: rdmacm&lt;BR /&gt;# max_conn_priv: 54 bytes&lt;BR /&gt;#&lt;BR /&gt;# Memory domain: cma&lt;BR /&gt;# Component: cma&lt;BR /&gt;# register: unlimited, cost: 9 nsec&lt;BR /&gt;#&lt;BR /&gt;# Transport: cma&lt;BR /&gt;# Device: memory&lt;BR /&gt;# System device: &amp;lt;unknown&amp;gt;&lt;BR /&gt;#&lt;BR /&gt;# capabilities:&lt;BR /&gt;# bandwidth: 0.00/ppn + 11145.00 MB/sec&lt;BR /&gt;# latency: 80 nsec&lt;BR /&gt;# overhead: 400 nsec&lt;BR /&gt;# put_zcopy: unlimited, up to 16 iov&lt;BR /&gt;# put_opt_zcopy_align: &amp;lt;= 1&lt;BR /&gt;# put_align_mtu: &amp;lt;= 1&lt;BR /&gt;# get_zcopy: 
unlimited, up to 16 iov&lt;BR /&gt;# get_opt_zcopy_align: &amp;lt;= 1&lt;BR /&gt;# get_align_mtu: &amp;lt;= 1&lt;BR /&gt;# connection: to iface&lt;BR /&gt;# device priority: 0&lt;BR /&gt;# device num paths: 1&lt;BR /&gt;# max eps: inf&lt;BR /&gt;# device address: 8 bytes&lt;BR /&gt;# iface address: 4 bytes&lt;BR /&gt;# error handling: peer failure, ep_check&lt;BR /&gt;#&lt;/P&gt;
&lt;P&gt;but I got the same segmentation fault.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jul 2021 10:33:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1296251#M8545</guid>
      <dc:creator>bxmbxm</dc:creator>
      <dc:date>2021-07-06T10:33:47Z</dc:date>
    </item>
    <item>
      <title>Re: UCX segmentation fault in libmkl_avx2.so.1</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1296376#M8550</link>
      <description>&lt;P&gt;I'm unable to reply. I have sent it 10 times and still nothing here.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jul 2021 19:35:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1296376#M8550</guid>
      <dc:creator>bxmbxm</dc:creator>
      <dc:date>2021-07-06T19:35:45Z</dc:date>
    </item>
    <item>
      <title>Re: UCX segmentation fault in libmkl_avx2.so.1</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1296380#M8552</link>
      <description>&lt;P class="sub_section_element_selectors"&gt;83:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3] mcx353a-fcbt&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;X10DAI 2xIntel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz&amp;nbsp; 256GB&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;codes:&amp;nbsp;&lt;A class="sub_section_element_selectors" href="https://people.sissa.it/~sorella/TurboRVB_Manual/build/html/index.html" target="_blank" rel="nofollow noopener noreferrer"&gt;https://people.sissa.it/~sorella/TurboRVB_Manual/build/html/index.html&lt;/A&gt;&amp;nbsp;(private)&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;cp2k&amp;nbsp;&lt;A class="sub_section_element_selectors" href="https://www.cp2k.org/" target="_blank" rel="nofollow noopener noreferrer"&gt;https://www.cp2k.org&lt;/A&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;I have also tried newer&amp;nbsp; version of ucx but same crash and try also&amp;nbsp;MKL_VERBOSE=1 but I'm unable figure out where could be problem&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;ucx_info -v&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;[1625600433.356047] [localhost:47742:0] debug.c:1199 UCX DEBUG using signal stack 0x7f1572f67000 size 141824&lt;BR /&gt;[1625600433.379989] [localhost:47742:0] init.c:114 UCX DEBUG /home/bxm/Downloads/ucx-rpms/install/lib64/libucs.so.0 loaded at 0x7f1572670000&lt;BR /&gt;[1625600433.380038] [localhost:47742:0] init.c:115 UCX DEBUG cmd line: ucx_info -v&lt;BR /&gt;[1625600433.380059] [localhost:47742:0] module.c:69 UCX DEBUG ucs library path: /home/bxm/Downloads/ucx-rpms/install/lib64/libucs.so.0&lt;BR /&gt;[1625600433.380072] [localhost:47742:0] module.c:251 UCX DEBUG loading modules for ucs&lt;BR /&gt;# UCT version=1.11.0 revision 6031c98&lt;BR /&gt;# configured with: --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --disable-optimizations --disable-logging --disable-debug --disable-assertions --enable-mt --disable-params-check --without-java --enable-cma --without-cuda --without-gdrcopy --with-verbs --with-knem --with-rdmacm --without-rocm --without-xpmem --without-fuse3 --without-ugni --disable-numa&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jul 2021 19:44:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1296380#M8552</guid>
      <dc:creator>bxmbxm</dc:creator>
      <dc:date>2021-07-06T19:44:37Z</dc:date>
    </item>
    <item>
      <title>Re: UCX segmentation fault in libmkl_avx2.so.1</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1296384#M8553</link>
      <description>&lt;P class="sub_section_element_selectors"&gt;83:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3] mcx353a-fcbt&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;X10DAI 2xIntel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz&amp;nbsp; 256GB&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;codes:&amp;nbsp;&lt;A class="sub_section_element_selectors" href="https://people.sissa.it/~sorella/TurboRVB_Manual/build/html/index.html" target="_blank" rel="nofollow noopener noreferrer"&gt;https://people.sissa.it/~sorella/TurboRVB_Manual/build/html/index.html&lt;/A&gt;&amp;nbsp;(private)&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;cp2k&amp;nbsp;&lt;A class="sub_section_element_selectors" href="https://www.cp2k.org/" target="_blank" rel="nofollow noopener noreferrer"&gt;https://www.cp2k.org&lt;/A&gt;&lt;/P&gt;
&lt;P class="sub_section_element_selectors"&gt;I have also tried newer&amp;nbsp; version of ucx UCT version=1.11.0 revision 6031c98 but same crash and try also&amp;nbsp;MKL_VERBOSE=1 but I'm unable figure out where could be problem&lt;/P&gt;</description>
      <pubDate>Tue, 06 Jul 2021 19:59:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1296384#M8553</guid>
      <dc:creator>bxmbxm</dc:creator>
      <dc:date>2021-07-06T19:59:27Z</dc:date>
    </item>
    <item>
      <title>Re:UCX segmentation fault in libmkl_avx2.so.1</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1296973#M8569</link>
      <description>&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Please answer the questions below.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;1. Could you please provide the command line you have been using?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;2. Could you also provide the output of the commands below?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;  &amp;nbsp;which mpirun&lt;/P&gt;&lt;P&gt;&amp;nbsp;  &amp;nbsp;ldd &amp;lt;executable file&amp;gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;3. Could you please set I_MPI_DEBUG=10 and provide a complete error log?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;4. Are you able to run benchmarks or other applications in this environment, or do they hit the same issue?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;5. Do you have exclusive access to all the nodes?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards&lt;/P&gt;&lt;P&gt;Shivani&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 08 Jul 2021 12:22:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1296973#M8569</guid>
      <dc:creator>ShivaniK_Intel</dc:creator>
      <dc:date>2021-07-08T12:22:51Z</dc:date>
    </item>
    <item>
      <title>Re: UCX segmentation fault in libmkl_avx2.so.1</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1296987#M8571</link>
      <description>&lt;P&gt;Thanks for the reply.&lt;/P&gt;
&lt;P&gt;I think I have found the source of the problems, and I was too quick to post here. First, I no longer think this is a UCX-related bug, as I originally wrote. The backtrace only shows that the UCX library is loaded, which must be because libmlx-fi.so, part of libfabric.so, loaded it for my Mellanox card.&lt;/P&gt;
&lt;P&gt;In cp2k, I found that after increasing the stack size the SIGSEGV disappeared and the code works well.&lt;/P&gt;
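&lt;P&gt;A minimal sketch of that stack-size change, assuming a POSIX shell: it reads the current stack limits and raises the soft limit to the hard limit. The thread does not say which limit was raised or to what value, so this is an illustration only. The OMP_STACKSIZE value of 64M is an arbitrary assumption and only matters for OpenMP builds.&lt;/P&gt;

```shell
# Read the current soft and hard stack limits (KiB, or "unlimited").
soft=$(ulimit -S -s)
hard=$(ulimit -H -s)
echo "soft stack limit: $soft"
echo "hard stack limit: $hard"

# Raise the soft limit as far as the hard limit allows.
ulimit -S -s "$hard"

# Assumption: only relevant for OpenMP builds; 64M is an illustrative value.
export OMP_STACKSIZE=64M

echo "new soft stack limit: $(ulimit -S -s)"
```

&lt;P&gt;The limits apply to the current shell and the processes it starts, so the MPI job has to be launched from the same shell or job script. On multi-node runs the remote ranks may not inherit them.&lt;/P&gt;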
&lt;P&gt;For TurboRVB, after compiling with debug features enabled and without optimizations, I was able to run some test calculations without the SIGSEGV, but some checks return infinite numbers. So it looks like a bug in the code, such as a bad pointer or misaligned memory in the parameters passed to MKL. I think we should close this thread as solved.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Jul 2021 13:13:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1296987#M8571</guid>
      <dc:creator>bxmbxm</dc:creator>
      <dc:date>2021-07-08T13:13:24Z</dc:date>
    </item>
    <item>
      <title>Re: UCX segmentation fault in libmkl_avx2.so.1</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1298038#M8589</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Glad to know that your issue is resolved. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks &amp;amp; Regards&lt;/P&gt;
&lt;P&gt;Shivani&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 13 Sep 2021 04:42:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/UCX-segmentation-fault-in-libmkl-avx2-so-1/m-p/1298038#M8589</guid>
      <dc:creator>ShivaniK_Intel</dc:creator>
      <dc:date>2021-09-13T04:42:15Z</dc:date>
    </item>
  </channel>
</rss>

