<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1384088#M9481</link>
    <description>&lt;P&gt;Hi Karen,&lt;/P&gt;&lt;P&gt;1) Can you try a newer libfabric? The one shipped with Intel MPI is 1.12.1, but the newest is 1.15.0.&lt;/P&gt;&lt;P&gt;2) Can you reproduce this issue on an Intel machine with the newest libfabric?&lt;/P&gt;&lt;P&gt;3) If yes, can you show where the call hangs, e.g. by using gstack?&lt;/P&gt;&lt;P&gt;Cheers,&lt;/P&gt;&lt;P&gt;Rafael&lt;/P&gt;</description>
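The gstack suggestion above can be scripted to capture the stack of every rank on a node while the job is hung. This is a minimal sketch; the process name mpi_isend_recv is an assumption taken from the run script in the thread, not a known binary name on the affected systems.

```shell
# Dump the call stack of every process matching the test binary's name.
# gstack is a thin wrapper around gdb, so gdb must be installed on each node.
for pid in $(pgrep -f mpi_isend_recv); do
    echo "=== PID $pid ==="
    gstack "$pid"
done
```

Running this on each node while the job spins at 100% CPU shows whether the ranks are stuck inside MPI_Waitall and, below that, in which libfabric call.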
    <pubDate>Fri, 13 May 2022 08:40:04 GMT</pubDate>
    <dc:creator>Rafael_L_Intel</dc:creator>
    <dc:date>2022-05-13T08:40:04Z</dc:date>
    <item>
      <title>MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1367032#M9268</link>
      <description>&lt;P&gt;A colleague wrote a small MPI Isend/Recv test case to try to reproduce a performance issue with an application when using RoCE, but the test case hangs with large message sizes when run with 2 or more processes per node across 2 or more nodes. The same test case runs successfully with large message sizes in an environment with Infiniband.&lt;/P&gt;
&lt;P&gt;Initially it hung with message sizes larger than 16K, but using the FI_OFI_RXM_BUFFER_SIZE variable allowed increasing the message size to about 750K.&amp;nbsp; We were trying to get to 1 MB, but no matter how large FI_OFI_RXM_BUFFER_SIZE is set, the test hangs with a message size of 1 MB.&amp;nbsp; Are there other MPI settings or OS settings that may need to be increased?&amp;nbsp; I also tried setting FI_OFI_RXM_SAR_LIMIT, but that didn't help. Here is the current set of MPI options for the test when using RoCE:&lt;/P&gt;
&lt;P&gt;mpi_flags='-genv I_MPI_OFI_PROVIDER=verbs -genv FI_VERBS_IFACE=vlan50 -genv I_MPI_OFI_LIBRARY_INTERNAL=1 -genv I_MPI_FABRICS=shm:ofi -genv FI_OFI_RXM_BUFFER_SIZE=2000000 -genv FI_OFI_RXM_SAR_LIMIT=4000000 -genv I_MPI_DEBUG=30 -genv FI_LOG_LEVEL=debug'&lt;/P&gt;
&lt;P&gt;The environment is SLES 15 SP2 with Intel OneAPI Toolkit version 2021.2, with Mellanox CX-6 network adapters in Ethernet mode and 100 Gb Aruba switches.&amp;nbsp; The NICs and switches have been configured for RoCE traffic per guidelines from Mellanox and our Aruba engineering team.&lt;/P&gt;
&lt;P&gt;Attached are a screenshot of the main loop of the MPI code (I will get the full source code from my colleague) and the output of the test with a message size of 1 MB, I_MPI_DEBUG=30, and FI_LOG_LEVEL=debug. The script used to run the test is shown below; its input parameters are the number of repetitions and the message size.&lt;/P&gt;
&lt;P&gt;#!/bin/bash&lt;BR /&gt;cf_args=()&lt;BR /&gt;while [ $# -gt 0 ]; do&lt;BR /&gt;cf_args+=("$1")&lt;BR /&gt;shift&lt;BR /&gt;done&lt;/P&gt;
&lt;P&gt;source /opt/intel/oneapi/mpi/2021.3.0/env/vars.sh&lt;/P&gt;
&lt;P&gt;set -e&lt;BR /&gt;set -u&lt;/P&gt;
&lt;P&gt;mpirun --version 2&amp;gt;&amp;amp;1 | grep -i "intel.*mpi"&lt;/P&gt;
&lt;P&gt;hostlist='-hostlist perfcomp3,perfcomp4'&lt;/P&gt;
&lt;P&gt;mpi_flags='-genv I_MPI_OFI_PROVIDER=verbs -genv FI_VERBS_IFACE=vlan50 -genv I_MPI_OFI_LIBRARY_INTERNAL=1 -genv I_MPI_FABRICS=shm:ofi -genv FI_OFI_RXM_BUFFER_SIZE=2000000 -genv FI_OFI_RXM_SAR_LIMIT=4000000 -genv I_MPI_DEBUG=30 -genv FI_LOG_LEVEL=debug -genv I_MPI_OFI_PROVIDER_DUMP=1'&lt;/P&gt;
&lt;P&gt;echo "$hostlist"&lt;/P&gt;
&lt;P&gt;mpirun -ppn 1 \&lt;BR /&gt;$hostlist $mpi_flags \&lt;BR /&gt;hostname&lt;/P&gt;
&lt;P&gt;num_nodes=$(mpirun -ppn 1 $hostlist $mpi_flags hostname | sort -u | wc -l)&lt;BR /&gt;echo "num_nodes=$num_nodes"&lt;/P&gt;
&lt;P&gt;mpirun -ppn 2 \&lt;BR /&gt;$hostlist $mpi_flags \&lt;BR /&gt;singularity run -H `pwd` \&lt;BR /&gt;/var/tmp/paulo/gromacs/gromacs_tau.sif \&lt;BR /&gt;v4/mpi_isend_recv "${cf_args[@]}"&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MPI_Isend_Recv main loop" style="width: 999px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/27408i8CEEE9A483A91C88/image-size/large/is-moderation-mode/true?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="Isend_Recv_code.png" alt="MPI_Isend_Recv main loop" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;MPI_Isend_Recv main loop&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;</description>
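The main loop is only available as the attached screenshot, but the pattern described (a nonblocking Irecv and Isend of one large message, completed with MPI_Waitall, repeated for a number of iterations) can be sketched roughly as follows. This is a hypothetical reconstruction, not the colleague's actual code: the ring-neighbour partner choice, buffer contents, and argument handling are assumptions.

```c
/* Minimal sketch of the described Isend/Irecv/Waitall pattern (NOT the
 * attached source). Build with: mpicc isend_recv_sketch.c -o mpi_isend_recv
 * Run with, e.g.: mpirun -ppn 2 ./mpi_isend_recv 10 1000000 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int reps = (argc > 1) ? atoi(argv[1]) : 1;        /* repetitions */
    int n    = (argc > 2) ? atoi(argv[2]) : 1000000;  /* message size, bytes */
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char *sendbuf = malloc(n), *recvbuf = malloc(n);
    int dst = (rank + 1) % size;          /* assumed ring exchange */
    int src = (rank - 1 + size) % size;

    for (int r = 0; r < reps; r++) {
        MPI_Request reqs[2];
        MPI_Irecv(recvbuf, n, MPI_BYTE, src, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, n, MPI_BYTE, dst, 0, MPI_COMM_WORLD, &reqs[1]);
        /* The reported hang occurs while waiting on large messages here. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        if (rank == 0) printf("repetition %d done\n", r + 1);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```

A sketch like this reproduces the key property of the report: all transfers are posted before any completion is waited on, so per-message eager/rendezvous thresholds in the provider (such as FI_OFI_RXM_BUFFER_SIZE) directly affect whether MPI_Waitall can make progress.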
      <pubDate>Wed, 09 Mar 2022 05:01:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1367032#M9268</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-03-09T05:01:33Z</dc:date>
    </item>
    <item>
      <title>Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1367265#M9274</link>
      <description>&lt;P&gt;Attaching the OS kernel settings for the servers, 99-sysctl.conf.txt (.txt extension added to allow it to be uploaded)&lt;/P&gt;</description>
      <pubDate>Wed, 09 Mar 2022 18:20:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1367265#M9274</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-03-09T18:20:16Z</dc:date>
    </item>
    <item>
      <title>Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1367422#M9278</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for posting in Intel Communities.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We are working on your issue internally and will get back to you soon.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;&lt;P&gt;Varsha&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 10 Mar 2022 05:08:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1367422#M9278</guid>
      <dc:creator>VarshaS_Intel</dc:creator>
      <dc:date>2022-03-10T05:08:20Z</dc:date>
    </item>
    <item>
      <title>Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1367694#M9293</link>
      <description>&lt;P&gt;Attaching the full source code for the Isend/Recv test.&amp;nbsp; Also, here are the settings configured on the Mellanox CX-6 adapter:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;perfcomp4:~ # cat mellanox_advanced_tuning.sh&lt;/P&gt;
&lt;P&gt;mlxconfig -d 43:00.0 --yes s ADVANCED_PCI_SETTINGS=1&lt;/P&gt;
&lt;P&gt;mlxconfig -d 43:00.1 --yes s ADVANCED_PCI_SETTINGS=1&lt;/P&gt;
&lt;P&gt;mlxconfig -d 43:00.1 --yes set MAX_ACC_OUT_READ=32&lt;/P&gt;
&lt;P&gt;mlxconfig -d 43:00.0 --yes set MAX_ACC_OUT_READ=32&lt;/P&gt;
&lt;P&gt;mlxconfig -d 43:00.0 --yes set PCI_WR_ORDERING=1&lt;/P&gt;
&lt;P&gt;mlxconfig -d 43:00.1 --yes set PCI_WR_ORDERING=1&lt;/P&gt;
&lt;P&gt;mlxfwreset -d 0000:43:00.0 --level 3 -y reset&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;perfcomp4:~ # cat config_mellanox_roce.sh&lt;/P&gt;
&lt;P&gt;# Use ip link to set priority 3 on vlan_50&lt;/P&gt;
&lt;P&gt;ip link set vlan50 type vlan egress 2:3&lt;/P&gt;
&lt;P&gt;# set DSCP to 26 and PRIO to 3&lt;/P&gt;
&lt;P&gt;mlxconfig -y -d 0000:43:00.0 set CNP_DSCP_P1=26 CNP_802P_PRIO_P1=3&lt;/P&gt;
&lt;P&gt;mlxconfig -y -d 0000:43:00.1 set CNP_DSCP_P2=26 CNP_802P_PRIO_P2=3&lt;/P&gt;
&lt;P&gt;mlxfwreset -d 0000:43:00.0 --level 3 -y reset&lt;/P&gt;
&lt;P&gt;# set tos to 106&lt;/P&gt;
&lt;P&gt;echo 106 &amp;gt; /sys/class/infiniband/mlx5_bond_0/tc/1/traffic_class&lt;/P&gt;
&lt;P&gt;cma_roce_tos -d mlx5_bond_0 -t 106&lt;/P&gt;
&lt;P&gt;# set trust pcp and set pfc for queue 3&lt;/P&gt;
&lt;P&gt;mlnx_qos -i ens2f0 --trust pcp&lt;/P&gt;
&lt;P&gt;mlnx_qos -i ens2f0 --pfc 0,0,0,1,0,0,0,0&lt;/P&gt;
&lt;P&gt;mlnx_qos -i ens2f1 --trust pcp&lt;/P&gt;
&lt;P&gt;mlnx_qos -i ens2f1 --pfc 0,0,0,1,0,0,0,0&lt;/P&gt;
&lt;P&gt;# enable ECN for TCP&lt;/P&gt;
&lt;P&gt;sysctl -w net.ipv4.tcp_ecn=1&lt;/P&gt;</description>
      <pubDate>Thu, 10 Mar 2022 21:20:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1367694#M9293</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-03-10T21:20:17Z</dc:date>
    </item>
    <item>
      <title>Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1368373#M9303</link>
      <description>&lt;P&gt;My colleague built the test case with Open MPI, and it runs to completion there.&amp;nbsp; So it appears that we're either missing some setting that would let it work with Intel MPI, or there is a network or kernel setting that affects the test case only when using Intel MPI.&amp;nbsp; We'd appreciate any insight you may have.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Mar 2022 01:22:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1368373#M9303</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-03-14T01:22:39Z</dc:date>
    </item>
    <item>
      <title>Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1368779#M9311</link>
      <description>&lt;P&gt;Some additional info when the MPI provider is changed from verbs to the default setting of psm3:&lt;/P&gt;
&lt;P&gt;If the following MPI flags are removed from the script to run the testcase, allowing the default settings to be selected, the testcase runs to completion with a message size of 1 MB:&lt;/P&gt;
&lt;P&gt;mpi_flags='-genv I_MPI_OFI_PROVIDER=verbs -genv FI_VERBS_IFACE=vlan50 -genv I_MPI_OFI_LIBRARY_INTERNAL=1 -genv I_MPI_FABRICS=shm:ofi'&lt;/P&gt;
&lt;P&gt;Setting I_MPI_DEBUG=10 shows that the provider when no flags are set is psm3:&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): libfabric provider: psm3&lt;/P&gt;
&lt;P&gt;However, the Intel MPI documentation says that psm3 is for Intel NICs, and I am using Mellanox CX-6 NICs:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;PSM3 supports standard Ethernet networks and leverages standard RoCEv2 protocols as implemented by the Intel® Ethernet Fabric Suite NICs&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Looking at tcpdump data collected during runs of the test case, the data with the psm3 provider differs from the data with the verbs provider:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The protocol is shown as RoCE, whereas with the verbs provider, the protocol is shown as RRoCE.&lt;/LI&gt;
&lt;LI&gt;With psm3, there are many fewer entries in the tcpdump data that are marked as RoCE as compared to the RRoCE entries seen with the verbs provider. &amp;nbsp;All of the entries are shown as “UD Send Only”, whereas with the verbs provider, there are many types of entries, including “RD Send Only”, RD Acknowledge, RDMA Read Request, RDMA Read Response First, RDMA Read Response Middle, RDMA Read Response End.&lt;/LI&gt;
&lt;LI&gt;In addition, both the NICs and the Aruba switches are configured to set RoCE traffic to priority 3, which is allowed more bandwidth on the switch than other traffic. However, when the psm3 provider is used, the RoCE traffic seen in the tcpdump data is shown with priority 0.&amp;nbsp;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The attached file named mpi_psm3.pcap.png is a sample of the tcpdump data shown in Wireshark for the case where the provider is psm3. Note that the only RoCE traffic is “UD Send Only” and the priority is 0.&lt;/P&gt;
&lt;P&gt;The attached file named mpi_verbs.pcap.png is a sample of the tcpdump data when the provider is verbs.&amp;nbsp; Notice that there are many types of RRoCE entries and the priority is set to 3.&lt;/P&gt;
&lt;P&gt;My understanding was that I should be using the verbs provider to use the RoCE protocol with the Mellanox CX-6 NICs. What am I missing to get the test case to run to completion with a message size of 1 MB?&lt;/P&gt;</description>
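When choosing between providers like verbs and psm3, it can help to confirm which libfabric providers are actually usable on a node before forcing one via I_MPI_OFI_PROVIDER. The fi_info utility ships with libfabric (including the copy bundled with Intel MPI); the flags below are standard fi_info options, and the provider names are the ones discussed in this thread.

```shell
# List the libfabric providers available on this node.
fi_info -l
# Show details for a specific provider (fails if it cannot be initialized).
fi_info -p verbs
# Run with libfabric logging to see why a provider is or is not selected.
FI_LOG_LEVEL=info fi_info -p psm3
```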
      <pubDate>Tue, 15 Mar 2022 02:05:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1368779#M9311</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-03-15T02:05:02Z</dc:date>
    </item>
    <item>
      <title>Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1368780#M9312</link>
      <description>&lt;P&gt;Here are the screenshots of the tcpdump data mentioned in the last post (see attached files)&lt;/P&gt;</description>
      <pubDate>Tue, 15 Mar 2022 02:06:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1368780#M9312</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-03-15T02:06:27Z</dc:date>
    </item>
    <item>
      <title>Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1371030#M9344</link>
      <description>&lt;P&gt;Hi Karen,&lt;/P&gt;&lt;P&gt;Can you check the UCX version? With Intel MPI 2021.2 and above, the UCX version needs to be &amp;gt; 1.6.&lt;/P&gt;&lt;P&gt;To check the UCX version:&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: Calibri, sans-serif; font-size: 11pt;"&gt;~&amp;gt; ucx_info -v&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 22 Mar 2022 15:43:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1371030#M9344</guid>
      <dc:creator>Vinutha_SV</dc:creator>
      <dc:date>2022-03-22T15:43:20Z</dc:date>
    </item>
    <item>
      <title>Re: Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1371475#M9349</link>
      <description>&lt;P&gt;The UCX version is 1.10.0:&lt;/P&gt;
&lt;P&gt;perfcomp3:~ # ucx_info -v&lt;BR /&gt;# UCT version=1.10.0 revision 7477e81&lt;BR /&gt;# configured with: --host=x86_64-suse-linux-gnu --build=x86_64-suse-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --disable-dependency-tracking --disable-optimizations --disable-logging --disable-debug --disable-assertions --enable-mt --disable-params-check --without-java --enable-cma --without-cuda --without-gdrcopy --with-verbs --without-cm --with-knem --with-rdmacm --without-rocm --without-xpmem --without-ugni --disable-numa&lt;/P&gt;</description>
      <pubDate>Wed, 23 Mar 2022 20:20:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1371475#M9349</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-03-23T20:20:24Z</dc:date>
    </item>
    <item>
      <title>Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1371517#M9350</link>
      <description>&lt;P&gt;I ran some additional tests, varying the number of processes per node (ppn), and running multiple iterations of the test. With 2 ppn, the test hangs on the first iteration.&amp;nbsp; With 4 or 16 ppn, multiple iterations can be run, but eventually the test hangs:&lt;/P&gt;
&lt;DIV style="direction: ltr;"&gt;
&lt;TABLE style="direction: ltr; border-collapse: collapse; border: 1pt solid #A3A3A3;" title="" border="1" summary="" cellspacing="0" cellpadding="0"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD style="vertical-align: top; width: .6673in; padding: 4pt 4pt 4pt 4pt; border: 1pt solid #A3A3A3;"&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD style="vertical-align: top; width: 1.752in; padding: 4pt 4pt 4pt 4pt; border: 1pt solid #A3A3A3;"&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;2 nodes&lt;/P&gt;
&lt;/TD&gt;
&lt;TD style="vertical-align: top; width: 1.4784in; padding: 4pt 4pt 4pt 4pt; border: 1pt solid #A3A3A3;"&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;4 nodes&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD style="vertical-align: top; width: .6673in; padding: 4pt 4pt 4pt 4pt; border: 1pt solid #A3A3A3;"&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;2 ppn&lt;/P&gt;
&lt;/TD&gt;
&lt;TD style="vertical-align: top; width: 1.752in; padding: 4pt 4pt 4pt 4pt; border: 1pt solid #A3A3A3;"&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;Hangs on 1&lt;SPAN&gt;st&lt;/SPAN&gt; repetition&lt;/P&gt;
&lt;/TD&gt;
&lt;TD style="vertical-align: top; width: 1.5597in; padding: 4pt 4pt 4pt 4pt; border: 1pt solid #A3A3A3;"&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;Hangs on 1&lt;SPAN&gt;st&lt;/SPAN&gt; repetition&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD style="vertical-align: top; width: .6673in; padding: 4pt 4pt 4pt 4pt; border: 1pt solid #A3A3A3;"&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;4 ppn&lt;/P&gt;
&lt;/TD&gt;
&lt;TD style="vertical-align: top; width: 1.7715in; padding: 4pt 4pt 4pt 4pt; border: 1pt solid #A3A3A3;"&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;Hangs on 28th repetition&lt;/P&gt;
&lt;/TD&gt;
&lt;TD style="vertical-align: top; width: 1.6104in; padding: 4pt 4pt 4pt 4pt; border: 1pt solid #A3A3A3;"&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;Hangs on 8th repetition&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD style="vertical-align: top; width: .6673in; padding: 4pt 4pt 4pt 4pt; border: 1pt solid #A3A3A3;"&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;16 ppn&lt;/P&gt;
&lt;/TD&gt;
&lt;TD style="vertical-align: top; width: 1.752in; padding: 4pt 4pt 4pt 4pt; border: 1pt solid #A3A3A3;"&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;Hangs on 10&lt;SPAN&gt;th&lt;/SPAN&gt; repetition&lt;/P&gt;
&lt;/TD&gt;
&lt;TD style="vertical-align: top; width: 1.593in; padding: 4pt 4pt 4pt 4pt; border: 1pt solid #A3A3A3;"&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;Hangs on 2&lt;SPAN&gt;nd&lt;/SPAN&gt; repetition&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When the test hangs, all of the processes are consuming 100% CPU, so perhaps they are all spinning while waiting on some resource. Do you have any recommendations on how to determine what the processes are waiting for?&lt;/P&gt;
&lt;P&gt;Note that the same test case runs successfully with Open MPI.&amp;nbsp; The 4-node, 16 ppn test was getting very high latency, but that was resolved by setting UCX_RNDV_THRESH to a value higher than the message size (in this case the message size is 1 MB and I set UCX_RNDV_THRESH to 2 MB).&amp;nbsp; With Intel MPI, I have already set &lt;SPAN&gt;FI_OFI_RXM_BUFFER_SIZE=2000000, but that only allowed running with a message size of up to about 750 KB for the 2-node, 2 ppn test.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Is it possible to use the UCX options with Intel MPI and the verbs provider? Or is the mlx provider (which I believe requires Infiniband) required to use UCX?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 24 Mar 2022 00:24:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1371517#M9350</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-03-24T00:24:29Z</dc:date>
    </item>
    <item>
      <title>Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1372676#M9364</link>
      <description>&lt;P&gt;&lt;SPAN style="font-family: intel-clear;"&gt;Please try the mlx provider, as it covers RoCE as well as InfiniBand. With the mlx provider you may use the same set of UCX-level controls that you use with Open MPI + UCX, if needed.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: intel-clear;"&gt;Can you run the command $ ibv_devinfo and send me the details? (We need to know which transport it is using.)&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 29 Mar 2022 10:38:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1372676#M9364</guid>
      <dc:creator>Vinutha_SV</dc:creator>
      <dc:date>2022-03-29T10:38:37Z</dc:date>
    </item>
    <item>
      <title>Re: Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1372761#M9365</link>
      <description>&lt;P&gt;Here is the output from the ibv_devinfo command:&lt;/P&gt;
&lt;PRE&gt;perfcomp3:~ # ibv_devinfo
hca_id: mlx5_bond_0
        transport:                      InfiniBand (0)
        fw_ver:                         20.30.1004
        node_guid:                      9440:c9ff:ffa9:e258
        sys_image_guid:                 9440:c9ff:ffa9:e258
        vendor_id:                      0x02c9
        vendor_part_id:                 4123
        hw_ver:                         0x0
        board_id:                       MT_0000000453
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I tried using the UCX options with the mlx provider, but I am getting errors with that provider:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;perfcomp3:/var/tmp/paulo/mpi_tests # sh launch_mpi_isend_recv_v4_2ppn.sh --repetitions 1 --size 1000000&lt;/P&gt;
&lt;P&gt;Intel(R) MPI Library for Linux* OS, Version 2021.3 Build 20210601 (id: 6f90181f1)&lt;/P&gt;
&lt;P&gt;-hostlist perfcomp3,perfcomp4&lt;/P&gt;
&lt;P&gt;perfcomp3&lt;/P&gt;
&lt;P&gt;perfcomp4&lt;/P&gt;
&lt;P&gt;num_nodes=2&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2021.3&amp;nbsp; Build 20210601 (id: 6f90181f1)&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): Copyright (C) 2003-2021 Intel Corporation.&amp;nbsp; All rights reserved.&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): library kind: release&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): shm segment size (1580 MB per rank) * (2 local ranks) = 3160 MB total&lt;/P&gt;
&lt;P&gt;[2] MPI startup(): shm segment size (1580 MB per rank) * (2 local ranks) = 3160 MB total&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): libfabric version: 1.12.1-impi&lt;/P&gt;
&lt;P&gt;Abort(1091215) on node 2 (rank 2 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:&lt;/P&gt;
&lt;P&gt;MPIR_Init_thread(138)........:&lt;/P&gt;
&lt;P&gt;MPID_Init(1169)..............:&lt;/P&gt;
&lt;P&gt;MPIDI_OFI_mpi_init_hook(1419): OFI addrinfo() failed (ofi_init.c:1419:MPIDI_OFI_mpi_init_hook:No data available)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here are the UCX options that I tried:&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;mpi_flags='-genv I_MPI_OFI_PROVIDER=mlx -genv I_MPI_FABRICS=shm:ofi -genv UCX_TLS=self,rc,dc,ud -genv UCX_NET_DEVICES=mlx5_bond_0:1 -genv UCX_RNDV_THRESH=2000000 -genv UCX_IB_SL=3 -genv UCX_IB_TRAFFIC_CLASS=106 -genv UCX_IB_GID_INDEX=3 -genv I_MPI_DEBUG=10'&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;All of the UCX options except UCX_TLS have been used successfully with the Open MPI version of this testcase.&amp;nbsp; I added UCX_TLS because I found the following info that says that UCX_TLS must be explicitly set when using Intel MPI with AMD processors (and the test case was also failing with the same error shown above before I added UCX_TLS):&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://github.com/openucx/ucx/issues/6253" target="_blank"&gt;https://github.com/openucx/ucx/issues/6253&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Here are the available transports from the ucx_info command:&lt;/P&gt;
&lt;P&gt;perfcomp4:/var/tmp/pcap # ucx_info -d | grep Transport&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transport: posix&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transport: sysv&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transport: self&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transport: tcp&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transport: tcp&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transport: tcp&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transport: rc_verbs&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transport: rc_mlx5&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transport: dc_mlx5&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transport: ud_verbs&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transport: ud_mlx5&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Transport: cma&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is the mpirun command line when using the Open MPI version of the test case:&lt;/P&gt;
&lt;P&gt;mpirun -np $num_proc \&lt;BR /&gt;--machinefile hfile_karen \&lt;BR /&gt;--prefix /var/tmp/paulo/openmpi-4.1.2 \&lt;BR /&gt;--map-by ppr:8:socket --report-bindings \&lt;BR /&gt;--mca btl ^openib \&lt;BR /&gt;--mca pml ucx -x UCX_IB_SL=3 -x UCX_NET_DEVICES=mlx5_bond_0:1 -x UCX_IB_TRAFFIC_CLASS=106 -x UCX_IB_GID_INDEX=3 \&lt;BR /&gt;--allow-run-as-root \&lt;BR /&gt;./runme.sh&lt;/P&gt;
&lt;P&gt;What are the proper options to use to allow UCX to be used with Intel MPI?&lt;/P&gt;</description>
      <pubDate>Tue, 29 Mar 2022 17:43:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1372761#M9365</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-03-29T17:43:43Z</dc:date>
    </item>
    <item>
      <title>Re: Re:MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1374172#M9384</link>
      <description>&lt;P&gt;There have been several email exchanges in the past week, but none of the suggestions have resolved the issue. I tried using&amp;nbsp;I_MPI_OFI_PROVIDER=mlx and&amp;nbsp;FI_MLX_NET_DEVICES=mlx5_bond_0:1, but this results in the message "no mlx device is found". So it's not clear that the mlx provider can be used with the Mellanox CX-6 adapter in Ethernet mode with RoCE protocol. So I'm still trying to understand how to use UCX options with Intel MPI. My test case runs successfully with Open MPI and UCX options, but hangs with Intel MPI with the provider set to verbs if the message size is 1 MB, and it fails with an error with the provider set to mlx. Further details are provided in the email exchange shown below, and the output from setting log level to debug with the mlx provider is attached.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Sent:&lt;/STRONG&gt; Thursday, March 31, 2022 11:34 AM&lt;/P&gt;
&lt;P&gt;Hi Julia,&lt;/P&gt;
&lt;P&gt;Thanks for the suggestion.&amp;nbsp; The error message remains the same when using FI_MLX_NET_DEVICES. The output with log level set to debug is attached. The MPI options that were used and some of the output with the error message are shown below.&amp;nbsp; I’m still puzzled about the suggestion to use the mlx provider, since the Intel MPI documentation indicates that it is for usage with Mellanox Infiniband hardware.&amp;nbsp; I am using the Mellanox CX-6 adapter, but it is configured in Ethernet mode, not Infiniband, and I don’t have Infiniband switches. The network adapters and switches are configured for RoCE. I was trying to use the UCX options with Intel MPI because this test case that fails with Intel MPI is working fine with Open MPI and UCX options on this same hardware. I’m trying to determine if it’s possible to use the UCX options with Intel MPI. Note that originally I was using I_MPI_OFI_PROVIDER=verbs but the test case was hanging with a message size of 1 MB. Any further advice on how to get the test to work with Intel MPI and UCX (or other options) is much appreciated.&amp;nbsp; The full details of the original case are posted here:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1372676" target="_blank" rel="noopener"&gt;https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1372676&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Karen&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;mpi_flags='-genv I_MPI_OFI_PROVIDER=mlx -genv I_MPI_FABRICS=shm:ofi -genv FI_MLX_NET_DEVICES=mlx5_bond_0:1 -genv I_MPI_DEBUG=10 -genv FI_LOG_LEVEL=debug'&lt;/P&gt;
&lt;P&gt;perfcomp3:/var/tmp/paulo/mpi_tests # sh launch_mpi_isend_recv_v4_2ppn.sh --repetitions 1 --size 1000000 2&amp;gt;&amp;amp;1 | tee launch_mpi_isend_recv_v4_2ppn_ompi_debug_MLXdev.out&lt;/P&gt;
&lt;P&gt;libfabric:16416:verbs:fabric:verbs_devs_print():880&amp;lt;info&amp;gt; list of verbs devices found for FI_EP_MSG:&lt;/P&gt;
&lt;P&gt;libfabric:16416:verbs:fabric:verbs_devs_print():884&amp;lt;info&amp;gt; #1 mlx5_bond_0 - IPoIB addresses:&lt;/P&gt;
&lt;P&gt;libfabric:16416:verbs:fabric:verbs_devs_print():894&amp;lt;info&amp;gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 172.30.217.84&lt;/P&gt;
&lt;P&gt;libfabric:16416:verbs:fabric:vrb_get_device_attrs():618&amp;lt;info&amp;gt; device mlx5_bond_0: first found active port is 1&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;libfabric:16419:mlx:core:mlx_getinfo():171&amp;lt;info&amp;gt; no mlx device is found.&lt;/P&gt;
&lt;P&gt;libfabric:16419:core:core:ofi_layering_ok():964&amp;lt;info&amp;gt; Need core provider, skipping ofi_rxm&lt;/P&gt;
&lt;P&gt;Abort(1091215) on node 2 (rank 2 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:&lt;/P&gt;
&lt;P&gt;MPIR_Init_thread(138)........:&lt;/P&gt;
&lt;P&gt;MPID_Init(1169)..............:&lt;/P&gt;
&lt;P&gt;MPIDI_OFI_mpi_init_hook(1419): OFI addrinfo() failed (ofi_init.c:1419:MPIDI_OFI_mpi_init_hook:No data available)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Sent:&lt;/STRONG&gt; Thursday, March 31, 2022 6:56 AM&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hi Karen,&lt;BR /&gt;Could you please replace UCX_NET_DEVICES environment variable to FI_MLX_NET_DEVICES and rerun with FI_LOG_LEVEL=debug, FI_PROVIDER=mlx?&lt;BR /&gt;&lt;BR /&gt;BRs,&lt;BR /&gt;Julia&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Sent:&lt;/STRONG&gt; Wednesday, March 30, 2022 8:25 PM&lt;/P&gt;
&lt;P&gt;Hi Alex,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for the suggestions.&amp;nbsp; The output with log level set to debug is attached.&amp;nbsp; With Open MPI, I was using UCX_NET_DEVICES=mlx5_bond_0:1, but when using Intel MPI with that setting and I_MPI_OFI_PROVIDER=mlx, I see that it says no Mellanox device is found:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;libfabric:9588:mlx:core:mlx_getinfo():171&amp;lt;info&amp;gt; no mlx device is found.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Looks like it is seeing mlx_bond_0 as a verbs device:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;libfabric:31635:verbs:fabric:vrb_get_device_attrs():618&amp;lt;info&amp;gt; device mlx5_bond_0: first found active port is 1&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But I am trying to use the mlx provider so that I could specify the UCX options. The servers have dual-port CX-6 Mellanox adapters configured with Ethernet mode and the two ports are configured in a bond. The NICs and Aruba 100Gb switches are configured for RoCE. Is that compatible with using the mlx provider? If so, what device name should be specified? With Intel MPI and the verbs provider, I was specifying &amp;nbsp;FI_VERBS_IFACE=vlan50. The vlan device vlan50 is configured on top of the bond for the Mellanox adapter. With OpenMPI, I was using UCX_IB_GID_INDEX=3, which corresponds to the vlan50 device with RoCE v2:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;perfcomp4: # /usr/sbin/show_gids&lt;/P&gt;
&lt;PRE class="notranslate"&gt;&lt;CODE&gt;DEV          PORT  INDEX  GID                                       IPv4            VER  DEV
---          ----  -----  ---                                       ------------    ---  ---
mlx5_bond_0  1     0      fe80:0000:0000:0000:9640:c9ff:fea9:e248                   v1   bond0
mlx5_bond_0  1     1      fe80:0000:0000:0000:9640:c9ff:fea9:e248                   v2   bond0
mlx5_bond_0  1     2      0000:0000:0000:0000:0000:ffff:ac1e:d954   172.30.217.84   v1   vlan50
mlx5_bond_0  1     3      0000:0000:0000:0000:0000:ffff:ac1e:d954   172.30.217.84   v2   vlan50
n_gids_found=4&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I tried setting UCX_NET_DEVICES=vlan50 but that gives the same errors as when using UCX_NET_DEVICES=mlx5_bond_0:1. Also, not specifying UCX_NET_DEVICES at all yields the same error.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also, you requested, “please align UCX settings for IMPI and OpenMPI”. Which settings are you suggesting that I add or remove? This is what I am using with Intel MPI and UCX options:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;mpi_flags='-genv I_MPI_OFI_PROVIDER=mlx -genv I_MPI_FABRICS=shm:ofi -genv UCX_TLS=self,rc,dc,ud -genv UCX_NET_DEVICES=mlx5_bond_0:1 -genv UCX_RNDV_THRESH=2000000 -genv UCX_IB_SL=3 -genv UCX_IB_TRAFFIC_CLASS=106 -genv UCX_IB_GID_INDEX=3 -genv I_MPI_DEBUG=10 -genv FI_LOG_LEVEL=debug'&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is what I am using with Open MPI:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; --mca pml ucx -x UCX_IB_SL=3 -x UCX_NET_DEVICES=mlx5_bond_0:1 -x UCX_IB_TRAFFIC_CLASS=106 -x UCX_IB_GID_INDEX=3 -x UCX_RNDV_THRESH=2000000&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that UCX_TLS=self,rc,dc,ud was only added after receiving errors using I_MPI_OFI_PROVIDER=mlx with Intel MPI. The errors are the same regardless of whether that option is used or not.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Karen&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Sent:&lt;/STRONG&gt; Wednesday, March 30, 2022 4:21 AM&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hi Karen,&lt;/P&gt;
&lt;P&gt;Could you please add FI_LOG_LEVEL=debug and provide a log for this run? That would allow us to identify the root cause of the problem with the mlx provider.&lt;/P&gt;
&lt;P&gt;Additionally, could you please align the UCX settings for IMPI and OpenMPI?&lt;BR /&gt;Thanks in advance.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;gt; What are the proper options to use to allow UCX to be used with Intel MPI?&lt;/P&gt;
&lt;P&gt;In general, FI_PROVIDER=mlx or I_MPI_OFI_PROVIDER=mlx should be enough.&lt;/P&gt;
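&lt;P&gt;For example, a minimal invocation along those lines (a sketch only; the script name and process count are placeholders taken from this thread) would be:&lt;/P&gt;

```shell
# Sketch: ask Intel MPI to use the UCX-backed "mlx" libfabric provider.
# I_MPI_FABRICS, FI_PROVIDER, and I_MPI_DEBUG are documented Intel MPI /
# libfabric variables; ./runme.sh and the process count are placeholders.
export I_MPI_FABRICS=shm:ofi
export FI_PROVIDER=mlx      # or: export I_MPI_OFI_PROVIDER=mlx
export I_MPI_DEBUG=10       # prints the selected libfabric provider at startup
mpirun -np 4 ./runme.sh
```

With I_MPI_DEBUG=10, the startup banner ("[0] MPI startup(): libfabric provider: ...") confirms which provider was actually selected.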
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;-&lt;/P&gt;
&lt;P&gt;With best regards, Alex.&lt;/P&gt;</description>
      <pubDate>Tue, 05 Apr 2022 00:49:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1374172#M9384</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-04-05T00:49:05Z</dc:date>
    </item>
    <item>
      <title>Re:MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1384088#M9481</link>
      <description>&lt;P&gt;Hi Karen,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;1) can you try it using a newer libfabric? The one shipped by IMPI is 1.12.1 but the newest should be 1.15.0&lt;/P&gt;&lt;P&gt;&amp;nbsp;2) can you reproduce this issue in an Intel machine with the newest libfabric?&lt;/P&gt;&lt;P&gt;&amp;nbsp;3) if yes, can you show where the call hangs, e.g. by using gstack?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;Cheers,&lt;/P&gt;&lt;P&gt;&amp;nbsp;Rafael&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 13 May 2022 08:40:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1384088#M9481</guid>
      <dc:creator>Rafael_L_Intel</dc:creator>
      <dc:date>2022-05-13T08:40:04Z</dc:date>
    </item>
    <item>
      <title>Re: Re:MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1386399#M9507</link>
      <description>&lt;P&gt;According to my benchmark output, Intel MPI 2021.4 is using libfabric 1.13.0-impi and 2021.5 is using 1.13.rc1-impi. Where can I get the Intel version of 1.15? I tried downloading and building libfabric 1.15.1 from github and decided to first try it with Intel MPI 2021.4 because 2021.4 was already working with my test case. I modified&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000000"&gt;/opt/intel/oneapi/mpi/2021.4.0/env/vars.sh&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt; color: red;"&gt;&lt;FONT size="4"&gt;&amp;nbsp;&lt;FONT color="#000000"&gt;to point to point to the libfabric 1.15.1 directory but the test is getting a segfault, and says that provider mlx is unknown. The github version of 1.15.1 didn't have a libfabric/lib/prov directory, so I had just left it pointing to the original 2021.4 directory.&amp;nbsp; Here's what I changed in vars.sh&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt; color: red;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;case "$i_mpi_ofi_library_internal" in&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0|no|off|disable)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ;;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; *)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; PATH="${I_MPI_ROOT}/${PLATFORM}/libfabric/bin:${PATH}"; export PATH&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; LD_LIBRARY_PATH="${I_MPI_ROOT}/${PLATFORM}/libfabric/lib:${LD_LIBRARY_PATH}"; export LD_LIBRARY_PATH&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; LD_LIBRARY_PATH="&lt;FONT color="#FF0000"&gt;/root/libfabric/libfabric-1.15.1/src/.libs&lt;/FONT&gt;:${LD_LIBRARY_PATH}"; export LD_LIBRARY_PATH&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if [ -z "${LIBRARY_PATH}" ]&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; then&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; LIBRARY_PATH="${I_MPI_ROOT}/${PLATFORM}/libfabric/lib"; export LIBRARY_PATH&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; LIBRARY_PATH="&lt;FONT color="#FF0000"&gt;/root/libfabric/libfabric-1.15.1/src/.libs&lt;/FONT&gt;"; export LIBRARY_PATH&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; else&lt;/P&gt;
&lt;P&gt;#&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; LIBRARY_PATH="${I_MPI_ROOT}/${PLATFORM}/libfabric/lib:${LIBRARY_PATH}"; export LIBRARY_PATH&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; LIBRARY_PATH="&lt;FONT color="#FF0000"&gt;/root/libfabric/libfabric-1.15.1/src/.libs&lt;/FONT&gt;:${LIBRARY_PATH}"; export LIBRARY_PATH&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fi&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; FI_PROVIDER_PATH="${I_MPI_ROOT}/${PLATFORM}/libfabric/lib/prov:/usr/lib64/libfabric"; export FI_PROVIDER_PATH&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ;;&lt;/P&gt;
&lt;P&gt;esac&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt; color: red;"&gt;&lt;FONT color="#000000"&gt;The output from the test case when using libfabric 1.15.1 is attached. The output is showing:&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;[0] MPI startup(): libfabric version: &lt;FONT color="#FF0000"&gt;1.15.1&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;libfabric:56808:1653261977::core:core:verify_filter_names():562&amp;lt;warn&amp;gt; &lt;FONT color="#FF0000"&gt;provider mlx is unknown&lt;/FONT&gt;, misspelled or DL provider?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;[perfcomp3:56808:0:56808] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))&lt;/P&gt;
&lt;P&gt;==== backtrace (tid:&amp;nbsp; 56808) ====&lt;/P&gt;
&lt;P&gt;&amp;nbsp;0&amp;nbsp; /usr/lib64/libucs.so.0(ucs_handle_error+0xe4) [0x15548926e6d4]&lt;/P&gt;
&lt;P&gt;&amp;nbsp;1&amp;nbsp; /usr/lib64/libucs.so.0(+0x21a4c) [0x15548926ea4c]&lt;/P&gt;
&lt;P&gt;&amp;nbsp;2&amp;nbsp; /usr/lib64/libucs.so.0(+0x21c02) [0x15548926ec02]&lt;/P&gt;
&lt;P&gt;&amp;nbsp;3&amp;nbsp; /lib64/libpthread.so.0(+0x132d0) [0x1555534962d0]&lt;/P&gt;</description>
      <pubDate>Mon, 23 May 2022 00:04:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1386399#M9507</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-05-23T00:04:34Z</dc:date>
    </item>
    <item>
      <title>Re: Re:MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1386697#M9510</link>
      <description>&lt;P&gt;I found out that it's necessary to enable the providers when building libfabric. Following the directions found here:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://github.com/ofiwg/libfabric" target="_blank"&gt;https://github.com/ofiwg/libfabric&lt;/A&gt;&lt;/P&gt;
&lt;PRE class="notranslate"&gt;&lt;CODE&gt;--enable-&amp;lt;provider&amp;gt;=[yes|no|auto|dl|&amp;lt;directory&lt;/CODE&gt;&amp;nbsp;&lt;/PRE&gt;
&lt;P&gt;Using ./configure --enable-verbs=dl builds the verbs provider, but the mlx provider does not exist (it isn't documented on the site listed above, and a warning is received when attempting to use --enable-mlx=dl):&lt;/P&gt;
&lt;P&gt;perfcomp3:~/libfabric/libfabric-1.15.1 # &lt;FONT color="#FF0000"&gt;./configure --enable-mlx=dl --enable-verbs=dl&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;configure: WARNING: &lt;STRONG&gt;unrecognized options: --enable-mlx&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;***&lt;/P&gt;
&lt;P&gt;*** Built-in providers: opx dmabuf_peer_mem hook_hmem hook_debug perf rstream shm rxd mrail rxm tcp udp sockets psm2&lt;/P&gt;
&lt;P&gt;*** DSO providers:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; verbs&lt;/P&gt;
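&lt;P&gt;For reference, a configure invocation that matches the upstream documentation (a sketch using the source path from this thread; --enable-mlx is not recognized by the open-source tree) would be:&lt;/P&gt;

```shell
# Sketch: build open-source libfabric 1.15.1 with verbs as a loadable (dl) provider.
# The source path below is the one used earlier in this thread.
cd /root/libfabric/libfabric-1.15.1
./configure --enable-verbs=dl
make -j "$(nproc)" && make install
```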
&lt;P&gt;Where can I get a copy of libfabric 1.15 that has the mlx provider?&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Karen&lt;/P&gt;</description>
      <pubDate>Mon, 23 May 2022 20:27:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1386697#M9510</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-05-23T20:27:40Z</dc:date>
    </item>
    <item>
      <title>Re:MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1386914#M9511</link>
      <description>&lt;P&gt;Hi Karen!&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;That is correct, mlx provider is available in our build of libfabrics and is not present in the opensource repository. Can you please test with your original settings i.e. I_MPI_OFI_PROVIDER=verbs, or unset?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Cheers!&lt;/P&gt;&lt;P&gt;Rafael&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 24 May 2022 08:50:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1386914#M9511</guid>
      <dc:creator>Rafael_L_Intel</dc:creator>
      <dc:date>2022-05-24T08:50:07Z</dc:date>
    </item>
    <item>
      <title>Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1387150#M9514</link>
      <description>&lt;P&gt;I was able to get the test case to work with the open source version of libfabric 1.15.1 by either&lt;/P&gt;
&lt;P&gt;1) specifying the verbs provider with FI_PROVIDER=verbs&lt;/P&gt;
&lt;P&gt;2) not explicitly specifying a provider (in which case the verbs provider was used).&lt;/P&gt;
&lt;P&gt;Note that using I_MPI_OFI_PROVIDER=verbs didn't work with the open-source libfabric; it throws the error "MPIDI_OFI_mpi_init_hook:No data available".&lt;/P&gt;
&lt;P&gt;So it appears that the initial problem of the test case hanging with the verbs provider is related to the use of libfabric 1.13 that ships with Intel MPI.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Note that even though the test case works with the verbs provider and libfabric 1.15.1, the performance is very poor compared to using the mlx provider with Intel MPI 2021.4 and libfabric 1.13.&lt;/STRONG&gt;&amp;nbsp; For the latter combination, the test case with a message size of 1 MB had a latency of 325 us, while the same test case with the verbs provider and libfabric 1.15.1 had a latency of 3820 us, more than 10 times longer than with the mlx provider (see output below).&lt;/P&gt;
&lt;P&gt;Does Intel have plans to update Intel MPI to use libfabric 1.15.1? Will the mlx provider be included?&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Karen&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Here is the output of the test case when using Intel MPI 2021.4 and the mlx provider. Note that latency is 325us.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;perfcomp3:/var/tmp/paulo/mpi_tests # &lt;FONT color="#FF0000"&gt;sh launch_mpi_isend_recv_v4_2ppn_BM.sh --repetitions 1 --size 1000000 2&amp;gt;&amp;amp;1 | tee launch_mpi_isend_recv_v4_2ppn_BM_2021.4.log&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;[0] MPI startup():&lt;FONT color="#FF0000"&gt; libfabric version: 1.13.0-impi&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;&lt;FONT color="#FF0000"&gt;&lt;FONT color="#000000"&gt;[0] MPI startup():&lt;/FONT&gt; libfabric provider: mlx&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;[0] MPI startup(): &lt;FONT color="#FF0000"&gt;I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.4.0&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;[0] MPI startup(): I_MPI_MPIRUN=mpirun&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;[0] MPI startup(): I_MPI_FABRICS=shm:ofi&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;[0] MPI startup(): &lt;FONT color="#FF0000"&gt;I_MPI_OFI_PROVIDER=mlx&lt;/FONT&gt;&lt;/P&gt;
&lt;P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"&gt;[0] MPI startup(): I_MPI_DEBUG=10&lt;/P&gt;
&lt;P&gt;nrep = 1&lt;/P&gt;
&lt;P&gt;sendcount = 1000000&lt;/P&gt;
&lt;P&gt;recvcount = 1000000&lt;/P&gt;
&lt;P&gt;wtime = 0.001&lt;/P&gt;
&lt;P&gt;Bandwidth per receiver process = 3080.514 MB/s&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;MPI_Recv latency = 324.621 us&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Here is the output of the test case when using Intel MPI 2021.4 with libfabric 1.15.1 and not explicitly specifying a provider, which results in the verbs provider being used. Note that latency is 3821us, an increase of over a factor of 10.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;mpi_flags='-genv I_MPI_DEBUG=10'&lt;/P&gt;
&lt;P&gt;perfcomp3:/var/tmp/paulo/mpi_tests # &lt;FONT color="#FF0000"&gt;sh launch_mpi_isend_recv_v4_2ppn_BM_libfabric1.15.1.sh --repetitions 1 --size 1000000 | tee launch_mpi_isend_recv_v4_2ppn_BM_libfabric1.15.1_noprovider.log&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): &lt;FONT color="#FF0000"&gt;libfabric version: 1.15.1&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): &lt;FONT color="#FF0000"&gt;libfabric provider: verbs;ofi_rxm&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): &lt;FONT color="#FF0000"&gt;I_MPI_ROOT=/opt/intel/oneapi/mpi/2021.4.0&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): I_MPI_MPIRUN=mpirun&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): I_MPI_DEBUG=10&lt;/P&gt;
&lt;P&gt;nrep = 1&lt;/P&gt;
&lt;P&gt;sendcount = 1000000&lt;/P&gt;
&lt;P&gt;recvcount = 1000000&lt;/P&gt;
&lt;P&gt;wtime = 0.011&lt;/P&gt;
&lt;P&gt;Bandwidth per receiver process = 261.718 MB/s&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;MPI_Recv latency = 3820.901 us&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 24 May 2022 22:59:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1387150#M9514</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-05-24T22:59:42Z</dc:date>
    </item>
    <item>
      <title>Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1388186#M9533</link>
      <description>&lt;P&gt;I had been running a single iteration of the mpi_isend_recv test case for debugging purposes, but recalled that running with more iterations (e.g. 100 iterations) significantly reduces the latency reported. With 100 iterations, using the verbs provider with libfabric 1.15.1 actually gives lower latency than the mlx provider with libfabric 1.13. In both cases Intel MPI 2021.4 was used, but it's not strictly an apples-to-apples comparison because different versions of libfabric are used. But I don't have a version of libfabric 1.15.1 with the mlx provider, and the test hangs with the verbs provider and libfabric 1.13.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Here are the results with the verbs provider and libfabric 1.15.1:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;perfcomp3:/var/tmp/paulo/mpi_tests # &lt;FONT color="#FF0000"&gt;sh launch_mpi_isend_recv_v4_2ppn_BM_libfabric1.15.1.sh --repetitions 100 --size 1000000&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2021.4&amp;nbsp; Build 20210831 (id: 758087adf)&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): &lt;FONT color="#FF0000"&gt;libfabric version: 1.15.1&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): &lt;FONT color="#FF0000"&gt;libfabric provider: verbs;ofi_rxm&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;nrep = 100&lt;/P&gt;
&lt;P&gt;sendcount = 1000000&lt;/P&gt;
&lt;P&gt;recvcount = 1000000&lt;/P&gt;
&lt;P&gt;wtime = 0.031&lt;/P&gt;
&lt;P&gt;Bandwidth per receiver process = 9744.114 MB/s&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;MPI_Recv latency = 102.626 us&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Here are the results with the mlx provider and libfabric 1.13.0:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;perfcomp3:/var/tmp/paulo/mpi_tests # &lt;FONT color="#FF0000"&gt;sh launch_mpi_isend_recv_v4_2ppn_BM.sh --repetitions 100 --size 1000000&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Intel(R) MPI Library for Linux* OS, Version 2021.4 Build 20210831 (id: 758087adf)&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): &lt;FONT color="#FF0000"&gt;libfabric version: 1.13.0-impi&lt;/FONT&gt;&lt;BR /&gt;[0] MPI startup(): &lt;FONT color="#FF0000"&gt;libfabric provider: mlx&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;nrep = 100&lt;/P&gt;
&lt;P&gt;sendcount = 1000000&lt;/P&gt;
&lt;P&gt;recvcount = 1000000&lt;/P&gt;
&lt;P&gt;wtime = 0.047&lt;/P&gt;
&lt;P&gt;Bandwidth per receiver process = 6403.591 MB/s&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;MPI_Recv latency = 156.162 us&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 27 May 2022 19:25:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1388186#M9533</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-05-27T19:25:32Z</dc:date>
    </item>
    <item>
      <title>Re: MPI Isend/Recv with Waitall using RoCE protocol hangs with large message size</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1389364#M9547</link>
      <description>&lt;P&gt;Good news: Intel MPI 2021.6 is now available, and with it the mpi_isend_recv test case runs successfully with both the verbs and the mlx providers, with similar performance.&amp;nbsp; Interestingly, 2021.6 ships with the same libfabric version as 2021.5.1 (libfabric 1.13.2rc1-impi), so it's not clear that libfabric was the cause of the segfaults with Intel MPI 2021.5.1.&amp;nbsp; Here's a summary of which combinations of Intel MPI, libfabric, and provider worked and which failed:&lt;/P&gt;
&lt;TABLE border="1" width="100%"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="33.333333333333336%" height="25px"&gt;Intel MPI version&lt;/TD&gt;
&lt;TD width="33.333333333333336%" height="25px"&gt;libfabric version&lt;/TD&gt;
&lt;TD width="16.666666666666668%" height="25px"&gt;provider&lt;/TD&gt;
&lt;TD width="16.666666666666668%" height="25px"&gt;result&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="33.333333333333336%" height="25px"&gt;2021.4.0&lt;/TD&gt;
&lt;TD width="33.333333333333336%" height="25px"&gt;1.13.0-impi&lt;/TD&gt;
&lt;TD width="16.666666666666668%" height="25px"&gt;&lt;FONT color="#00FF00"&gt;mlx&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD width="16.666666666666668%" height="25px"&gt;&lt;FONT color="#00FF00"&gt;runs&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD height="25px"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD height="25px"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD height="25px"&gt;&lt;FONT color="#FF0000"&gt;verbs&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD height="25px"&gt;&lt;FONT color="#FF0000"&gt;hangs&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="33.333333333333336%" height="25px"&gt;2021.5.1&lt;/TD&gt;
&lt;TD width="33.333333333333336%" height="25px"&gt;1.13.2.rc1-impi&lt;/TD&gt;
&lt;TD width="16.666666666666668%" height="25px"&gt;&lt;FONT color="#FF0000"&gt;mlx&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD width="16.666666666666668%" height="25px"&gt;&lt;FONT color="#FF0000"&gt;seg fault&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD height="25px"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD height="25px"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD height="25px"&gt;&lt;FONT color="#FF0000"&gt;verbs&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD height="25px"&gt;&lt;FONT color="#FF0000"&gt;segfault&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD width="33.333333333333336%" height="25px"&gt;2021.6.0&lt;/TD&gt;
&lt;TD width="33.333333333333336%" height="25px"&gt;&amp;nbsp;1.13.2.rc1-impi&lt;/TD&gt;
&lt;TD width="16.666666666666668%" height="25px"&gt;&lt;FONT color="#00FF00"&gt;mlx&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD width="16.666666666666668%" height="25px"&gt;&lt;FONT color="#00FF00"&gt;runs&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD height="25px"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD height="25px"&gt;&amp;nbsp;&lt;/TD&gt;
&lt;TD height="25px"&gt;&lt;FONT color="#00FF00"&gt;verbs&lt;/FONT&gt;&lt;/TD&gt;
&lt;TD height="25px"&gt;&lt;FONT color="#00FF00"&gt;runs&lt;/FONT&gt;&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
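&lt;P&gt;(A sketch of how the two 2021.6 runs below can be switched between providers; the actual launch scripts are not shown, so the exact settings are an assumption:)&lt;/P&gt;

```shell
# Same binary, one run per provider; I_MPI_DEBUG makes MPI startup print
# the libfabric version and provider so the selection can be confirmed.
export I_MPI_DEBUG=5

# mlx (UCX-based) provider:
FI_PROVIDER=mlx mpirun -n 4 -ppn 2 ./mpi_isend_recv --repetitions 100 --size 1000000

# verbs provider (reported as "verbs;ofi_rxm" at startup):
FI_PROVIDER=verbs mpirun -n 4 -ppn 2 ./mpi_isend_recv --repetitions 100 --size 1000000
```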
&lt;P&gt;With 100 iterations of the test, performance (latency) is similar for the mlx and verbs provider with Intel MPI 2021.6:&lt;/P&gt;
&lt;P&gt;perfcomp3:/var/tmp/paulo/mpi_tests # &lt;FONT color="#FF0000"&gt;sh launch_mpi_isend_recv_v4_2ppn_BM.sh --repetitions 100 --size 1000000&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="error"&gt;[0]&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;MPI startup():&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;FONT color="#FF0000"&gt;Intel(R) MPI Library, Version 2021.6&lt;/FONT&gt;&amp;nbsp; Build 20220227 (id: 28877f3f32)&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="error"&gt;[0]&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;MPI startup(): libfabric version: 1.13.2rc1-impi&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="error"&gt;[0]&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)&lt;/P&gt;
&lt;P&gt;&lt;SPAN class="error"&gt;[0]&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;MPI startup():&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;FONT color="#FF0000"&gt;libfabric provider: mlx&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;nrep = 100&lt;/P&gt;
&lt;P&gt;sendcount = 1000000&lt;/P&gt;
&lt;P&gt;recvcount = 1000000&lt;/P&gt;
&lt;P&gt;wtime = 0.039&lt;/P&gt;
&lt;P&gt;Bandwidth per receiver process = 7623.593 MB/s&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;MPI_Recv latency = 131.172 us&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;perfcomp3:/var/tmp/paulo/mpi_tests # &lt;FONT color="#FF0000"&gt;sh launch_mpi_isend_recv_v4_2ppn_BM.sh --repetitions 100 --size 1000000&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#FF0000"&gt;Intel(R) MPI Library for Linux* OS, Version 2021.6&lt;/FONT&gt; Build 20220227 (id: 28877f3f32)&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): libfabric version: 1.13.2rc1-impi&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): &lt;FONT color="#FF0000"&gt;libfabric provider: verbs;ofi_rxm&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;nrep = 100&lt;BR /&gt;sendcount = 1000000&lt;BR /&gt;recvcount = 1000000&lt;BR /&gt;wtime = 0.040&lt;BR /&gt;Bandwidth per receiver process = 7443.520 MB/s&lt;BR /&gt;&lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;MPI_Recv latency = 134.345 us&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Jun 2022 21:14:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Isend-Recv-with-Waitall-using-RoCE-protocol-hangs-with-large/m-p/1389364#M9547</guid>
      <dc:creator>KarenD</dc:creator>
      <dc:date>2022-06-01T21:14:58Z</dc:date>
    </item>
  </channel>
</rss>

