<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: Re: MPI_IRECV sporadically hangs for Intel 2019/2020 but not Intel 2018 in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1228104#M7328</link>
    <description>&lt;P&gt;Prasanth,&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;1. Thank you for looking at this.&amp;nbsp; I just want to confirm whether you ran the two MPI processes on a single physical node or on two distinct physical nodes (one process per node).&amp;nbsp; The hang does not occur for us if you run the two MPI processes on a single physical node; it requires two distinct nodes (which I guess prevents shared-memory communication).&lt;/P&gt;
&lt;P&gt;2. I tried setting FI_PROVIDER=mlx, but the code crashes at startup.&amp;nbsp; The output I see is below (also attached).&amp;nbsp; Does this indicate an issue in our cluster setup?&lt;/P&gt;
&lt;P&gt;John&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2019 Update 9 Build 20200923 (id: abd58e492)&lt;BR /&gt;[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation. All rights reserved.&lt;BR /&gt;[0] MPI startup(): library kind: release&lt;BR /&gt;[0] MPI startup(): libfabric version: 1.10.1-impi&lt;BR /&gt;libfabric:107610:core:mr:ofi_default_cache_size():56&amp;lt;info&amp;gt; default cache size=2109042048&lt;BR /&gt;libfabric:22236:core:mr:ofi_default_cache_size():56&amp;lt;info&amp;gt; default cache size=2109042048&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: sockets (110.10)&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "sockets" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: sockets (110.10)&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "sockets" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:107610:core:core:ofi_reg_dl_prov():578&amp;lt;warn&amp;gt; dlopen(/opt/ohpc/pub/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so): libpsm2.so.2: cannot open shared object file: No such file or directory&lt;BR /&gt;libfabric:22236:core:core:ofi_reg_dl_prov():578&amp;lt;warn&amp;gt; dlopen(/opt/ohpc/pub/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so): libpsm2.so.2: cannot open shared object file: No such file or directory&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: ofi_rxm (110.10)&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: ofi_rxm (110.10)&lt;BR /&gt;libfabric:107610:core:core:ofi_reg_dl_prov():578&amp;lt;warn&amp;gt; 
dlopen(/opt/ohpc/pub/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib/prov/libefa-fi.so): libefa.so.1: cannot open shared object file: No such file or directory&lt;BR /&gt;libfabric:22236:core:core:ofi_reg_dl_prov():578&amp;lt;warn&amp;gt; dlopen(/opt/ohpc/pub/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib/prov/libefa-fi.so): libefa.so.1: cannot open shared object file: No such file or directory&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: tcp (110.10)&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "tcp" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: tcp (110.10)&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "tcp" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: shm (110.10)&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "shm" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: shm (110.10)&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "shm" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:107610:core:mr:ofi_default_cache_size():56&amp;lt;info&amp;gt; default cache size=2109042048&lt;BR /&gt;libfabric:107610:verbs:fabric:verbs_devs_print():869&amp;lt;info&amp;gt; list of verbs devices found for FI_EP_MSG:&lt;BR /&gt;libfabric:107610:verbs:fabric:verbs_devs_print():873&amp;lt;info&amp;gt; #1 mlx4_0 - IPoIB addresses:&lt;BR /&gt;libfabric:107610:verbs:fabric:verbs_devs_print():883&amp;lt;info&amp;gt; 10.30.18.103&lt;BR 
/&gt;libfabric:107610:verbs:fabric:verbs_devs_print():883&amp;lt;info&amp;gt; fe80::202:c903:14:de71&lt;BR /&gt;libfabric:107610:verbs:fabric:vrb_get_device_attrs():615&amp;lt;info&amp;gt; device mlx4_0: first found active port is 1&lt;BR /&gt;libfabric:107610:verbs:fabric:vrb_get_device_attrs():615&amp;lt;info&amp;gt; device mlx4_0: first found active port is 1&lt;BR /&gt;libfabric:107610:verbs:fabric:vrb_get_device_attrs():615&amp;lt;info&amp;gt; device mlx4_0: first found active port is 1&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: verbs (110.10)&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "verbs" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: mlx (1.4)&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: ofi_hook_noop (110.10)&lt;BR /&gt;libfabric:107610:core:core:fi_getinfo_():1092&amp;lt;info&amp;gt; Found provider with the highest priority mlx, must_use_util_prov = 0&lt;BR /&gt;libfabric:107610:mlx:core:mlx_getinfo():172&amp;lt;info&amp;gt; used inject size = 1024 &lt;BR /&gt;libfabric:107610:mlx:core:mlx_getinfo():219&amp;lt;info&amp;gt; Loaded MLX version 1.6.0&lt;BR /&gt;libfabric:107610:mlx:core:mlx_getinfo():266&amp;lt;warn&amp;gt; MLX: spawn support 0 &lt;BR /&gt;libfabric:107610:core:core:fi_getinfo_():1092&amp;lt;info&amp;gt; Found provider with the highest priority mlx, must_use_util_prov = 0&lt;BR /&gt;libfabric:107610:mlx:core:mlx_getinfo():172&amp;lt;info&amp;gt; used inject size = 1024 &lt;BR /&gt;libfabric:107610:mlx:core:mlx_getinfo():219&amp;lt;info&amp;gt; Loaded MLX version 1.6.0&lt;BR /&gt;libfabric:107610:mlx:core:mlx_getinfo():266&amp;lt;warn&amp;gt; MLX: spawn support 0 &lt;BR /&gt;[0] MPI startup(): libfabric provider: mlx&lt;BR 
/&gt;libfabric:107610:mlx:core:mlx_fabric_open():172&amp;lt;info&amp;gt; &lt;BR /&gt;libfabric:107610:core:core:fi_fabric_():1372&amp;lt;info&amp;gt; Opened fabric: mlx&lt;BR /&gt;libfabric:107610:mlx:core:ofi_check_rx_attr():782&amp;lt;info&amp;gt; Tx only caps ignored in Rx caps&lt;BR /&gt;libfabric:107610:mlx:core:ofi_check_tx_attr():880&amp;lt;info&amp;gt; Rx only caps ignored in Tx caps&lt;BR /&gt;[0] MPI startup(): max_ch4_vcis: 1, max_reg_eps 1, enable_sep 0, enable_shared_ctxs 0, do_av_insert 1&lt;BR /&gt;libfabric:107610:mlx:core:ofi_check_rx_attr():782&amp;lt;info&amp;gt; Tx only caps ignored in Rx caps&lt;BR /&gt;libfabric:107610:mlx:core:ofi_check_tx_attr():880&amp;lt;info&amp;gt; Rx only caps ignored in Tx caps&lt;BR /&gt;[0] MPI startup(): addrnamelen: 1024&lt;BR /&gt;libfabric:107610:mlx:core:mlx_cm_getname_mlx_format():73&amp;lt;info&amp;gt; Loaded UCP address: [127]...&lt;BR /&gt;libfabric:22236:core:mr:ofi_default_cache_size():56&amp;lt;info&amp;gt; default cache size=2109042048&lt;BR /&gt;libfabric:22236:verbs:fabric:verbs_devs_print():869&amp;lt;info&amp;gt; list of verbs devices found for FI_EP_MSG:&lt;BR /&gt;libfabric:22236:verbs:fabric:verbs_devs_print():873&amp;lt;info&amp;gt; #1 mlx4_0 - IPoIB addresses:&lt;BR /&gt;libfabric:22236:verbs:fabric:verbs_devs_print():883&amp;lt;info&amp;gt; 10.30.18.104&lt;BR /&gt;libfabric:22236:verbs:fabric:verbs_devs_print():883&amp;lt;info&amp;gt; fe80::202:c903:14:ddf1&lt;BR /&gt;libfabric:22236:verbs:fabric:vrb_get_device_attrs():615&amp;lt;info&amp;gt; device mlx4_0: first found active port is 1&lt;BR /&gt;libfabric:22236:verbs:fabric:vrb_get_device_attrs():615&amp;lt;info&amp;gt; device mlx4_0: first found active port is 1&lt;BR /&gt;libfabric:22236:verbs:fabric:vrb_get_device_attrs():615&amp;lt;info&amp;gt; device mlx4_0: first found active port is 1&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: verbs (110.10)&lt;BR 
/&gt;libfabric:22236:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "verbs" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: mlx (1.4)&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: ofi_hook_noop (110.10)&lt;BR /&gt;libfabric:22236:core:core:fi_getinfo_():1092&amp;lt;info&amp;gt; Found provider with the highest priority mlx, must_use_util_prov = 0&lt;BR /&gt;libfabric:22236:mlx:core:mlx_getinfo():172&amp;lt;info&amp;gt; used inject size = 1024 &lt;BR /&gt;libfabric:22236:mlx:core:mlx_getinfo():219&amp;lt;info&amp;gt; Loaded MLX version 1.6.0&lt;BR /&gt;libfabric:22236:mlx:core:mlx_getinfo():266&amp;lt;warn&amp;gt; MLX: spawn support 0 &lt;BR /&gt;libfabric:22236:core:core:fi_getinfo_():1092&amp;lt;info&amp;gt; Found provider with the highest priority mlx, must_use_util_prov = 0&lt;BR /&gt;libfabric:22236:mlx:core:mlx_getinfo():172&amp;lt;info&amp;gt; used inject size = 1024 &lt;BR /&gt;libfabric:22236:mlx:core:mlx_getinfo():219&amp;lt;info&amp;gt; Loaded MLX version 1.6.0&lt;BR /&gt;libfabric:22236:mlx:core:mlx_getinfo():266&amp;lt;warn&amp;gt; MLX: spawn support 0 &lt;BR /&gt;libfabric:22236:mlx:core:mlx_fabric_open():172&amp;lt;info&amp;gt; &lt;BR /&gt;libfabric:22236:core:core:fi_fabric_():1372&amp;lt;info&amp;gt; Opened fabric: mlx&lt;BR /&gt;libfabric:22236:mlx:core:ofi_check_rx_attr():782&amp;lt;info&amp;gt; Tx only caps ignored in Rx caps&lt;BR /&gt;libfabric:22236:mlx:core:ofi_check_tx_attr():880&amp;lt;info&amp;gt; Rx only caps ignored in Tx caps&lt;BR /&gt;libfabric:22236:mlx:core:ofi_check_rx_attr():782&amp;lt;info&amp;gt; Tx only caps ignored in Rx caps&lt;BR /&gt;libfabric:22236:mlx:core:ofi_check_tx_attr():880&amp;lt;info&amp;gt; Rx only caps ignored in Tx caps&lt;BR /&gt;libfabric:22236:mlx:core:mlx_cm_getname_mlx_format():73&amp;lt;info&amp;gt; Loaded UCP address: 
[127]...&lt;BR /&gt;libfabric:22236:mlx:core:mlx_av_insert():179&amp;lt;warn&amp;gt; Try to insert address #0, offset=0 (size=2) fi_addr=0x7f2000132a00 &lt;BR /&gt;[1605269643.201248] [cnode003:22236:0] select.c:410 UCX ERROR no active messages transport to &amp;lt;no debug data&amp;gt;: mm/posix - Destination is unreachable, mm/sysv - Destination is unreachable, self/self - Destination is unreachable&lt;BR /&gt;Abort(1091215) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:&lt;BR /&gt;MPIR_Init_thread(136)........: &lt;BR /&gt;MPID_Init(1149)..............: &lt;BR /&gt;MPIDI_OFI_mpi_init_hook(1657): OFI get address vector map failed&lt;BR /&gt;[1605269643.201281] [cnode002:107610:0] select.c:410 UCX ERROR no active messages transport to &amp;lt;no debug data&amp;gt;: mm/posix - Destination is unreachable, mm/sysv - Destination is unreachable, self/self - Destination is unreachable&lt;BR /&gt;libfabric:107610:mlx:core:mlx_av_insert():179&amp;lt;warn&amp;gt; Try to insert address #0, offset=0 (size=2) fi_addr=0x7f200002cb80 &lt;BR /&gt;libfabric:107610:mlx:core:mlx_av_insert():189&amp;lt;warn&amp;gt; address inserted&lt;BR /&gt;libfabric:107610:mlx:core:mlx_av_insert():179&amp;lt;warn&amp;gt; Try to insert address #1, offset=1024 (size=2) fi_addr=0x7f200002cb80 &lt;BR /&gt;Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:&lt;BR /&gt;MPIR_Init_thread(136)........: &lt;BR /&gt;MPID_Init(1149)..............: &lt;BR /&gt;MPIDI_OFI_mpi_init_hook(1657): OFI get address vector map failed&lt;/P&gt;</description>
    <pubDate>Fri, 13 Nov 2020 12:21:35 GMT</pubDate>
    <dc:creator>John_Young</dc:creator>
    <dc:date>2020-11-13T12:21:35Z</dc:date>
    <item>
      <title>MPI_IRECV sporadically hangs for Intel 2019/2020 but not Intel 2018</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1226324#M7312</link>
      <description>&lt;P&gt;We are experiencing strange behavior where mpi_irecv calls sometimes hang with Intel 2019 and Intel 2020 but not Intel 2018.&amp;nbsp; The issue seems to be related to the fabric.&amp;nbsp; For Intel 2018 we could use the DAPL or OFA fabrics, but in Intel 2019/2020 these were removed and you must use OFI.&lt;/P&gt;
&lt;P&gt;I've attached a small test case that exhibits the problem on our Linux cluster.&amp;nbsp; The test case is for 2 MPI processes.&amp;nbsp; The issue only occurs if the 2 MPI processes are on two distinct physical nodes in the cluster.&amp;nbsp; If you assign the 2 MPI processes to a single physical node, the hang does not occur.&amp;nbsp; The run.sh script drives the test cases, and you can select different Intel versions.&amp;nbsp; I've attached the output we see on our cluster in the screen*.txt files for the different Intel versions.&lt;/P&gt;
&lt;P&gt;We've scoured the code and it seems to be correct.&amp;nbsp; Our production code runs flawlessly on Intel 2018 over a wide range of problems and numbers of MPI processes/cluster nodes, but quite a few of these problems hang with Intel 2019/2020.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We know that Intel MPI 2019 had a lot of changes from 2018, so we are&amp;nbsp;wondering if there is some default setting that changed, e.g., MPI buffer sizes, that might be the cause of the problem.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;John&lt;/P&gt;</description>
      <pubDate>Sat, 07 Nov 2020 21:00:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1226324#M7312</guid>
      <dc:creator>John_Young</dc:creator>
      <dc:date>2020-11-07T21:00:41Z</dc:date>
    </item>
    <item>
      <title>Re: MPI_IRECV sporadically hangs for Intel 2019/2020 but not Intel 2018</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1227879#M7322</link>
      <description>&lt;P&gt;John,&lt;/P&gt;
&lt;P&gt;I am seeing similar problems in the 2020 MPI libraries when executing on multiple physical nodes (but not on a single physical node).&amp;nbsp; I have not been able to find a solution.&lt;/P&gt;
&lt;P&gt;Rob&lt;/P&gt;</description>
      <pubDate>Thu, 12 Nov 2020 19:17:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1227879#M7322</guid>
      <dc:creator>Robert_Adams</dc:creator>
      <dc:date>2020-11-12T19:17:29Z</dc:date>
    </item>
    <item>
      <title>Re: MPI_IRECV sporadically hangs for Intel 2019/2020 but not Intel 2018</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1227885#M7323</link>
      <description>&lt;P&gt;Has someone from Intel been able to take a look at this yet?&amp;nbsp; This is currently a showstopper bug for our code.&amp;nbsp; We have confirmed it on our cluster as well as on one of our customers' clusters (with Intel 2020).&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;John&lt;/P&gt;</description>
      <pubDate>Thu, 12 Nov 2020 19:26:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1227885#M7323</guid>
      <dc:creator>John_Young</dc:creator>
      <dc:date>2020-11-12T19:26:22Z</dc:date>
    </item>
    <item>
      <title>Re: MPI_IRECV sporadically hangs for Intel 2019/2020 but not Intel 2018</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1228079#M7325</link>
      <description>&lt;P&gt;Hi John,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We have tested your code, running it several times in our environment with both the 2019 and 2020 versions.&lt;/P&gt;&lt;P&gt;The only difference I made is changing the provider from verbs to mlx; since the 2019 Update 5 release, mlx has been recommended over verbs on InfiniBand.&lt;/P&gt;&lt;P&gt;To change the provider, set &lt;B&gt;&lt;I&gt;FI_PROVIDER=mlx&lt;/I&gt;&lt;/B&gt;; if you don't set any value, the latest versions select mlx automatically.&lt;/P&gt;&lt;P&gt;I ran it over fifty times and found no hang using the command below:&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;for run in {1..50}; do mpirun -env I_MPI_PIN_DOMAIN auto -env I_MPI_FABRICS=shm:ofi -f hosts&amp;nbsp;-n 2 -ppn 1 ./a.out ;done&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Please check with mlx; meanwhile, we will get back to you after further investigation.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 13 Nov 2020 10:43:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1228079#M7325</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2020-11-13T10:43:26Z</dc:date>
    </item>
    <item>
      <title>Re: Re: MPI_IRECV sporadically hangs for Intel 2019/2020 but not Intel 2018</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1228104#M7328</link>
      <description>&lt;P&gt;Prasanth,&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;1. Thank you for looking at this.&amp;nbsp; I just want to confirm whether you ran the two MPI processes on a single physical node or on two distinct physical nodes (one process per node).&amp;nbsp; The hang does not occur for us if you run the two MPI processes on a single physical node; it requires two distinct nodes (which I guess prevents shared-memory communication).&lt;/P&gt;
&lt;P&gt;2. I tried setting FI_PROVIDER=mlx, but the code crashes at startup.&amp;nbsp; The output I see is below (also attached).&amp;nbsp; Does this indicate an issue in our cluster setup?&lt;/P&gt;
&lt;P&gt;John&lt;/P&gt;
&lt;P&gt;[0] MPI startup(): Intel(R) MPI Library, Version 2019 Update 9 Build 20200923 (id: abd58e492)&lt;BR /&gt;[0] MPI startup(): Copyright (C) 2003-2020 Intel Corporation. All rights reserved.&lt;BR /&gt;[0] MPI startup(): library kind: release&lt;BR /&gt;[0] MPI startup(): libfabric version: 1.10.1-impi&lt;BR /&gt;libfabric:107610:core:mr:ofi_default_cache_size():56&amp;lt;info&amp;gt; default cache size=2109042048&lt;BR /&gt;libfabric:22236:core:mr:ofi_default_cache_size():56&amp;lt;info&amp;gt; default cache size=2109042048&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: sockets (110.10)&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "sockets" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: sockets (110.10)&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "sockets" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:107610:core:core:ofi_reg_dl_prov():578&amp;lt;warn&amp;gt; dlopen(/opt/ohpc/pub/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so): libpsm2.so.2: cannot open shared object file: No such file or directory&lt;BR /&gt;libfabric:22236:core:core:ofi_reg_dl_prov():578&amp;lt;warn&amp;gt; dlopen(/opt/ohpc/pub/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib/prov/libpsmx2-fi.so): libpsm2.so.2: cannot open shared object file: No such file or directory&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: ofi_rxm (110.10)&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: ofi_rxm (110.10)&lt;BR /&gt;libfabric:107610:core:core:ofi_reg_dl_prov():578&amp;lt;warn&amp;gt; 
dlopen(/opt/ohpc/pub/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib/prov/libefa-fi.so): libefa.so.1: cannot open shared object file: No such file or directory&lt;BR /&gt;libfabric:22236:core:core:ofi_reg_dl_prov():578&amp;lt;warn&amp;gt; dlopen(/opt/ohpc/pub/intel/compilers_and_libraries_2020.4.304/linux/mpi/intel64/libfabric/lib/prov/libefa-fi.so): libefa.so.1: cannot open shared object file: No such file or directory&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: tcp (110.10)&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "tcp" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: tcp (110.10)&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "tcp" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: shm (110.10)&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "shm" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: shm (110.10)&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "shm" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:107610:core:mr:ofi_default_cache_size():56&amp;lt;info&amp;gt; default cache size=2109042048&lt;BR /&gt;libfabric:107610:verbs:fabric:verbs_devs_print():869&amp;lt;info&amp;gt; list of verbs devices found for FI_EP_MSG:&lt;BR /&gt;libfabric:107610:verbs:fabric:verbs_devs_print():873&amp;lt;info&amp;gt; #1 mlx4_0 - IPoIB addresses:&lt;BR /&gt;libfabric:107610:verbs:fabric:verbs_devs_print():883&amp;lt;info&amp;gt; 10.30.18.103&lt;BR 
/&gt;libfabric:107610:verbs:fabric:verbs_devs_print():883&amp;lt;info&amp;gt; fe80::202:c903:14:de71&lt;BR /&gt;libfabric:107610:verbs:fabric:vrb_get_device_attrs():615&amp;lt;info&amp;gt; device mlx4_0: first found active port is 1&lt;BR /&gt;libfabric:107610:verbs:fabric:vrb_get_device_attrs():615&amp;lt;info&amp;gt; device mlx4_0: first found active port is 1&lt;BR /&gt;libfabric:107610:verbs:fabric:vrb_get_device_attrs():615&amp;lt;info&amp;gt; device mlx4_0: first found active port is 1&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: verbs (110.10)&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "verbs" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: mlx (1.4)&lt;BR /&gt;libfabric:107610:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: ofi_hook_noop (110.10)&lt;BR /&gt;libfabric:107610:core:core:fi_getinfo_():1092&amp;lt;info&amp;gt; Found provider with the highest priority mlx, must_use_util_prov = 0&lt;BR /&gt;libfabric:107610:mlx:core:mlx_getinfo():172&amp;lt;info&amp;gt; used inject size = 1024 &lt;BR /&gt;libfabric:107610:mlx:core:mlx_getinfo():219&amp;lt;info&amp;gt; Loaded MLX version 1.6.0&lt;BR /&gt;libfabric:107610:mlx:core:mlx_getinfo():266&amp;lt;warn&amp;gt; MLX: spawn support 0 &lt;BR /&gt;libfabric:107610:core:core:fi_getinfo_():1092&amp;lt;info&amp;gt; Found provider with the highest priority mlx, must_use_util_prov = 0&lt;BR /&gt;libfabric:107610:mlx:core:mlx_getinfo():172&amp;lt;info&amp;gt; used inject size = 1024 &lt;BR /&gt;libfabric:107610:mlx:core:mlx_getinfo():219&amp;lt;info&amp;gt; Loaded MLX version 1.6.0&lt;BR /&gt;libfabric:107610:mlx:core:mlx_getinfo():266&amp;lt;warn&amp;gt; MLX: spawn support 0 &lt;BR /&gt;[0] MPI startup(): libfabric provider: mlx&lt;BR 
/&gt;libfabric:107610:mlx:core:mlx_fabric_open():172&amp;lt;info&amp;gt; &lt;BR /&gt;libfabric:107610:core:core:fi_fabric_():1372&amp;lt;info&amp;gt; Opened fabric: mlx&lt;BR /&gt;libfabric:107610:mlx:core:ofi_check_rx_attr():782&amp;lt;info&amp;gt; Tx only caps ignored in Rx caps&lt;BR /&gt;libfabric:107610:mlx:core:ofi_check_tx_attr():880&amp;lt;info&amp;gt; Rx only caps ignored in Tx caps&lt;BR /&gt;[0] MPI startup(): max_ch4_vcis: 1, max_reg_eps 1, enable_sep 0, enable_shared_ctxs 0, do_av_insert 1&lt;BR /&gt;libfabric:107610:mlx:core:ofi_check_rx_attr():782&amp;lt;info&amp;gt; Tx only caps ignored in Rx caps&lt;BR /&gt;libfabric:107610:mlx:core:ofi_check_tx_attr():880&amp;lt;info&amp;gt; Rx only caps ignored in Tx caps&lt;BR /&gt;[0] MPI startup(): addrnamelen: 1024&lt;BR /&gt;libfabric:107610:mlx:core:mlx_cm_getname_mlx_format():73&amp;lt;info&amp;gt; Loaded UCP address: [127]...&lt;BR /&gt;libfabric:22236:core:mr:ofi_default_cache_size():56&amp;lt;info&amp;gt; default cache size=2109042048&lt;BR /&gt;libfabric:22236:verbs:fabric:verbs_devs_print():869&amp;lt;info&amp;gt; list of verbs devices found for FI_EP_MSG:&lt;BR /&gt;libfabric:22236:verbs:fabric:verbs_devs_print():873&amp;lt;info&amp;gt; #1 mlx4_0 - IPoIB addresses:&lt;BR /&gt;libfabric:22236:verbs:fabric:verbs_devs_print():883&amp;lt;info&amp;gt; 10.30.18.104&lt;BR /&gt;libfabric:22236:verbs:fabric:verbs_devs_print():883&amp;lt;info&amp;gt; fe80::202:c903:14:ddf1&lt;BR /&gt;libfabric:22236:verbs:fabric:vrb_get_device_attrs():615&amp;lt;info&amp;gt; device mlx4_0: first found active port is 1&lt;BR /&gt;libfabric:22236:verbs:fabric:vrb_get_device_attrs():615&amp;lt;info&amp;gt; device mlx4_0: first found active port is 1&lt;BR /&gt;libfabric:22236:verbs:fabric:vrb_get_device_attrs():615&amp;lt;info&amp;gt; device mlx4_0: first found active port is 1&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: verbs (110.10)&lt;BR 
/&gt;libfabric:22236:core:core:ofi_register_provider():446&amp;lt;info&amp;gt; "verbs" filtered by provider include/exclude list, skipping&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: mlx (1.4)&lt;BR /&gt;libfabric:22236:core:core:ofi_register_provider():418&amp;lt;info&amp;gt; registering provider: ofi_hook_noop (110.10)&lt;BR /&gt;libfabric:22236:core:core:fi_getinfo_():1092&amp;lt;info&amp;gt; Found provider with the highest priority mlx, must_use_util_prov = 0&lt;BR /&gt;libfabric:22236:mlx:core:mlx_getinfo():172&amp;lt;info&amp;gt; used inject size = 1024 &lt;BR /&gt;libfabric:22236:mlx:core:mlx_getinfo():219&amp;lt;info&amp;gt; Loaded MLX version 1.6.0&lt;BR /&gt;libfabric:22236:mlx:core:mlx_getinfo():266&amp;lt;warn&amp;gt; MLX: spawn support 0 &lt;BR /&gt;libfabric:22236:core:core:fi_getinfo_():1092&amp;lt;info&amp;gt; Found provider with the highest priority mlx, must_use_util_prov = 0&lt;BR /&gt;libfabric:22236:mlx:core:mlx_getinfo():172&amp;lt;info&amp;gt; used inject size = 1024 &lt;BR /&gt;libfabric:22236:mlx:core:mlx_getinfo():219&amp;lt;info&amp;gt; Loaded MLX version 1.6.0&lt;BR /&gt;libfabric:22236:mlx:core:mlx_getinfo():266&amp;lt;warn&amp;gt; MLX: spawn support 0 &lt;BR /&gt;libfabric:22236:mlx:core:mlx_fabric_open():172&amp;lt;info&amp;gt; &lt;BR /&gt;libfabric:22236:core:core:fi_fabric_():1372&amp;lt;info&amp;gt; Opened fabric: mlx&lt;BR /&gt;libfabric:22236:mlx:core:ofi_check_rx_attr():782&amp;lt;info&amp;gt; Tx only caps ignored in Rx caps&lt;BR /&gt;libfabric:22236:mlx:core:ofi_check_tx_attr():880&amp;lt;info&amp;gt; Rx only caps ignored in Tx caps&lt;BR /&gt;libfabric:22236:mlx:core:ofi_check_rx_attr():782&amp;lt;info&amp;gt; Tx only caps ignored in Rx caps&lt;BR /&gt;libfabric:22236:mlx:core:ofi_check_tx_attr():880&amp;lt;info&amp;gt; Rx only caps ignored in Tx caps&lt;BR /&gt;libfabric:22236:mlx:core:mlx_cm_getname_mlx_format():73&amp;lt;info&amp;gt; Loaded UCP address: 
[127]...&lt;BR /&gt;libfabric:22236:mlx:core:mlx_av_insert():179&amp;lt;warn&amp;gt; Try to insert address #0, offset=0 (size=2) fi_addr=0x7f2000132a00 &lt;BR /&gt;[1605269643.201248] [cnode003:22236:0] select.c:410 UCX ERROR no active messages transport to &amp;lt;no debug data&amp;gt;: mm/posix - Destination is unreachable, mm/sysv - Destination is unreachable, self/self - Destination is unreachable&lt;BR /&gt;Abort(1091215) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:&lt;BR /&gt;MPIR_Init_thread(136)........: &lt;BR /&gt;MPID_Init(1149)..............: &lt;BR /&gt;MPIDI_OFI_mpi_init_hook(1657): OFI get address vector map failed&lt;BR /&gt;[1605269643.201281] [cnode002:107610:0] select.c:410 UCX ERROR no active messages transport to &amp;lt;no debug data&amp;gt;: mm/posix - Destination is unreachable, mm/sysv - Destination is unreachable, self/self - Destination is unreachable&lt;BR /&gt;libfabric:107610:mlx:core:mlx_av_insert():179&amp;lt;warn&amp;gt; Try to insert address #0, offset=0 (size=2) fi_addr=0x7f200002cb80 &lt;BR /&gt;libfabric:107610:mlx:core:mlx_av_insert():189&amp;lt;warn&amp;gt; address inserted&lt;BR /&gt;libfabric:107610:mlx:core:mlx_av_insert():179&amp;lt;warn&amp;gt; Try to insert address #1, offset=1024 (size=2) fi_addr=0x7f200002cb80 &lt;BR /&gt;Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:&lt;BR /&gt;MPIR_Init_thread(136)........: &lt;BR /&gt;MPID_Init(1149)..............: &lt;BR /&gt;MPIDI_OFI_mpi_init_hook(1657): OFI get address vector map failed&lt;/P&gt;</description>
      <pubDate>Fri, 13 Nov 2020 12:21:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1228104#M7328</guid>
      <dc:creator>John_Young</dc:creator>
      <dc:date>2020-11-13T12:21:35Z</dc:date>
    </item>
    <item>
      <title>Re: Re: MPI_IRECV sporadically hangs for Intel 2019/2020 but not Intel 2018</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1228106#M7329</link>
      <description>&lt;P&gt;Also, here is the output of ucx_info, since I also see a UCX error message about the messaging transport.&amp;nbsp; We do not have all the transport methods that I see in some other related posts.&lt;/P&gt;
&lt;P&gt;~/scratch/&amp;gt;ucx_info -d | grep Transport&lt;BR /&gt;7:# Transport: mm&lt;BR /&gt;43:# Transport: mm&lt;BR /&gt;79:# Transport: self&lt;BR /&gt;113:# Transport: tcp&lt;/P&gt;</description>
      <pubDate>Fri, 13 Nov 2020 12:25:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1228106#M7329</guid>
      <dc:creator>John_Young</dc:creator>
      <dc:date>2020-11-13T12:25:04Z</dc:date>
    </item>
    <item>
      <title>Re:MPI_IRECV sporadically hangs for Intel 2019/2020 but not Intel 2018</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1229835#M7341</link>
      <description>&lt;P&gt;Hi John,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Your system does not have all the required transports to use mlx. It might be due to due to a driver misconfiguration, missing libraries, or other fabric software problems.&lt;/P&gt;&lt;P&gt;Could you please check your UCX configuration or contact your system administrator regarding the installation of required transports.&lt;/P&gt;&lt;P&gt;For more information regarding required transports please refer: &lt;A href="https://software.intel.com/content/www/us/en/develop/articles/improve-performance-and-stability-with-intel-mpi-library-on-infiniband.html" target="_blank"&gt;https://software.intel.com/content/www/us/en/develop/articles/improve-performance-and-stability-with-intel-mpi-library-on-infiniband.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 19 Nov 2020 09:53:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1229835#M7341</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2020-11-19T09:53:32Z</dc:date>
    </item>
    <item>
      <title>Re: Re:MPI_IRECV sporadically hangs for Intel 2019/2020 but not Intel 2018</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1230022#M7342</link>
      <description>&lt;P&gt;Prasanth,&lt;/P&gt;&lt;P&gt;Thank you for your help.&amp;nbsp; Our cluster administrator installed the UCX library (v1.9.0) and enabled the compile-time InfiniBand features.&amp;nbsp; I can now use the FI_PROVIDER=mlx and the hang in the test case now seems to be resolved.&amp;nbsp; I will verify that the issue is resolved with our production code shortly.&lt;/P&gt;&lt;P&gt;We have seen this issue on two separate clusters.&amp;nbsp; Maybe the Intel documentation should be updated to clarify (or emphasize) that these libraries need to be installed separately for proper operation.&amp;nbsp; It seems that it may not be clear to all cluster administrators that this is important since it you can get programs to run without these libraries but sub-optimally and with (apparently) sporadic run-time issues.&lt;/P&gt;&lt;P&gt;Thanks again.&amp;nbsp; You've been a great help in resolving this issue.&lt;/P&gt;&lt;P&gt;John&lt;/P&gt;</description>
      <pubDate>Thu, 19 Nov 2020 21:13:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1230022#M7342</guid>
      <dc:creator>John_Young</dc:creator>
      <dc:date>2020-11-19T21:13:53Z</dc:date>
    </item>
    <item>
      <title>Re:MPI_IRECV sporadically hangs for Intel 2019/2020 but not Intel 2018</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1231563#M7347</link>
      <description>&lt;P&gt;Hi John,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;These transport requirements were more related to hardware rather than Intel MPI, but I will forward your suggestion to the internal team regarding the documentation.&lt;/P&gt;&lt;P&gt;Have you verified the fix in your production code? if yes, let us know the results based on which we can go forward.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 25 Nov 2020 09:45:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1231563#M7347</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2020-11-25T09:45:15Z</dc:date>
    </item>
    <item>
      <title>Re:MPI_IRECV sporadically hangs for Intel 2019/2020 but not Intel 2018</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1232675#M7368</link>
      <description>&lt;P&gt;Hi John,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We haven't heard back from you.&lt;/P&gt;&lt;P&gt;Please confirm whether your problem is resolved or not.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 30 Nov 2020 10:56:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1232675#M7368</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2020-11-30T10:56:39Z</dc:date>
    </item>
    <item>
      <title>Re: Re:MPI_IRECV sporadically hangs for Intel 2019/2020 but not Intel 2018</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1232726#M7373</link>
      <description>&lt;P&gt;&lt;BR /&gt;Prasanth,&lt;/P&gt;&lt;P&gt;Yes, our production code runs fine now after installing the UCX transports. Here is a summary of what we did to avoid the discussed MPI hangs.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Intel MPI 2018:&lt;/P&gt;&lt;P&gt;Pass "-env I_MPI_FABRICS shm:dapl" to mpirun. This is all we needed to do and never observed any hangs.&lt;/P&gt;&lt;P&gt;Intel MPI 2019:&lt;/P&gt;&lt;P&gt;We were unable to use the 'mlx' provider on Intel 2019. I don't know if this is due to our cluster or due to Intel MPI 2019. The hang always occurs unless we choose the 'sockets' provider :&lt;/P&gt;&lt;P&gt;load the UCX 1.9.0 module&lt;/P&gt;&lt;P&gt;export UCX_TLS=rc,ud,sm,self # This doesn't seem to be necessary anymore.&lt;BR /&gt;export FI_PROVIDER=sockets # This IS necessary on our cluster&lt;/P&gt;&lt;P&gt;and pass "-env I_MPI_FABRICS shm:ofi" to mpirun&lt;/P&gt;&lt;P&gt;Intel MPI 2020:&lt;/P&gt;&lt;P&gt;load the UCX 1.9.0 module&lt;/P&gt;&lt;P&gt;export UCX_TLS=rc,ud,sm,self # This doesn't seem to be necessary anymore, but doesn't cause any issues.&lt;BR /&gt;export FI_PROVIDER=mlx # This also no longer seems to be necessary, but doesn't cause any issues.&lt;/P&gt;&lt;P&gt;and pass "-env I_MPI_FABRICS shm:ofi" to mpirun&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;So, the primary issue seems to have been not having the UCX library installed. Our cluster admin built the UCX module with&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;./contrib/configure-release --with-rc \&lt;BR /&gt;--with-ud \&lt;BR /&gt;--with-dc \&lt;BR /&gt;--with-mlx5-dv \&lt;BR /&gt;--with-ib-hw-tm \&lt;BR /&gt;--with-dm \&lt;BR /&gt;--with-cm \&lt;BR /&gt;--prefix=$INSTALL_LOCATION&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Best,&lt;/P&gt;&lt;P&gt;John&lt;/P&gt;</description>
      <pubDate>Mon, 30 Nov 2020 15:20:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1232726#M7373</guid>
      <dc:creator>John_Young</dc:creator>
      <dc:date>2020-11-30T15:20:29Z</dc:date>
    </item>
    <item>
      <title>Re:MPI_IRECV sporadically hangs for Intel 2019/2020 but not Intel 2018</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1233059#M7376</link>
      <description>&lt;P&gt;Hi John,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for providing the steps you have followed.&lt;/P&gt;&lt;P&gt;It has been mentioned in the release notes that the minimum required UCX version is 1.5+ (&lt;A href="https://software.intel.com/content/www/us/en/develop/articles/intel-mpi-library-release-notes-linux.html" rel="noopener noreferrer" target="_blank"&gt;Intel® MPI Library Release Notes for Linux* OS&lt;/A&gt;).&lt;/P&gt;&lt;P&gt;Since your issue has been resolved we are closing this thread. Please raise a new thread for any further assistance from intel.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Prasanth&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 01 Dec 2020 11:11:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IRECV-sporadically-hangs-for-Intel-2019-2020-but-not-Intel/m-p/1233059#M7376</guid>
      <dc:creator>PrasanthD_intel</dc:creator>
      <dc:date>2020-12-01T11:11:00Z</dc:date>
    </item>
  </channel>
</rss>

