<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic: Issue with MPI 2019U6 and MLX provider in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167010#M6473</link>
    <description>Forum thread: UCX errors from the mlx libfabric provider with Intel® MPI Library 2019 Update 6 on clusters running Mellanox OFED 4.6, with workarounds (FI_PROVIDER=verbs, UCX_TLS=ud,sm,self).</description>
    <pubDate>Mon, 13 Jan 2020 14:59:14 GMT</pubDate>
    <dc:creator>James_T_Intel</dc:creator>
    <dc:date>2020-01-13T14:59:14Z</dc:date>
    <item>
      <title>Issue with MPI 2019U6 and MLX provider</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167008#M6471</link>
      <description>&lt;P&gt;Hi&lt;/P&gt;&lt;P&gt;We have two clusters that are almost identical except that one is now running Mellanox OFED 4.6 and the other 4.5.&lt;/P&gt;&lt;P&gt;With MPI 2019U6 from Studio 2020 distribution, one cluster (4.5) works OK, the other (4.6) does not and throws some UCX errors:&lt;/P&gt;&lt;P&gt;]$ cat slurm-151351.out&lt;BR /&gt;I_MPI_F77=ifort&lt;BR /&gt;I_MPI_PORT_RANGE=60001:61000&lt;BR /&gt;I_MPI_F90=ifort&lt;BR /&gt;I_MPI_CC=icc&lt;BR /&gt;I_MPI_CXX=icpc&lt;BR /&gt;I_MPI_DEBUG=999&lt;BR /&gt;I_MPI_FC=ifort&lt;BR /&gt;I_MPI_HYDRA_BOOTSTRAP=slurm&lt;BR /&gt;I_MPI_ROOT=/apps/compilers/intel/2020.0/compilers_and_libraries_2020.0.166/linux/mpi&lt;BR /&gt;MPI startup(): Imported environment partly inaccesible. Map=0 Info=0&lt;BR /&gt;[0] MPI startup(): libfabric version: 1.9.0a1-impi&lt;BR /&gt;[0] MPI startup(): libfabric provider: mlx&lt;BR /&gt;[0] MPI startup(): detected mlx provider, set device name to "mlx"&lt;BR /&gt;[0] MPI startup(): max_ch4_vcis: 1, max_reg_eps 1, enable_sep 0, enable_shared_ctxs 0, do_av_insert 1&lt;BR /&gt;[0] MPI startup(): addrname_len: 512, addrname_firstlen: 512&lt;BR /&gt;[0] MPI startup(): val_max: 4096, part_len: 4095, bc_len: 1030, num_parts: 1&lt;BR /&gt;[1578327353.181131] [scs0027:247642:0]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select.c:410&amp;nbsp; UCX&amp;nbsp; ERROR no active messages transport to &amp;lt;no debug data&amp;gt;: mm/posix - Destination is unreachable, mm/sysv - Destination is unreachable, self/self - Destination is unreachable&lt;BR /&gt;[1578327353.180508] [scs0088:378614:0]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select.c:410&amp;nbsp; UCX&amp;nbsp; ERROR no active messages transport to &amp;lt;no debug data&amp;gt;: mm/posix - Destination is unreachable, mm/sysv - Destination is unreachable, self/self - Destination is unreachable&lt;BR /&gt;Abort(1091471) on node 0 (rank 0 in comm 0): Fatal 
error in PMPI_Init: Other MPI error, error stack:&lt;BR /&gt;MPIR_Init_thread(703)........:&lt;BR /&gt;MPID_Init(958)...............:&lt;BR /&gt;MPIDI_OFI_mpi_init_hook(1382): OFI get address vector map failed&lt;BR /&gt;Abort(1091471) on node 2 (rank 2 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:&lt;BR /&gt;MPIR_Init_thread(703)........:&lt;BR /&gt;MPID_Init(958)...............:&lt;BR /&gt;MPIDI_OFI_mpi_init_hook(1382): OFI get address vector map failed&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is this possibly an Intel MPI issue or something at our end (where 2018 and early 2019 versions worked OK)?&lt;/P&gt;&lt;P&gt;Thanks&lt;BR /&gt;A&lt;/P&gt;</description>
      <pubDate>Mon, 06 Jan 2020 16:24:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167008#M6471</guid>
      <dc:creator>Ade_F_</dc:creator>
      <dc:date>2020-01-06T16:24:18Z</dc:date>
    </item>
    <item>
      <title>Hi Ade,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167009#M6472</link>
      <description>&lt;P&gt;Hi Ade,&lt;/P&gt;&lt;P&gt;Thanks for reaching out to us. We are working on your issue and will get back to you soon.&lt;/P&gt;&lt;P&gt;-Shubham&lt;/P&gt;</description>
      <pubDate>Tue, 07 Jan 2020 06:01:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167009#M6472</guid>
      <dc:creator>Shubham_C_Intel</dc:creator>
      <dc:date>2020-01-07T06:01:42Z</dc:date>
    </item>
    <item>
      <title>Are you encountering this</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167010#M6473</link>
      <description>&lt;P&gt;Are you encountering this error with every program you are running, or only with certain programs?&lt;/P&gt;&lt;P&gt;Also, if you have installed Intel® Cluster Checker, please run&lt;/P&gt;
&lt;PRE class="brush:bash; class-name:dark;"&gt;clck -f ./&amp;lt;nodefile&amp;gt; -F mpi_prereq_user&lt;/PRE&gt;

&lt;P&gt;This will run diagnostic checks related to Intel® MPI Library functionality and help verify that the cluster is configured as expected.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jan 2020 14:59:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167010#M6473</guid>
      <dc:creator>James_T_Intel</dc:creator>
      <dc:date>2020-01-13T14:59:14Z</dc:date>
    </item>
    <item>
      <title>It seems to be with every</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167011#M6474</link>
      <description>&lt;P&gt;It seems to be with every program, although admittedly I'm only trying noddy examples 'hello world' and a primes counting example.&lt;/P&gt;&lt;P&gt;All seem to work on the OFED 4.5 cluster, but fail on the OFED 4.6 cluster, when Studio 2020 is used.&lt;/P&gt;&lt;P&gt;Cluster checker happy except for the logical processor count as we have it enabled in BIOS but twiddled at boot on all our systems:&lt;/P&gt;&lt;P&gt;SUMMARY&lt;BR /&gt;&amp;nbsp; Command-line:&amp;nbsp;&amp;nbsp; clck -F mpi_prereq_user&lt;BR /&gt;&amp;nbsp; Tests Run:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; mpi_prereq_user&lt;BR /&gt;&amp;nbsp; ERROR:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2 tests encountered errors. Information may be incomplete. See&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; clck_results.log and search for "ERROR" for more information.&lt;BR /&gt;&amp;nbsp; Overall Result: 1 issue found - FUNCTIONALITY (1)&lt;BR /&gt;--------------------------------------------------------------------------------&lt;BR /&gt;2 nodes tested:&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cdcs[0003-0004]&lt;BR /&gt;0 nodes with no issues:&lt;BR /&gt;2 nodes with issues:&amp;nbsp;&amp;nbsp;&amp;nbsp; cdcs[0003-0004]&lt;BR /&gt;--------------------------------------------------------------------------------&lt;BR /&gt;FUNCTIONALITY&lt;BR /&gt;The following functionality issues were detected:&lt;BR /&gt;&amp;nbsp; 1. There is a mismatch between number of available logical cores and maximum&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; logical cores. 
Cores '40-79' are offline.&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2 nodes: cdcs[0003-0004]&lt;/P&gt;&lt;P&gt;HARDWARE UNIFORMITY&lt;BR /&gt;No issues detected.&lt;/P&gt;&lt;P&gt;PERFORMANCE&lt;BR /&gt;No issues detected.&lt;/P&gt;&lt;P&gt;SOFTWARE UNIFORMITY&lt;BR /&gt;No issues detected.&lt;/P&gt;&lt;P&gt;See clck_results.log for more information.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Jan 2020 14:21:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167011#M6474</guid>
      <dc:creator>Ade_F_</dc:creator>
      <dc:date>2020-01-14T14:21:24Z</dc:date>
    </item>
    <item>
      <title>Hello Ade,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167012#M6475</link>
      <description>&lt;P&gt;Hello Ade,&lt;/P&gt;&lt;P&gt;Have you tried to measure the performance of the "mlx" provider with MOFED 4.5? Can you run the standard IMB or OSU benchmarks?&lt;/P&gt;&lt;P&gt;Have you tried any other MPI stacks? OpenMPI is available with MOFED distributions, and you can quickly try any of these benchmarks, which come prebuilt.&lt;/P&gt;&lt;P&gt;regards&lt;/P&gt;&lt;P&gt;Michael&lt;/P&gt;</description>
      <pubDate>Fri, 17 Jan 2020 15:50:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167012#M6475</guid>
      <dc:creator>drMikeT</dc:creator>
      <dc:date>2020-01-17T15:50:44Z</dc:date>
    </item>
    <item>
      <title>Hi Michael et al.</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167013#M6476</link>
      <description>&lt;P&gt;Hi Michael et al.&lt;/P&gt;&lt;P&gt;We only have this problem with 2020; 2019, 2018, OpenMPI, MPICH, and Mellanox's HPCX OpenMPI are all OK.&lt;/P&gt;&lt;P&gt;I have now - I think - isolated it to something between the mlx FI_PROVIDER and the MLNX_OFED 4.6 we have. Setting the provider to verbs appears to cure the problem, although it is perhaps less than ideal. Equally, the mlx provider has no issue on the MLNX_OFED 4.5 deployments we have.&lt;/P&gt;&lt;P&gt;Michael - if you are interested in performance separately - rather than just making it work - I can provide some IMB output.&lt;/P&gt;&lt;P&gt;Cheers&lt;/P&gt;&lt;P&gt;Ade&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jan 2020 22:45:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167013#M6476</guid>
      <dc:creator>Ade_F_</dc:creator>
      <dc:date>2020-01-21T22:45:27Z</dc:date>
    </item>
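Ade's workaround above (switching the libfabric provider from mlx to verbs) can be sketched as a job-script fragment. This is a hedged sketch, assuming a bash batch script; the mpirun line, rank count, and binary name are placeholders, not from the thread.

```shell
# Sketch of the workaround described above: override libfabric's provider
# selection so Intel MPI uses 'verbs' instead of 'mlx' on the MLNX_OFED 4.6 cluster.
export FI_PROVIDER=verbs   # force the verbs provider instead of mlx
export I_MPI_DEBUG=4       # print the chosen provider at startup for confirmation
echo "FI_PROVIDER=$FI_PROVIDER"
# mpirun -n 4 ./hello_world   # placeholder launch line
```

With I_MPI_DEBUG set, the `[0] MPI startup(): libfabric provider:` line in the job output confirms which provider was actually selected.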
    <item>
      <title>Ade, </title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167014#M6477</link>
      <description>&lt;P&gt;Ade,&lt;/P&gt;&lt;P&gt;In my tests, the verbs provider offers 2-3 GB/s at best, which is really not good (6X below line speed for EDR).&lt;/P&gt;&lt;P&gt;Is your CPU Zen2 or Intel based?&lt;/P&gt;&lt;P&gt;Sure, I can see some numbers :)&lt;/P&gt;&lt;P&gt;regards&lt;/P&gt;&lt;P&gt;Michael&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jan 2020 22:57:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167014#M6477</guid>
      <dc:creator>drMikeT</dc:creator>
      <dc:date>2020-01-21T22:57:00Z</dc:date>
    </item>
    <item>
      <title>I have the same problem and</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167015#M6478</link>
      <description>&lt;P&gt;I have the same problem. My architecture is AMD Epyc 7002 series (same behavior with Epyc 7000 series too when using more than 45 PPN), running CentOS 7.6. The mlx provider doesn't work with 2019 U6. With 2019 U5 and the default provider (I believe it is RxM), it crashes when using more than 80 PPN; i.e. if I use 80 or fewer PPN and 9 nodes it works without errors. Not sure what is going on.&lt;/P&gt;&lt;P&gt;Error with 2019 U5 when using more than 80 PPN on 7002 series or 45 PPN on 7000 series:&lt;/P&gt;&lt;P&gt;MPIDI_OFI_send_lightweight_request:&lt;BR /&gt;(unknown)(): Other MPI error&lt;/P&gt;&lt;P&gt;Error with 2019 U6 on 7002 series with the mlx FI_PROVIDER:&lt;/P&gt;&lt;P&gt;MPIDI_OFI_send_lightweight_request:&lt;BR /&gt;(unknown)(): Other MPI error&lt;/P&gt;&lt;P&gt;and an ADDR_INFO error&lt;/P&gt;&lt;P&gt;Furthermore, when using the mlx provider, fi_info returns error -61.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Mar 2020 11:48:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1167015#M6478</guid>
      <dc:creator>0__Dops0</dc:creator>
      <dc:date>2020-03-03T11:48:46Z</dc:date>
    </item>
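The -61 code mentioned above is libfabric's FI_ENODATA ("no matching provider found"). One quick way to check provider visibility on a node is the fi_info utility that ships with libfabric; this is a hedged sketch, and the exact output depends on the local installation.

```shell
# Check whether the mlx libfabric provider is visible on this node.
# fi_info ships with libfabric; -p filters by provider name.
# A -61 (FI_ENODATA) result means no matching provider/endpoint was found.
if command -v fi_info > /dev/null; then
  fi_info -p mlx || echo "mlx provider not found (FI_ENODATA)"
else
  echo "fi_info not installed on this node"
fi
```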
    <item>
      <title>Re: I have the same problem and</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1212487#M7192</link>
      <description>UCX_TLS=ud,sm,self</description>
      <pubDate>Fri, 25 Sep 2020 12:11:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1212487#M7192</guid>
      <dc:creator>Dmitry_S_Intel</dc:creator>
      <dc:date>2020-09-25T12:11:14Z</dc:date>
    </item>
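Dmitry's one-line reply above is an environment setting that restricts UCX to the UD transport plus shared memory and loopback. A hedged sketch of applying it in a batch script (the launch line is a placeholder, not from the thread):

```shell
# Apply the suggested workaround before launching: limit UCX transports to
# ud (InfiniBand unreliable datagram), sm (shared memory), and self (loopback).
export UCX_TLS=ud,sm,self
echo "UCX_TLS=$UCX_TLS"
# mpirun -n 4 ./a.out   # placeholder launch line
```

Note that `ud` here is the InfiniBand unreliable-datagram transport, so this setting does not push MPI traffic onto Ethernet; it only narrows which UCX transports are eligible for selection.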
    <item>
      <title>Re: I have the same problem and</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1248631#M7670</link>
      <description>&lt;P&gt;&lt;a href="https://community.intel.com/t5/user/viewprofilepage/user-id/96985"&gt;@Dmitry_S_Intel&lt;/a&gt; That works for me, thanks.&lt;/P&gt;
&lt;P&gt;For me the problem only occurred when I launched on more than 10 nodes.&lt;BR /&gt;&lt;BR /&gt;But what does your suggestion mean? The last thing I want is my nodes communicating over the Ethernet connection. Can you please explain whether that is the case?&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jan 2021 15:54:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1248631#M7670</guid>
      <dc:creator>AThar2</dc:creator>
      <dc:date>2021-01-21T15:54:18Z</dc:date>
    </item>
    <item>
      <title>Re: I have the same problem and</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1328904#M8923</link>
      <description>&lt;P&gt;Hello, how was this variable set in the job script? As seen below? I am also receiving the "OFI get address vector map failed" error.&lt;/P&gt;
&lt;P&gt;export UCX_TLS=ud,sm,self&lt;/P&gt;</description>
      <pubDate>Wed, 10 Nov 2021 20:53:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Issue-with-MPI-2019U6-and-MLX-provider/m-p/1328904#M8923</guid>
      <dc:creator>solaremg</dc:creator>
      <dc:date>2021-11-10T20:53:44Z</dc:date>
    </item>
  </channel>
</rss>