<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: MPI_Init error under Slurm in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Init-error-under-Slurm/m-p/1381271#M9459</link>
    <description>&lt;P&gt;&lt;SPAN style="font-size: 16px; font-family: intel-clear;"&gt;Hi Kurt,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px; font-family: intel-clear;"&gt;We have not heard back from you with the additional information, so we will close this thread.&amp;nbsp;If you require additional assistance from Intel, please start a new thread.&amp;nbsp;Any further interaction in this thread will be considered community only.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px; font-family: intel-clear;"&gt;Best,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px; font-family: intel-clear;"&gt;Xiao&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 02 May 2022 20:25:52 GMT</pubDate>
    <dc:creator>Xiao_Z_Intel</dc:creator>
    <dc:date>2022-05-02T20:25:52Z</dc:date>
    <item>
      <title>MPI_Init error under Slurm</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Init-error-under-Slurm/m-p/1367321#M9275</link>
      <description>&lt;P&gt;I'm using Intel MPI 2021.5.1 and am trying to start a job under Slurm 20.11.8. I want to start 1 task on each of 5 nodes (parent.cpp), with 2 CPUs reserved per node so that each task can spawn a single new task (child.cpp) using MPI_Comm_spawn (which may be irrelevant, because the error seems to be happening in MPI_Init in parent.cpp).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is the slurm sbatch command:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;$ sbatch --nodes=5 --ntasks=5 --cpus-per-task=2 -D /home/kmccall/slurm_test&amp;nbsp; --verbose slurm_test-intel.bash&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here are the contents of the bash script&amp;nbsp;slurm_test-intel.bash that the above command calls:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;module load intel/intelmpi&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;export I_MPI_PIN_RESPECT_CPUSET=0; mpirun ./parent&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is the C++ code for the parent:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;int main(int argc, char *argv[])
{
    int rank, world_size, error_codes[1];
    char hostname[128], short_host_name[16];
    MPI_Comm intercom;
    MPI_Info info;

    MPI_Init(&amp;amp;argc, &amp;amp;argv);

    MPI_Comm_rank(MPI_COMM_WORLD, &amp;amp;rank);
    MPI_Comm_size(MPI_COMM_WORLD, &amp;amp;world_size);

    gethostname(hostname, 127);

    std::cout &amp;lt;&amp;lt; "Hello from parent process on " &amp;lt;&amp;lt; hostname &amp;lt;&amp;lt; std::endl;

    char info_str[64];
    sprintf(info_str, "ppr:%d:node", 1);
    MPI_Info_create(&amp;amp;info);
    MPI_Info_set(info, "host", hostname);
    MPI_Info_set(info, "map-by", info_str);

    MPI_Comm_spawn("child", argv, 1, info, 0, MPI_COMM_SELF, &amp;amp;intercom,
        error_codes);

    sleep(20);
    MPI_Finalize();
}
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For brevity I haven't included the child.cpp code; from the error messages below, the problem appears to occur in MPI_Init in parent.cpp, before MPI_Comm_spawn is ever called.&lt;/P&gt;
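&lt;P&gt;For context, a minimal child along these lines would match the output shown later in the thread (a hypothetical sketch only; the actual child.cpp is not reproduced here):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;#include &amp;lt;mpi.h&amp;gt;
#include &amp;lt;unistd.h&amp;gt;   // gethostname()
#include &amp;lt;iostream&amp;gt;

// Hypothetical minimal child: connects back to the spawning parent and reports its host.
int main(int argc, char *argv[])
{
    char hostname[128];
    MPI_Comm parent;

    MPI_Init(&amp;amp;argc, &amp;amp;argv);
    MPI_Comm_get_parent(&amp;amp;parent);   // intercommunicator to the parent, or MPI_COMM_NULL if not spawned

    gethostname(hostname, 127);
    std::cout &amp;lt;&amp;lt; "Hello from child process on " &amp;lt;&amp;lt; hostname &amp;lt;&amp;lt; std::endl;

    MPI_Finalize();
}
&lt;/LI-CODE&gt;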
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is the error output:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;[1646861733.846898] [n005:600573:0] ib_verbs.h:84 UCX ERROR ibv_exp_query_device(mlx5_0) returned 95: Operation not supported
[1646861733.847280] [n002:3274504:0] ib_verbs.h:84 UCX ERROR ibv_exp_query_device(mlx5_0) returned 95: Operation not supported
[1646861733.847848] [n004:674599:0] ib_verbs.h:84 UCX ERROR ibv_exp_query_device(mlx5_0) returned 95: Operation not supported
[1646861733.848318] [n001:3276867:0] ib_verbs.h:84 UCX ERROR ibv_exp_query_device(mlx5_0) returned 95: Operation not supported
[1646861733.860777] [n003:585069:0] ib_verbs.h:84 UCX ERROR ibv_exp_query_device(mlx5_0) returned 95: Operation not supported
[1646861733.866219] [n001:3276867:0] select.c:434 UCX ERROR no active messages transport to &amp;lt;no debug data&amp;gt;: posix/memory - Destination is unreachable, sysv/memory - Destination is unreachable, self/memory - Destination is unreachable, sockcm/sockaddr - no am bcopy, rdmacm/sockaddr - no am bcopy, cma/memory - no am bcopy
Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136)........:
MPID_Init(1138)..............:
MPIDI_OFI_mpi_init_hook(1541): OFI get address vector map failed
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do you have any clue what is causing this error?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 09 Mar 2022 22:03:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Init-error-under-Slurm/m-p/1367321#M9275</guid>
      <dc:creator>kmccall882</dc:creator>
      <dc:date>2022-03-09T22:03:09Z</dc:date>
    </item>
    <item>
      <title>Re: MPI_Init error under Slurm</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Init-error-under-Slurm/m-p/1370528#M9338</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for reaching out to us.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We tried your sample reproducer code and were able to get the expected results.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We followed the steps below using the latest Intel MPI 2021.5 on a Linux machine:&lt;/P&gt;
&lt;P&gt;1. Below is my run.bash script (the parent.cpp &amp;amp; child.cpp files are in the TEST.zip attachment below):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;#!/bin/sh
source /opt/intel/oneAPI/latest/setvars.sh
#clck
mpiicpc parent.cpp -o parent
mpiicpc child.cpp -o child
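# I_MPI_SPAWN=on enables MPI_Comm_spawn support, I_MPI_PIN_RESPECT_CPUSET=0 tells
# Intel MPI to ignore the Slurm cpuset when pinning, and FI_PROVIDER=mlx selects
# the Mellanox OFI provider (descriptive comments added for readability)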
I_MPI_SPAWN=on I_MPI_PIN_RESPECT_CPUSET=0  FI_PROVIDER=mlx mpirun -bootstrap ssh  ./parent
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2. Command to launch the Slurm job:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;sbatch -p workq -C &amp;lt;node-name&amp;gt; -t 190 --nodes=5 --ntasks=5 --cpus-per-task=2 -D /home/syedurux/test --verbose run.bash
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The output of the above command is as follows:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;sbatch: environment addon enabled
sbatch: defined options
sbatch: -------------------- --------------------
sbatch: chdir               : /home/syedurux/test
sbatch: constraint          : icx8360YatsB0
sbatch: cpus-per-task       : 2
sbatch: nodes               : 5
sbatch: ntasks              : 5
sbatch: partition           : workq
sbatch: time                : 03:10:00
sbatch: verbose             : 1
sbatch: -------------------- --------------------
sbatch: end of defined options
sbatch: select/cons_res: common_init: select/cons_res loaded
sbatch: select/cons_tres: common_init: select/cons_tres loaded
sbatch: select/cray_aries: init: Cray/Aries node selection plugin loaded
Submitted batch job 354072
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;3. The above command generates the file slurm-354072.out, which contains the actual output:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;
:: initializing oneAPI environment ...
   slurm_script: BASH_VERSION = 4.4.20(1)-release
   args: Using "$@" for setvars.sh arguments:
:: advisor -- latest
:: ccl -- latest
:: clck -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: inspector -- latest
:: intelpython -- latest
:: ipp -- latest
:: ippcp -- latest
:: itac -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vpl -- latest
:: vtune -- latest
:: oneAPI environment initialized ::

Hello from parent process on eii314
Hello from parent process on eii332
Hello from parent process on eii331
Hello from parent process on eii333
Hello from parent process on eii334
Hello from child process on eii314
Hello from child process on eii331
Hello from child process on eii333
Hello from child process on eii332
Hello from child process on eii334
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please provide the following details to help us investigate your issue further?&lt;/P&gt;
&lt;P&gt;1. Please run the cluster checker command below and share the complete log file.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;clck -f &amp;lt;nodefile&amp;gt; -F mpi_prereq_user&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;(or)&lt;/P&gt;
&lt;P&gt;To run Intel® Cluster Checker from a Slurm script, include the Intel oneAPI environment setup and the clck command in your Slurm script:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;source /opt/intel/oneapi/setvars.sh
clck
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For more information, please refer to the link:&amp;nbsp;&lt;A href="https://www.intel.com/content/www/us/en/develop/documentation/cluster-checker-user-guide/top/getting-started.html" target="_blank" rel="noopener"&gt;https://www.intel.com/content/www/us/en/develop/documentation/cluster-checker-user-guide/top/getting-started.html&lt;/A&gt;&lt;/P&gt;
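&lt;P&gt;For reference, the &amp;lt;nodefile&amp;gt; argument is a plain-text file listing one hostname per line, e.g. (illustrative, using the node names from the error log above):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;n001
n002
n003
n004
n005
&lt;/LI-CODE&gt;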
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2. Could you please provide us with CPU details?&lt;/P&gt;
&lt;P&gt;3. Also, could you please confirm which FI provider is in use when you encounter this issue?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks &amp;amp; Regards.&lt;/P&gt;
&lt;P&gt;Santosh&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 21 Mar 2022 10:58:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Init-error-under-Slurm/m-p/1370528#M9338</guid>
      <dc:creator>SantoshY_Intel</dc:creator>
      <dc:date>2022-03-21T10:58:45Z</dc:date>
    </item>
    <item>
      <title>Re: MPI_Init error under Slurm</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Init-error-under-Slurm/m-p/1371598#M9353</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi Kurt,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Could you please provide the information that Santosh asked for in the earlier post (including the results of running Intel® Cluster Checker, the CPU details, and the FI provider used)? In addition, please run the following items and share the detailed results with us, including the complete log files:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;share the output of ucx_info -d and fi_info -v&lt;/LI&gt;
&lt;LI&gt;run the code with the debug options I_MPI_DEBUG=10 and FI_LOG_LEVEL=debug enabled&lt;/LI&gt;
&lt;LI&gt;run the code without the Slurm scheduler, with the debug options enabled&lt;/LI&gt;
&lt;LI&gt;run the code with tcp as your OFI* provider (FI_PROVIDER=tcp), with the debug options enabled (see the example commands after this list)&lt;/LI&gt;
&lt;/OL&gt;
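&lt;P&gt;For example, items 2 and 4 might be run as follows (illustrative commands; adjust the binary name and launcher to your setup):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;# item 2: verbose Intel MPI and libfabric logging
I_MPI_DEBUG=10 FI_LOG_LEVEL=debug mpirun ./parent

# item 4: force the tcp OFI provider with the same debug options
I_MPI_DEBUG=10 FI_LOG_LEVEL=debug FI_PROVIDER=tcp mpirun ./parent
&lt;/LI-CODE&gt;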
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Best,&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Xiao&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 24 Mar 2022 06:37:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Init-error-under-Slurm/m-p/1371598#M9353</guid>
      <dc:creator>Xiao_Z_Intel</dc:creator>
      <dc:date>2022-03-24T06:37:07Z</dc:date>
    </item>
    <item>
      <title>Re: MPI_Init error under Slurm</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-Init-error-under-Slurm/m-p/1381271#M9459</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 16px; font-family: intel-clear;"&gt;Hi Kurt,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px; font-family: intel-clear;"&gt;We have not heard back from you with the additional information, so we will close this thread.&amp;nbsp;If you require additional assistance from Intel, please start a new thread.&amp;nbsp;Any further interaction in this thread will be considered community only.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px; font-family: intel-clear;"&gt;Best,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 16px; font-family: intel-clear;"&gt;Xiao&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 02 May 2022 20:25:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-Init-error-under-Slurm/m-p/1381271#M9459</guid>
      <dc:creator>Xiao_Z_Intel</dc:creator>
      <dc:date>2022-05-02T20:25:52Z</dc:date>
    </item>
  </channel>
</rss>