Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI_Init error under Slurm

kmccall882
Beginner

I'm using Intel MPI 2021.5.1 and am trying to start a job under Slurm 20.11.8.  I want to start 1 task on each of 5 nodes (parent.cpp), with 2 CPUs reserved per node, so that each task can spawn a single new task (child.cpp) using MPI_Comm_spawn (which may be irrelevant, because the error seems to be happening in MPI_Init in parent.cpp).

 

Here is the slurm sbatch command:

$ sbatch --nodes=5 --ntasks=5 --cpus-per-task=2 -D /home/kmccall/slurm_test  --verbose slurm_test-intel.bash

 

Here are the contents of the bash script slurm_test-intel.bash that the above command calls:

 

module load intel/intelmpi

export I_MPI_PIN_RESPECT_CPUSET=0; mpirun ./parent

 

Here is the C++ code for the parent:

#include <mpi.h>
#include <unistd.h>   // gethostname, sleep
#include <cstdio>     // sprintf
#include <iostream>

int main(int argc, char *argv[])
{
    int rank, world_size, error_codes[1];
    char hostname[128], short_host_name[16];
    MPI_Comm intercom;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    gethostname(hostname, 127);

    std::cout << "Hello from parent process on " << hostname << std::endl;

    char info_str[64];
    sprintf(info_str, "ppr:%d:node", 1);
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", hostname);
    MPI_Info_set(info, "map-by", info_str);

    MPI_Comm_spawn("child", argv, 1, info, 0, MPI_COMM_SELF, &intercom,
        error_codes);

    sleep(20);
    MPI_Finalize();
}

 

I haven't included the child.cpp code, for brevity; from the error messages below, it looks like the problem is happening in MPI_Init in parent.cpp, before MPI_Comm_spawn is ever called.
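That said, the gist of child.cpp is just: init, report which host it landed on, and finalize. A minimal sketch along those lines (not my exact code; the structure mirrors parent.cpp above) would be:

```cpp
// Hypothetical minimal child.cpp: the spawned task only reports where it runs.
#include <mpi.h>
#include <unistd.h>   // gethostname
#include <iostream>

int main(int argc, char *argv[])
{
    char hostname[128];
    MPI_Comm parent;

    MPI_Init(&argc, &argv);

    // The spawned process can retrieve the intercommunicator back to the
    // parent; MPI_COMM_NULL here would mean it was not started via spawn.
    MPI_Comm_get_parent(&parent);

    gethostname(hostname, sizeof(hostname) - 1);
    std::cout << "Hello from child process on " << hostname << std::endl;

    MPI_Finalize();
}
```

Note this requires an MPI installation to compile (mpiicpc/mpicxx) and must be launched by the parent's MPI_Comm_spawn, not directly.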

 

[1646861733.846898] [n005:600573:0] ib_verbs.h:84 UCX ERROR ibv_exp_query_device(mlx5_0) returned 95: Operation not supported
[1646861733.847280] [n002:3274504:0] ib_verbs.h:84 UCX ERROR ibv_exp_query_device(mlx5_0) returned 95: Operation not supported
[1646861733.847848] [n004:674599:0] ib_verbs.h:84 UCX ERROR ibv_exp_query_device(mlx5_0) returned 95: Operation not supported
[1646861733.848318] [n001:3276867:0] ib_verbs.h:84 UCX ERROR ibv_exp_query_device(mlx5_0) returned 95: Operation not supported
[1646861733.860777] [n003:585069:0] ib_verbs.h:84 UCX ERROR ibv_exp_query_device(mlx5_0) returned 95: Operation not supported
[1646861733.866219] [n001:3276867:0] select.c:434 UCX ERROR no active messages transport to <no debug data>: posix/memory - Destination is unreachable, sysv/memory - Destination is unreachable, self/memory - Destination is unreachable, sockcm/sockaddr - no am bcopy, rdmacm/sockaddr - no am bcopy, cma/memory - no am bcopy
Abort(1091215) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(136)........:
MPID_Init(1138)..............:
MPIDI_OFI_mpi_init_hook(1541): OFI get address vector map failed

 

Do you have any clue what is causing this error?

 

 

3 Replies
SantoshY_Intel
Moderator

Hi,

 

Thanks for reaching out to us.

 

We tried your sample reproducer code and were able to get the expected results.

 

We followed the steps below using the latest Intel MPI 2021.5 on a Linux machine:

1. Below is my run.bash script (please find the parent.cpp & child.cpp files in the TEST.zip attachment below):

 

#!/bin/sh
source /opt/intel/oneAPI/latest/setvars.sh
#clck
mpiicpc parent.cpp -o parent
mpiicpc child.cpp -o child
I_MPI_SPAWN=on I_MPI_PIN_RESPECT_CPUSET=0  FI_PROVIDER=mlx mpirun -bootstrap ssh  ./parent

 

2. Command to launch the Slurm  job:

 

sbatch -p workq -C <node-name> -t 190 --nodes=5 --ntasks=5 --cpus-per-task=2 -D /home/syedurux/test --verbose run.bash

 

The output of the above command is as follows:

 

sbatch: environment addon enabled
sbatch: defined options
sbatch: -------------------- --------------------
sbatch: chdir               : /home/syedurux/test
sbatch: constraint          : icx8360YatsB0
sbatch: cpus-per-task       : 2
sbatch: nodes               : 5
sbatch: ntasks              : 5
sbatch: partition           : workq
sbatch: time                : 03:10:00
sbatch: verbose             : 1
sbatch: -------------------- --------------------
sbatch: end of defined options
sbatch: select/cons_res: common_init: select/cons_res loaded
sbatch: select/cons_tres: common_init: select/cons_tres loaded
sbatch: select/cray_aries: init: Cray/Aries node selection plugin loaded
Submitted batch job 354072

 

3. The above command generates a slurm-354072.out file, which contains the actual output:

 


:: initializing oneAPI environment ...
   slurm_script: BASH_VERSION = 4.4.20(1)-release
   args: Using "$@" for setvars.sh arguments:
:: advisor -- latest
:: ccl -- latest
:: clck -- latest
:: compiler -- latest
:: dal -- latest
:: debugger -- latest
:: dev-utilities -- latest
:: dnnl -- latest
:: dpcpp-ct -- latest
:: dpl -- latest
:: inspector -- latest
:: intelpython -- latest
:: ipp -- latest
:: ippcp -- latest
:: itac -- latest
:: mkl -- latest
:: mpi -- latest
:: tbb -- latest
:: vpl -- latest
:: vtune -- latest
:: oneAPI environment initialized ::

Hello from parent process on eii314
Hello from parent process on eii332
Hello from parent process on eii331
Hello from parent process on eii333
Hello from parent process on eii334
Hello from child process on eii314
Hello from child process on eii331
Hello from child process on eii333
Hello from child process on eii332
Hello from child process on eii334

 

 

Could you please provide us with the below details, which will help us investigate your issue further?

1. Please run the below cluster checker command and share the complete log file. 

 

clck -f <nodefile> -F mpi_prereq_user

 

(or)

To run Intel® Cluster Checker from a Slurm script, just include the Intel oneAPI environment setup and the clck command in your Slurm script:

 

source /opt/intel/oneapi/setvars.sh
clck

 

For more information, please refer to the link: https://www.intel.com/content/www/us/en/develop/documentation/cluster-checker-user-guide/top/getting...

 

2. Could you please provide us with CPU details?

3. Also, could you please confirm which FI provider is in use when you encounter this issue?
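If you are unsure which provider is active, one way to check is along these lines (illustrative; the exact wording of the debug line can vary by Intel MPI version, and ./parent is the binary from your earlier post):

```shell
# List the libfabric providers available on a node.
fi_info

# With I_MPI_DEBUG set, Intel MPI prints the selected libfabric provider
# at startup (look for a line like "libfabric provider: mlx").
I_MPI_DEBUG=5 mpirun -n 1 ./parent 2>&1 | grep -i provider
```

These commands require libfabric and Intel MPI to be available in the environment (e.g. after sourcing setvars.sh).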

 

 

Thanks & Regards.

Santosh

 

Xiao_Z_Intel
Employee

Hi Kurt,

 

Could you please provide the information that Santosh asked for in the earlier post (including the Intel® Cluster Checker results, CPU details, and the FI provider used)? In addition, could you please run the following items and share the detailed results with us, including the complete log files?

 

  1. Share the output of ucx_info -d and fi_info -v.
  2. Run the code with the debug options I_MPI_DEBUG=10 and FI_LOG_LEVEL=debug enabled.
  3. Run the code without the Slurm scheduler, with the debug options enabled.
  4. Run the code with tcp as your OFI* provider (FI_PROVIDER=tcp), with the debug options enabled.
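Taken together, the debug runs above might look like the following (a sketch, assuming the ./parent binary and node names n001-n005 from the earlier posts; redirect targets are illustrative):

```shell
# 1. Capture transport and provider details.
ucx_info -d > ucx_info.log
fi_info -v  > fi_info.log

# 2. Re-run the job (e.g. inside the sbatch script) with MPI and
#    libfabric debugging enabled.
I_MPI_DEBUG=10 FI_LOG_LEVEL=debug mpirun ./parent 2> debug_slurm.log

# 3. The same run outside the scheduler, launched directly on the nodes.
I_MPI_DEBUG=10 FI_LOG_LEVEL=debug mpirun -n 5 \
    -hosts n001,n002,n003,n004,n005 ./parent 2> debug_nosched.log

# 4. Force the tcp OFI provider to rule out the mlx/verbs path.
FI_PROVIDER=tcp I_MPI_DEBUG=10 FI_LOG_LEVEL=debug \
    mpirun ./parent 2> debug_tcp.log
```

Each run needs the Intel MPI environment sourced first (setvars.sh), and steps 2-4 require the cluster nodes to be reachable.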

 

Best,

Xiao

 

Xiao_Z_Intel
Employee

Hi Kurt,


We have not heard back from you with the additional information, so we will close this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.


Best,

Xiao


