Intel® MPI Library

mpirun corrupts SLURM_NNODES environment variable when run on more than 16 nodes

nickw1
Beginner

When a job is submitted to more than 16 nodes of a Slurm cluster, the value of the SLURM_NNODES environment variable seen by the MPI processes becomes corrupted:

#!/bin/sh
#SBATCH --nodes=18 --ntasks-per-node=1
mpirun -prepend-rank /usr/bin/env | grep SLURM_NNODES

gives:

[17] SLURM_NNODES: 16
[8] SLURM_NNODES: 16
[9] SLURM_NNODES: 16
[6] SLURM_NNODES: 16
[13] SLURM_NNODES: 16
[7] SLURM_NNODES: 16
[15] SLURM_NNODES: 16
[12] SLURM_NNODES: 16
[16] SLURM_NNODES: 16
[0] SLURM_NNODES: 16
[1] SLURM_NNODES: 1
[4] SLURM_NNODES: 16
[14] SLURM_NNODES: 16
[10] SLURM_NNODES: 16
[11] SLURM_NNODES: 16
[3] SLURM_NNODES: 1
[5] SLURM_NNODES: 16
[2] SLURM_NNODES: 16

The SLURM_JOB_NUM_NODES environment variable still gives the correct value, and setting:

export I_MPI_HYDRA_BRANCH_COUNT=0

works around the issue.
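
For anyone hitting the same thing, here is a minimal sketch of how a job script might apply both observations: read the node count from SLURM_JOB_NUM_NODES instead of SLURM_NNODES, and export the workaround variable before launching. The helper name node_count and the fallback order are assumptions for illustration, not something Intel MPI or Slurm provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm or Intel MPI): prefer
# SLURM_JOB_NUM_NODES, which is reported as correct in this post,
# then fall back to SLURM_NNODES, then to 1 when run outside Slurm.
node_count() {
    echo "${SLURM_JOB_NUM_NODES:-${SLURM_NNODES:-1}}"
}

# Workaround described in this post; export it before mpirun is called.
export I_MPI_HYDRA_BRANCH_COUNT=0

echo "nodes: $(node_count)"
```

In a real batch script the mpirun line would follow the export, and any code that needs the node count would call the helper rather than reading SLURM_NNODES directly.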

TobiasK
Moderator

@nickw1
Can you please give more information on your environment? Please also add the output produced with I_MPI_DEBUG=10 set.
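
One possible way to gather that information is sketched below. The helper names report and collect_env_info are hypothetical, and the choice of what to collect (kernel, mpirun and srun versions, relevant environment variables) is an assumption about what is useful here, not an official checklist.

```shell
#!/bin/sh
# Hypothetical helper: print the first line of a tool's version output,
# or note that the tool is not on PATH.
report() {
    name=$1; shift
    if command -v "$name" >/dev/null 2>&1; then
        echo "$name: $("$name" "$@" 2>&1 | head -n 1)"
    else
        echo "$name: not found"
    fi
}

# Hypothetical helper: summarise the pieces of the environment that
# seem relevant to this report.
collect_env_info() {
    echo "kernel: $(uname -sr)"
    report mpirun --version
    report srun --version
    # Slurm node-count variables and any Intel MPI settings in effect
    env | grep -E '^(SLURM_NNODES|SLURM_JOB_NUM_NODES|I_MPI_)' | sort
}

collect_env_info
```

For the debug log itself, adding export I_MPI_DEBUG=10 before the mpirun line of the batch script and attaching the job's output file should capture what is being asked for.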
