Intel® MPI Library

MPI_Init hangs for "heterogeneous" srun

amedvedev
Beginner

Hello,

I'm unable to run a simple but "heterogeneous" MPI configuration on the BSC MareNostrum5 machine (https://www.bsc.es/marenostrum/marenostrum-5). I'd like to understand how to diagnose and debug this so that the configuration works.

I'm running a simple hello-world MPI Fortran program on 2 nodes, and I'd like a different number of ranks on each node. I'm using the SLURM interface to do this as follows:

```
srun --exclusive --cpus-per-task=4 --ntasks-per-node=28 --hint=nomultithread -N 1 -n 28 ./mpi_c : --exclusive --cpus-per-task=8 --ntasks-per-node=14 --hint=nomultithread -N 1 -n 14  ./mpi_c
```

So I'm setting up a configuration with 28 ranks of 4 threads each on the first node, and 14 ranks of 8 threads each on the second node.
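As a quick sanity check on the two halves of that command, the CPU request per node is the product of `--ntasks-per-node` and `--cpus-per-task`. The sketch below only does that arithmetic; the helper name `cpus_per_node` is hypothetical, and whether the result matches the physical core count of a node is an assumption to verify against the machine's documentation.

```shell
# Hypothetical helper: CPUs requested on one node = ranks per node * cpus per task.
cpus_per_node() {
    echo $(( $1 * $2 ))
}

cpus_per_node 28 4   # first node:  28 ranks x 4 CPUs each
cpus_per_node 14 8   # second node: 14 ranks x 8 CPUs each
```

Both lines print the same number, so the two components ask for the same amount of CPU per node and differ only in how it is divided between ranks and threads.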

This runs as expected when I invoke srun directly, but when I first get an allocation with sbatch and then call srun inside it, the MPI program hangs in MPI_Init:

```
run.sh:
sbatch --exclusive --account=XXX --qos=XXX --time=1 -N 2 ./srun.sh

srun.sh:
srun --exclusive --cpus-per-task=4 --ntasks-per-node=28 --hint=nomultithread -N 1 -n 28 ./mpi_c : --exclusive --cpus-per-task=8 --ntasks-per-node=14 --hint=nomultithread -N 1 -n 14  ./mpi_c
```

```
(gdb) bt
#0 0x00007effcdc32872 in read () from /lib64/libc.so.6
#1 0x00007effcdaa3fdb in read (__nbytes=6, __buf=0x7ffe9e7f0b39, __fd=53)
at /usr/include/bits/unistd.h:38
#2 PMIi_ReadCommand (cmd=0x7ffe9e7f23c0, fd=<optimized out>) at pmi2_api.c:1401
#3 PMIi_ReadCommandExp (fd=<optimized out>, cmd=cmd@entry=0x7ffe9e7f23c0,
exp=exp@entry=0x7effcdaa8610 <KVSFENCERESP_CMD> "kvs-fence-response",
rc=rc@entry=0x7ffe9e7f23bc, errmsg=errmsg@entry=0x7ffe9e7f23b0) at pmi2_api.c:1539
#4 0x00007effcdaa5d87 in PMI2_KVS_Fence () at pmi2_api.c:748
#5 0x00007effce87c6b0 in PMI2_KVS_Fence () at ../../src/pmi/intel/pmi2_virtualization.c:185
#6 0x00007effce7989a6 in MPIR_pmi_barrier () at ../../src/util/mpir_pmi.c:507
#7 0x00007effce79bc08 in optional_bcast_barrier (domain=MPIR_PMI_DOMAIN_LOCAL)
at ../../src/util/mpir_pmi.c:1286
#8 0x00007effce79c0bd in MPIR_pmi_bcast (buf=0x7f20001f5c00, bufsize=1024,
domain=MPIR_PMI_DOMAIN_LOCAL) at ../../src/util/mpir_pmi.c:1352
#9 0x00007effce694d40 in MPIDU_Init_shm_init () at ../../src/mpid/common/shm/mpidu_init_shm.c:174
#10 0x00007effce235a46 in MPID_Init (requested=0, provided=0x7effd06a3700 <MPIR_ThreadInfo>)
at ../../src/mpid/ch4/src/ch4_init.c:1525
#11 0x00007effce5d4881 in MPIR_Init_thread (argc=0x0, argv=0x0, user_required=0,
provided=0x7ffe9e7f300c) at ../../src/mpi/init/initthread.c:175
#12 0x00007effce5d4151 in PMPI_Init (argc=0x0, argv=0x0) at ../../src/mpi/init/init.c:139
#13 0x00007effd0d8448b in pmpi_init_ (ierr=0x35) at ../../src/binding/fortran/mpif_h/initf.c:274
#14 0x0000000000404299 in MAIN__ ()
#15 0x000000000040420d in main ()
(gdb)
```

I suspect I'm doing something wrong, but I can't work out how to troubleshoot this situation. Could you help me with some advice?
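One low-cost first step, offered as a sketch rather than a diagnosis: the srun inside the batch script inherits the `SLURM_*` variables that sbatch exports for the job, while an srun typed at a login shell normally starts without them. Capturing those variables in both situations and diffing them shows what the two launches see differently. The helper name `dump_slurm_env` and the output file names are hypothetical, and nothing here is confirmed to be the cause of the hang.

```shell
# Hypothetical helper: print every SLURM_* variable visible to the caller, sorted
# so that two captures can be compared with diff.
dump_slurm_env() {
    # grep exits non-zero when nothing matches; do not treat that as an error.
    env | grep '^SLURM_' | sort || true
}

# Possible usage (file names are placeholders):
#   in the login shell, before the direct srun:   dump_slurm_env > env.direct.txt
#   in srun.sh, before the srun line:             dump_slurm_env > env.batch.txt
#   afterwards:                                   diff env.direct.txt env.batch.txt
```

Any variable that appears only in the batch capture is a candidate to unset or override before the srun line as an experiment.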

An additional question: what is the recommended way to express this node-heterogeneous configuration using the mpirun/mpiexec interfaces, if we decide to avoid srun for this job?

Hardware: MareNostrum 5 General Purpose Partition (https://www.bsc.es/marenostrum/marenostrum-5)
Software: module load intel/2023.2.0 impi/2021.10.0

--Alexey

TobiasK
Moderator

@amedvedev

You are using an MPMD launch, not a heterogeneous launch. (Heterogeneous would refer to different HW configurations.)

https://www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-linux/2021-13/mpmd-launch-mode.html

Since your question is tightly connected to the SLURM configuration, I cannot help much here. Please connect with the BSC staff first.
BSC should have a priority support contract which they can use to escalate this topic in the appropriate channels.

You may try using mpirun with different -ppn settings, as described in the documentation linked above, and setting:

export I_MPI_PIN_RESPECT_CPUSET=0 
amedvedev
Beginner

@TobiasK 

You may have noticed that I've put the word "heterogeneous" in quotation marks -- I'm aware that this term has a specific meaning in MPI. Here I just wanted to denote that the execution topology I want has a certain level of heterogeneity.

>> Please connect with the BSC staff first.

Thanks, we obviously did this before. We'll try to ask them to use special support channels to resolve this.

I'll also try out the MPMD launch option, thanks.

--Alexey

TobiasK
Moderator

It is certainly not obvious that you had reached out to the support staff at BSC. This is not a support forum. Your problem is out of scope for this forum, sorry.

amedvedev
Beginner

Wow. I just asked for some advice on possible ways to debug and troubleshoot, nothing more.

amedvedev
Beginner

For the record, we managed to switch our scripts from the `srun` notation to the `mpirun -configfile` notation. The most difficult part was finding an `I_MPI_PIN_...` combination that works the same way as our `srun` line.
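For readers who land here with the same layout, the following is a minimal sketch of what such a config file could look like, not the file actually used in this thread. It assumes one argument set per line with the standard `-n`, `-host` and `-env` options, and it assumes the threads are OpenMP threads controlled by `OMP_NUM_THREADS`, which the thread does not state. The helper name `write_mpmd_config` and the file name `mpmd.conf` are hypothetical, and the `I_MPI_PIN_...` settings mentioned above are deliberately left out because they are not given here.

```shell
# Hypothetical helper: write a two-line mpirun config file, one argument set per node.
# Arguments: <first host> <second host> <output file>
write_mpmd_config() {
    host1=$1
    host2=$2
    out=$3
    {
        # 28 ranks on the first node; OMP_NUM_THREADS is an assumption about the app.
        echo "-n 28 -host $host1 -env OMP_NUM_THREADS 4 ./mpi_c"
        # 14 ranks on the second node, each with twice as many threads.
        echo "-n 14 -host $host2 -env OMP_NUM_THREADS 8 ./mpi_c"
    } > "$out"
}

# Possible usage inside the batch script (not executed here):
#   set -- $(scontrol show hostnames "$SLURM_JOB_NODELIST")
#   write_mpmd_config "$1" "$2" mpmd.conf
#   mpirun -configfile mpmd.conf
```

Generating the file from the allocation's host list keeps node names out of the script, so the same script works for whichever two nodes the scheduler assigns.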

We may mark this resolved. 
