Hello,
I'm unable to run a simple but "heterogeneous" MPI configuration on the BSC MareNostrum 5 machine (https://www.bsc.es/marenostrum/marenostrum-5). I'd like to understand how to diagnose and debug this so that I can make the configuration work.
I'm running a simple hello-world MPI Fortran program on 2 nodes, and I'd like each node to run a different number of ranks. I'm using the Slurm interface to do so as follows:
```
srun --exclusive --cpus-per-task=4 --ntasks-per-node=28 --hint=nomultithread -N 1 -n 28 ./mpi_c : --exclusive --cpus-per-task=8 --ntasks-per-node=14 --hint=nomultithread -N 1 -n 14 ./mpi_c
```
So I'm setting up a 28 ranks / 4 threads configuration on the first node and a 14 ranks / 8 threads configuration on the second node.
This runs as expected when I invoke srun directly, but when I first create an sbatch allocation and then call srun inside it, the MPI program hangs in MPI_Init:
```
# run.sh:
sbatch --exclusive --account=XXX --qos=XXX --time=1 -N 2 ./srun.sh

# srun.sh:
srun --exclusive --cpus-per-task=4 --ntasks-per-node=28 --hint=nomultithread -N 1 -n 28 ./mpi_c : --exclusive --cpus-per-task=8 --ntasks-per-node=14 --hint=nomultithread -N 1 -n 14 ./mpi_c
```
```
(gdb) bt
#0 0x00007effcdc32872 in read () from /lib64/libc.so.6
#1 0x00007effcdaa3fdb in read (__nbytes=6, __buf=0x7ffe9e7f0b39, __fd=53)
at /usr/include/bits/unistd.h:38
#2 PMIi_ReadCommand (cmd=0x7ffe9e7f23c0, fd=<optimized out>) at pmi2_api.c:1401
#3 PMIi_ReadCommandExp (fd=<optimized out>, cmd=cmd@entry=0x7ffe9e7f23c0,
exp=exp@entry=0x7effcdaa8610 <KVSFENCERESP_CMD> "kvs-fence-response",
rc=rc@entry=0x7ffe9e7f23bc, errmsg=errmsg@entry=0x7ffe9e7f23b0) at pmi2_api.c:1539
#4 0x00007effcdaa5d87 in PMI2_KVS_Fence () at pmi2_api.c:748
#5 0x00007effce87c6b0 in PMI2_KVS_Fence () at ../../src/pmi/intel/pmi2_virtualization.c:185
#6 0x00007effce7989a6 in MPIR_pmi_barrier () at ../../src/util/mpir_pmi.c:507
#7 0x00007effce79bc08 in optional_bcast_barrier (domain=MPIR_PMI_DOMAIN_LOCAL)
at ../../src/util/mpir_pmi.c:1286
#8 0x00007effce79c0bd in MPIR_pmi_bcast (buf=0x7f20001f5c00, bufsize=1024,
domain=MPIR_PMI_DOMAIN_LOCAL) at ../../src/util/mpir_pmi.c:1352
#9 0x00007effce694d40 in MPIDU_Init_shm_init () at ../../src/mpid/common/shm/mpidu_init_shm.c:174
#10 0x00007effce235a46 in MPID_Init (requested=0, provided=0x7effd06a3700 <MPIR_ThreadInfo>)
at ../../src/mpid/ch4/src/ch4_init.c:1525
#11 0x00007effce5d4881 in MPIR_Init_thread (argc=0x0, argv=0x0, user_required=0,
provided=0x7ffe9e7f300c) at ../../src/mpi/init/initthread.c:175
#12 0x00007effce5d4151 in PMPI_Init (argc=0x0, argv=0x0) at ../../src/mpi/init/init.c:139
#13 0x00007effd0d8448b in pmpi_init_ (ierr=0x35) at ../../src/binding/fortran/mpif_h/initf.c:274
#14 0x0000000000404299 in MAIN__ ()
#15 0x000000000040420d in main ()
(gdb)
```
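For completeness, I also wondered whether this should instead be expressed as a proper Slurm heterogeneous job, i.e. requesting two heterogeneous components at allocation time and addressing them with `--het-group` in the job step. The following is only an untested sketch of what I have in mind (the account/QoS values are placeholders as above), and I don't know whether Intel MPI's PMI2 bootstrap behaves any differently with this form:
```
#!/bin/bash
# Sketch only: two heterogeneous allocation components, one MPMD-style step across them.
#SBATCH --account=XXX
#SBATCH --qos=XXX
#SBATCH --time=1
#SBATCH --exclusive -N 1 --ntasks-per-node=28 --cpus-per-task=4
#SBATCH hetjob
#SBATCH --exclusive -N 1 --ntasks-per-node=14 --cpus-per-task=8

# Colon-separated argument sets, one per heterogeneous component.
srun --het-group=0 --hint=nomultithread ./mpi_c : --het-group=1 --hint=nomultithread ./mpi_c
```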
I suspect I'm doing something wrong, but I can't figure out how to troubleshoot this situation. Could you help me with some advice?
An additional question: what is the recommended way to express this node-heterogeneous configuration using the mpirun/mpiexec runtime interfaces, if we decide to avoid the srun interface for this job?
Hardware: MareNostrum 5 General Purpose Partition (https://www.bsc.es/marenostrum/marenostrum-5)
Software: module load intel/2023.2.0 impi/2021.10.0
--Alexey
You are using an MPMD launch, not a heterogeneous launch (heterogeneous would refer to different hardware configurations).
https://www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-linux/2021-13/mpmd-launch-mode.html
Since your question is tightly connected to the Slurm configuration, I cannot help much here. Please connect with the BSC staff first.
BSC should have a priority support contract which they can use to escalate this topic through the appropriate channels.
You may also try mpirun with different -ppn settings, as described in the link above, and set `export I_MPI_PIN_RESPECT_CPUSET=0`.
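For illustration only (I have not tested this on MareNostrum 5; the hostnames and thread counts below are placeholders), such an MPMD launch could look roughly like this:
```
# Rough sketch: different ranks-per-node per host via colon-separated argument sets.
# "node1"/"node2" are placeholder hostnames from your allocation.
export I_MPI_PIN_RESPECT_CPUSET=0
mpirun -host node1 -n 28 -ppn 28 -env OMP_NUM_THREADS 4 ./mpi_c : \
       -host node2 -n 14 -ppn 14 -env OMP_NUM_THREADS 8 ./mpi_c
```
Inside an sbatch allocation you would take the actual hostnames from the allocation, e.g. via `scontrol show hostnames $SLURM_JOB_NODELIST`.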
@TobiasK
You may have noticed that I put the word "heterogeneous" in quotation marks; I'm aware that this term has a specific meaning in MPI. Here I just wanted to indicate that the execution topology I want has a certain level of heterogeneity.
>> Please connect with the BSC staff first.
Thanks, we did of course contact them before. We'll ask them to use their dedicated support channels to resolve this.
I'll also try out the MPMD launch option, thanks.
--Alexey
It's certainly not obvious that you had already reached out to the support staff at BSC. This is not a support forum, and your problem is out of scope here, sorry.
Wow. I just asked for some advice on possible ways to debug and troubleshoot, nothing more.
For the record, we managed to switch our scripts from the `srun` notation to the `mpirun -configfile` notation. The most difficult part was finding an `I_MPI_PIN_...` combination that behaves the same way as our `srun` line.
We can mark this as resolved.
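In case it helps someone else, the final launch had roughly the following shape; the hostnames and per-section settings are illustrative placeholders rather than our exact production values:
```
# config.txt, passed as: mpirun -configfile config.txt
# One argument set per line; "node1"/"node2" are placeholder hostnames.
-host node1 -n 28 -env OMP_NUM_THREADS 4 ./mpi_c
-host node2 -n 14 -env OMP_NUM_THREADS 8 ./mpi_c
```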