Hi Intel community,
I am using Intel MPI 2019.8 with Slurm. When I run with a machinefile, process placement does not follow the assigned nodes exactly. For example, all the processes that the machinefile assigns to node1 end up on node2, and all the processes assigned to node2 end up on another node. How do we make mpirun follow the machinefile exactly? I am attaching the sample program we use to test the machinefile, along with the Slurm script.
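To make "follow the machinefile exactly" concrete, the expected placement can be written out as one "rank host" line per process. The following is only a sketch: the function name `expand_machinefile` is hypothetical, and it assumes each machinefile line is either `host` or `host:count`, that a bare host stands for a single process, and that ranks are assigned in file order.

```shell
#!/bin/bash
# Hypothetical helper: print the placement a machinefile asks for,
# as "rank host" lines. Reads the named file(s), or stdin if none is given.
# Assumptions: lines look like "host" or "host:count", a bare host means
# one process, and ranks are handed out in file order.
expand_machinefile() {
    awk -F: '
        BEGIN { rank = 0 }
        /^[ \t]*(#|$)/ { next }              # skip comments and blank lines
        {
            host = $1
            gsub(/[ \t\r]/, "", host)        # drop stray whitespace
            count = (NF > 1 && $2 + 0 > 0) ? $2 + 0 : 1
            for (i = 0; i < count; i++) {
                print rank, host
                rank++
            }
        }
    ' "$@"
}
```

For a machinefile containing `node1:2` and `node2:2`, this prints ranks 0 and 1 on node1 and ranks 2 and 3 on node2, which is the placement the job would be expected to show.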
Thanks,
Erica
Hi Erica,
Thanks for reporting this to us.
We have observed similar behaviour under SLURM. Process placement is accurate under other job schedulers (we checked with PBS).
We are therefore transferring this to our internal team for better support.
Regards
Prasanth
When I tested with 2019 Update 8 on an internal cluster, I saw the expected behavior. Can you please send the full output with I_MPI_DEBUG=16?
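The debug output requested above is long, and for a placement question the interesting part is the rank-to-node table printed at startup. The sketch below reduces a saved log to "rank host" lines. The function name `parse_rank_nodes` is hypothetical, and the layout it expects (lines of the form `[0] MPI startup(): <rank> <pid> <node name> <pin cpu>`) is an assumption that should be checked against the actual log.

```shell
#!/bin/bash
# Hypothetical helper: extract "rank host" pairs from Intel MPI debug output.
# Assumption: the startup table rows look like
#   [0] MPI startup(): <rank> <pid> <node name> <pin cpu>
# Reads the named log file(s), or stdin if none is given.
parse_rank_nodes() {
    awk '
        $2 == "MPI" && $3 == "startup():" &&
        $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
            print $4, $6                     # rank and node name
        }
    ' "$@"
}
```

A possible use is to save the job output to a file (for example with `mpirun ... 2>&1 | tee mpi_debug.log`, where the file name is arbitrary) and then run `parse_rank_nodes mpi_debug.log`.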
Hi James,
Here is the output, along with the corresponding machinefile.
Thanks,
Erica
Hi James,
Do you know why your internal run and our run behave differently? Is there a setting we are missing?
Thanks,
Erica
Hi James,
Could you share your slurm job script with us so we can test it? Which version of slurm did you test it on?
Thanks,
Erica
I apologize for dropping this. Here is the script I used for testing. I randomized the order of the hosts to ensure that the machinefile is being used rather than the SLURM nodelist. I tested on a customized version of SLURM 20.11.7, and the output matches the order in the machinefile.
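#!/bin/bash
#SBATCH -N 8
# Write every allocated host twice, each pass in random order (16 lines in total).
scontrol show hostnames "$SLURM_JOB_NODELIST" | shuf > machinefile.txt
scontrol show hostnames "$SLURM_JOB_NODELIST" | shuf >> machinefile.txt
# Load the Intel oneAPI environment.
source /opt/intel/oneAPI/latest/setvars.sh
# Start 16 ranks placed according to the machinefile, using the ssh bootstrap.
mpirun -n 16 -machinefile machinefile.txt -genv I_MPI_DEBUG 3 -bootstrap ssh ./a.out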
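Checking by eye that the output matches the machinefile order gets tedious with many ranks, so the comparison can be scripted. This is a minimal sketch, not part of the script above: the function name `check_placement` is hypothetical, and it assumes both the expected placement and the observed placement are available as files of "rank host" lines (the observed file would have to be produced from the test program's output, whose format is not shown in this thread).

```shell
#!/bin/bash
# Hypothetical helper: compare expected and observed placement.
# Usage: check_placement EXPECTED_FILE OBSERVED_FILE
# Assumption: both files contain "rank host" lines, in any order.
# Prints one line per mismatching rank and returns non-zero on any mismatch.
check_placement() {
    awk '
        NR == FNR { want[$1] = $2; next }    # first file: expected placement
        {
            seen[$1] = 1
            expected = ($1 in want) ? want[$1] : "none"
            if (expected != $2) {
                printf "rank %s: expected %s, got %s\n", $1, expected, $2
                bad = 1
            }
        }
        END {
            # Report ranks that were expected but never observed.
            for (r in want) {
                if (!(r in seen)) {
                    printf "rank %s: expected %s, got none\n", r, want[r]
                    bad = 1
                }
            }
            exit bad + 0
        }
    ' "$1" "$2"
}
```

With the script above, where the machinefile has 16 lines and 16 ranks are started, the expected file could simply number the machinefile lines from 0, assuming one rank per line.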
I am closing the Intel support case related to this thread. Everything appears to be functioning as expected in multiple test scenarios. Any further replies on this thread will be considered community only. If you require additional support assistance on this issue, please start a new thread with current details and logs.
