I have just installed intel-mpi 4.1.2.040 on a cluster...
If I use mpiexec.hydra to start jobs with one process per node... it still spawns processes on all available resources...
mpiexec.hydra -ppn 1 hostname
on two nodes shows me 40 lines, as opposed to the two expected.
I have attached a file with the debug output produced by running
I_MPI_HYDRA_DEBUG=1 mpiexec.hydra -ppn 1 hostname 2>&1 | tee debug.txt
regards,
Alin
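A quick way to see how the processes were actually distributed is to count the hostname lines per node. The sketch below assumes each rank prints exactly one line containing its hostname, as the hostname command does; the helper name count_per_host is made up for illustration.

```shell
#!/bin/sh
# count_per_host: read one hostname per line on stdin and print
# "<count> <host>" for every distinct host, sorted by hostname.
count_per_host() {
    sort | uniq -c | awk '{ print $1, $2 }'
}

# Hypothetical usage on the cluster (not run here):
#   mpiexec.hydra -ppn 1 hostname | count_per_host
```

With one process per node on two nodes, the expected result would be two lines, each with a count of 1.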
I forgot to say: any help in solving the issue, or in understanding it better, would be much appreciated.
regards,
Alin
Hi Alin,
Using -ppn will not limit the total number of ranks on a host, simply the number of consecutive ranks on each host. If you have too many ranks, the placement will cycle back to the first host and begin again. So if I have a hostfile with two hosts (node0 and node1), here's what I should see:
$ mpirun -n 4 -ppn 2 ./hello
Hello world: rank 0 of 4 running on node0
Hello world: rank 1 of 4 running on node0
Hello world: rank 2 of 4 running on node1
Hello world: rank 3 of 4 running on node1
$ mpirun -n 4 -ppn 1 ./hello
Hello world: rank 0 of 4 running on node0
Hello world: rank 1 of 4 running on node1
Hello world: rank 2 of 4 running on node0
Hello world: rank 3 of 4 running on node1
In your command line, you didn't specify the number of ranks to run. If you don't specify that number, it will be determined from your job (or if that can't be found, then the number of cores available on the host). In this case, your job says to use 40 ranks, so 40 ranks were launched.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
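The placement rule described above can be sketched in plain shell. This is only a model of the behaviour James describes (ranks are handed out in blocks of -ppn per host, wrapping around to the first host when the host list is exhausted), not Hydra's actual implementation. The function name place_ranks and its argument order are made up for illustration.

```shell
#!/bin/sh
# place_ranks N PPN HOST...
# Print the host each of N ranks would land on if ranks are assigned
# in consecutive blocks of PPN per host, cycling through the host list.
place_ranks() {
    n=$1
    ppn=$2
    shift 2
    nhosts=$#
    rank=0
    while [ "$rank" -lt "$n" ]; do
        # Block number is rank / ppn; wrap it around the host list.
        # Positional parameters are 1-based, hence the + 1.
        idx=$(( (rank / ppn) % nhosts + 1 ))
        eval "host=\${$idx}"
        echo "rank $rank -> $host"
        rank=$((rank + 1))
    done
}

place_ranks 4 1 node0 node1
```

Under this model, 4 ranks with a block size of 1 alternate between node0 and node1, as in the second listing above, and 40 ranks with a block size of 1 on two hosts end up as 20 ranks per host, which is consistent with the 40 lines Alin saw. Requesting exactly two ranks would, by the same rule, mean passing the rank count explicitly, for example mpiexec.hydra -n 2 -ppn 1 hostname.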
Hi Alin,
What is the full version number for the working one?
James.
Hi James,
Since Alin is not available right now, I'll answer the question.
The Intel MPI version the working mpiexec.hydra comes from is 4.1.0.024.
More precisely, it says: Intel(R) MPI Library for Linux* OS, Version 4.1.0 Build 20120831
Cheers.
Gilles
Hi,
Is there any chance of an update on this issue? From the outside it looks like a trivial regression in mpiexec.hydra, yet it is very annoying from a user's point of view... Am I missing some critical element here?
Although using an old version of mpiexec.hydra allows us to run, it might have unexpected side effects that we don't see. Moreover, since we plan to make intensive use of symmetric MPI mode on Xeon Phi, a clean and up-to-date Intel MPI environment would be highly desirable.
Cheers.
Gilles
Hi Gilles,
I currently do not have any additional information about this issue. Several other customers are reporting it. I can suggest using a machinefile as a workaround, or specifying a different hostfile, rather than allowing Hydra to automatically get the hosts from your job scheduler.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
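The machinefile workaround could look roughly like the sketch below. It assumes the scheduler provides a node list with one hostname per line, possibly repeated once per core, and that mpiexec.hydra accepts host:count entries through -machinefile. The PBS_NODEFILE variable in the commented usage is an assumption, since the thread does not name the scheduler; the helper name make_machinefile and the file name machines.txt are hypothetical.

```shell
#!/bin/sh
# make_machinefile NODELIST [SLOTS]
# Read a node list (one hostname per line, duplicates allowed) and print
# one "host:SLOTS" line per distinct host, keeping first-seen order.
# SLOTS defaults to 1, i.e. one rank per node.
make_machinefile() {
    nodelist=$1
    slots=${2:-1}
    awk -v s="$slots" 'NF && !seen[$0]++ { print $0 ":" s }' "$nodelist"
}

# Hypothetical usage inside a job script (assumes a PBS-style node file):
#   make_machinefile "$PBS_NODEFILE" > machines.txt
#   mpiexec.hydra -machinefile machines.txt -n 2 hostname
```

Building the file explicitly keeps Hydra from deriving the host list and rank count from the job scheduler, which is the step that appears to go wrong in the affected version.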
I'm working on a cluster and learning how to submit different processes. Today I tried to use a script containing the command to execute the program. Suddenly, when I run the command top, the following appears:
28210 jazmin 25 0 13088 928 712 R 100.2 0.0 383:56.27 mpiexec.hydra
and I cannot kill this process. How can I do it? Thanks in advance.
