I have just installed intel-mpi 4.1.2.040 on a cluster...
If I use mpiexec.hydra to start jobs with one process per node, it still spawns processes on all available resources. For example,
mpiexec.hydra -ppn 1 hostname
on two nodes shows me 40 lines, as opposed to the two expected.
I have attached a file with the debug output from running
I_MPI_HYDRA_DEBUG=1 mpiexec.hydra -ppn 1 hostname 2>&1 | tee debug.txt
regards,
Alin
Hi Alin,
Using -ppn will not limit the total number of ranks on a host; it only sets the number of consecutive ranks placed on each host. If you have more ranks than that, the placement will cycle back to the first host and begin again. So if I have a hostfile with two hosts (node0 and node1), here's what I should see:
[plain]$mpirun -n 4 -ppn 2 ./hello
Hello world: rank 0 of 4 running on node0
Hello world: rank 1 of 4 running on node0
Hello world: rank 2 of 4 running on node1
Hello world: rank 3 of 4 running on node1
$mpirun -n 4 -ppn 1 ./hello
Hello world: rank 0 of 4 running on node0
Hello world: rank 1 of 4 running on node1
Hello world: rank 2 of 4 running on node0
Hello world: rank 3 of 4 running on node1[/plain]
In your command line, you didn't specify the number of ranks to run. If you don't specify it, Hydra determines it from your job (or, if that can't be found, from the number of cores available on the host). In this case, your job says to use 40 ranks, so 40 ranks were launched.
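Following James's explanation, the fix on Alin's side would be to pass -n explicitly so Hydra does not fall back to the slot count of the job. A minimal sketch (the node count of 2 is a placeholder for your own allocation; the script only prints the command, since mpiexec.hydra is cluster-specific):

```shell
#!/bin/sh
# Request exactly one rank per node by giving -n explicitly,
# instead of letting Hydra default to all 40 scheduler slots.
NODES=2
CMD="mpiexec.hydra -n $NODES -ppn 1 hostname"
# On the cluster you would run this command directly; echoed here for illustration.
echo "$CMD"
```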
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Hi James,
Since Alin is not available right now, I'll answer the question.
The working mpiexec.hydra comes from Intel MPI version 4.1.0.024.
More precisely, it says: Intel(R) MPI Library for Linux* OS, Version 4.1.0 Build 20120831
Cheers.
Gilles
Hi,
Any chance of an update on this issue? From the outside it looks like a trivial regression in mpiexec.hydra, yet it is quite annoying from a user's point of view... Am I missing some critical element here?
Although using an old version of it allows us to run, it might have unexpected side effects we don't see. Moreover, since we plan to make intensive use of symmetric MPI mode on Xeon Phi, a clean and up-to-date Intel MPI environment would be highly desirable.
Cheers.
Gilles
Hi Gilles,
I currently do not have any additional information about this issue. Several other customers are reporting it. I can suggest using a machinefile as a workaround, or specifying a different hostfile, rather than allowing Hydra to automatically get the hosts from your job scheduler.
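The machinefile workaround James suggests can be sketched as follows (the file name and host names node0/node1 are placeholders; the mpiexec.hydra invocation is shown as a comment since it only makes sense on the cluster):

```shell
#!/bin/sh
# List the hosts yourself instead of letting Hydra query the job scheduler.
printf 'node0\nnode1\n' > machines.txt
cat machines.txt
# On the cluster, you would then hand the file to Hydra:
#   mpiexec.hydra -machinefile machines.txt -n 2 -ppn 1 hostname
```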
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
I'm working on a cluster and learning how to launch different processes. Today I tried to use a script with the command to execute the program. Suddenly, when I run the command top, this appears:
28210 jazmin 25 0 13088 928 712 R 100.2 0.0 383:56.27 mpiexec.hydra
and I cannot kill this process. How can I do it? Thanks in advance.