Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

mpiexec.hydra is defunct

joe-griffin
Beginner

If I run an Intel MPI job from a forked process (launched with "&"), a defunct mpiexec.hydra is left behind:

30545 pts/25   00:00:00 Job1
30546 pts/25   00:00:00 Job2
30646 pts/25   00:00:00 mpirun
30651 pts/25   00:00:00 mpiexec.hydra <defunct>

Details:
Intel MPI 5.1.2.150
JOB1 is run with "&"; it launches JOB2, which runs mpirun with "-configfile".
If I_MPI_PROCESS_MANAGER=mpd is used, mpiexec.hydra is not left behind.
If JOB1 is run without "&", mpiexec.hydra is not left behind.
If I run with "-v", I see at the end:

[proxy:0:0@sudev604] got pmi command (from 10): finalize
[proxy:0:0@sudev604] PMI response: cmd=finalize_ack
[proxy:0:0@sudev604] got pmi command (from 12): finalize
[proxy:0:0@sudev604] PMI response: cmd=finalize_ack


My results are fine; the issue is just that mpiexec.hydra is left behind.

I have not been able to find this issue reported anywhere else.
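For reference, the <defunct> entry means mpiexec.hydra has already exited but its parent has not yet reaped it with wait(); it is a zombie, not a running process. A generic way to list any zombies on the node (nothing Intel MPI specific assumed here):

# List zombie (defunct) processes: the STAT field begins with Z
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'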

joe-griffin
Beginner

Here is the output from "ps -fu":

# ps -fu

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
jjg      24394  0.0  0.0 124520  2880 pts/25   Ss   08:45   0:00 -csh
jjg      28780  0.0  0.0 113116  1392 pts/25   T    11:31   0:00  \_ /bin/sh JOB1
jjg      28781  0.1  0.0  13396  1876 pts/25   T    11:31   0:00  |   \_ /bin/ksh /home/jjg/JOB2
jjg      28879  0.0  0.0   9516  1432 pts/25   T    11:31   0:00  |       \_ /bin/sh Path_to_intel/intel/bin64/mpirun -v -pmi-connect nocache -print-al
jjg      28884  0.0  0.0      0     0 pts/25   Z    11:31   0:00  |           \_ [mpiexec.hydra] <defunct>
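Note the STAT column: the mpiexec.hydra child is Z (zombie), while its parent, the mpirun wrapper shell at PID 28879, is T (stopped). A stopped process never runs, so it cannot call wait() and reap its exited child. As a hedged sanity check only (assuming the stopped parent is what keeps the zombie around), resuming it should let it reap the defunct entry:

# Resume the stopped mpirun wrapper shell (PID taken from the ps output above)
kill -CONT 28879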
Michael_Intel
Moderator

Hello,

So far I cannot reproduce the behavior you describe, using the most recent Intel MPI 2017; my test session is below.
Please provide further details.

Best regards,
Michael

$ cat job1.sh
#!/bin/bash
./job2.sh
$ cat job2.sh
#!/bin/bash
mpirun -configfile ./configfile
$ cat configfile
-n 10 -host ewb277 ./test.x
-n 10 -host ewb278 ./test.x

$ ./job1.sh & sleep 2 && ps -ux
[1] 189266
Hello world: rank 0 of 20 running on ewb277
...
Hello world: rank 10 of 20 running on ewb278
...
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
...
msteyer  189266  0.0  0.0 113152  1288 pts/0    S    06:51   0:00 /bin/bash ./job1.sh
msteyer  189268  0.0  0.0 113152  1300 pts/0    S    06:51   0:00 /bin/bash ./job2.sh
msteyer  189269  0.0  0.0 113152  1372 pts/0    S    06:51   0:00 /bin/sh /opt/intel/impi/2017.0.098/compilers_and_libraries_2017.0.098/linux/mpi/intel64/bin/mpirun -configfi
msteyer  189274  0.5  0.0  17904  1624 pts/0    S    06:51   0:00 mpiexec.hydra -configfile ./configfile
msteyer  189275  1.5  0.0  17212  1924 pts/0    S    06:51   0:00 /opt/intel/impi/2017.0.098/compilers_and_libraries_2017.0.098/linux/mpi/intel64/bin/pmi_proxy --control-port
msteyer  189276  1.5  0.0  76864  3856 pts/0    S    06:51   0:00 /bin/ssh -x -q ewb278 /opt/intel/impi/2017.0.098/compilers_and_libraries_2017.0.098/linux/mpi/intel64/bin/pm
msteyer  189280 19.5  0.0 185348 34000 pts/0    Rl   06:51   0:00 ./test.x

Mota__Cciero
Beginner

Somewhat old thread, but this happened to me when I_MPI_HYDRA_CLEANUP was set. After unsetting it, the defunct child disappeared.
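For anyone hitting the same thing, the change is just making sure the variable is not set in the environment before launching — a minimal sketch in bash, with the configfile name taken from the example above:

# Disable Hydra's cleanup mode for this launch, then run as usual
unset I_MPI_HYDRA_CLEANUP
mpirun -configfile ./configfile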
