If I run an Intel MPI job from a forked ("&") process, it leaves a defunct mpiexec.hydra behind:
30545 pts/25 00:00:00 Job1
30546 pts/25 00:00:00 Job2
30646 pts/25 00:00:00 mpirun
30651 pts/25 00:00:00 mpiexec.hydra <defunct>
Details:
Intel MPI 5.1.2.150
JOB1 is run with "&"; it launches JOB2, which runs mpirun with "-configfile".
If I_MPI_PROCESS_MANAGER=mpd is used, no mpiexec.hydra is left behind.
If JOB1 is run without "&", no mpiexec.hydra is left behind.
If I set "-v" I see at the end:
[proxy:0:0@sudev604] got pmi command (from 10): finalize
[proxy:0:0@sudev604] PMI response: cmd=finalize_ack
[proxy:0:0@sudev604] got pmi command (from 12): finalize
[proxy:0:0@sudev604] PMI response: cmd=finalize_ack
My results are fine; the only issue is the leftover defunct mpiexec.hydra.
I have not been able to find this issue reported anywhere else.
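For background, and not specific to Intel MPI: a "<defunct>" entry is a zombie. The child has exited, but its parent (here, the mpirun wrapper script) has not yet reaped it with wait(). A minimal Linux sketch of the mechanism, using plain sleep in place of the MPI processes:

```shell
# A "<defunct>" process is a zombie: it has exited, but its parent has not
# yet reaped it with wait().  Simulate a parent that never waits by exec'ing
# sleep over the shell after forking a short-lived child:
sh -c 'sleep 0.2 & exec sleep 3' &
parent=$!
sleep 1
# The exited short sleep now shows state Z (defunct) under the parent:
ps --ppid "$parent" -o pid=,stat=,comm=
wait    # once the parent exits, init reaps the zombie
```

The same pattern would explain the listing above: mpiexec.hydra has exited cleanly (hence the finalize/finalize_ack messages), but its parent mpirun script has not collected it.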
Here is the output from "ps -fu"
# ps -fu
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
jjg 24394 0.0 0.0 124520 2880 pts/25 Ss 08:45 0:00 -csh
jjg 28780 0.0 0.0 113116 1392 pts/25 T 11:31 0:00 \_ /bin/sh JOB1
jjg 28781 0.1 0.0 13396 1876 pts/25 T 11:31 0:00 | \_ /bin/ksh /home/jjg/JOB2
jjg 28879 0.0 0.0 9516 1432 pts/25 T 11:31 0:00 | \_ /bin/sh Path_to_intel/intel/bin64/mpirun -v -pmi-connect nocache -print-al
jjg 28884 0.0 0.0 0 0 pts/25 Z 11:31 0:00 | \_ [mpiexec.hydra] <defunct>
Hello,
So far I cannot reproduce the behavior you describe with the most recent Intel MPI 2017.
Please provide further details.
Best regards,
Michael
$ cat job1.sh
#!/bin/bash
./job2.sh
$ cat job2.sh
#!/bin/bash
mpirun -configfile ./configfile
$ cat configfile
-n 10 -host ewb277 ./test.x
-n 10 -host ewb278 ./test.x
$ ./job1.sh & sleep 2 && ps -ux
[1] 189266
Hello world: rank 0 of 20 running on ewb277
...
Hello world: rank 10 of 20 running on ewb278
...
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
...
msteyer 189266 0.0 0.0 113152 1288 pts/0 S 06:51 0:00 /bin/bash ./job1.sh
msteyer 189268 0.0 0.0 113152 1300 pts/0 S 06:51 0:00 /bin/bash ./job2.sh
msteyer 189269 0.0 0.0 113152 1372 pts/0 S 06:51 0:00 /bin/sh /opt/intel/impi/2017.0.098/compilers_and_libraries_2017.0.098/linux/mpi/intel64/bin/mpirun -configfi
msteyer 189274 0.5 0.0 17904 1624 pts/0 S 06:51 0:00 mpiexec.hydra -configfile ./configfile
msteyer 189275 1.5 0.0 17212 1924 pts/0 S 06:51 0:00 /opt/intel/impi/2017.0.098/compilers_and_libraries_2017.0.098/linux/mpi/intel64/bin/pmi_proxy --control-port
msteyer 189276 1.5 0.0 76864 3856 pts/0 S 06:51 0:00 /bin/ssh -x -q ewb278 /opt/intel/impi/2017.0.098/compilers_and_libraries_2017.0.098/linux/mpi/intel64/bin/pm
msteyer 189280 19.5 0.0 185348 34000 pts/0 Rl 06:51 0:00 ./test.x
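Whether or not the defunct entry appears, a quick post-run check makes it easy to spot; a small sketch (the bracketed first letter keeps grep from matching its own command line):

```shell
# Wait for any backgrounded job to finish, then look for leftovers.
wait
ps -ef | grep '[m]piexec\.hydra' || echo "no mpiexec.hydra left behind"
```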
Somewhat old thread, but this has happened to me when I_MPI_HYDRA_CLEANUP was set. After unsetting it, the defunct child disappeared.
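If that variable turns out to be set in your environment, clearing it before the run is a one-liner (bourne-shell sketch; csh users would use unsetenv, and JOB1 stands in for the original poster's job script):

```shell
# Report whether the hydra cleanup variable is set, then clear it
# before relaunching the job.
echo "before: ${I_MPI_HYDRA_CLEANUP:-<not set>}"
unset I_MPI_HYDRA_CLEANUP
echo "after:  ${I_MPI_HYDRA_CLEANUP:-<not set>}"
# ./JOB1 &    # hypothetical: rerun the backgrounded job without the variable
```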