Hi,
I have installed, under CentOS 6.5 (latest release), the latest (as of this writing) Intel MPI and compiler packages, l_mpi_p_4.1.3.049 and parallel_studio_xe_2013_sp1_update3. This is on a Dell T620 system with 24 cores (Ivy Bridge, 12 cores x 2 CPUs). I have four of these nodes, and the other nodes do not show this problem. I have reproduced the trouble below: whenever I attempt to start a process using the Hydra process manager, it hangs with the error "pmi_proxy: No such file or directory". On the other hand, if I use the MPD system, the program (in this case the ab-initio calculation software VASP) starts up and runs without trouble. I have reinstalled both the MPI and compiler packages and have no idea what is causing this problem. Another symptom is that simple diagnostics such as "mpirun -n 24 hostname" and "mpiexec -n 24 hostname" produce different results: while mpirun hangs with the same pmi_proxy error, mpiexec runs fine (reproduced below). On the other nodes, "mpirun -n 24 hostname" prints the hostnames as expected.
Any suggestions as to how to fix this would be greatly appreciated.
Paul Fons
Output relating to the failure of Hydra to run:
matstud@draco.a04.aist.go.jp:>source /opt/intel/bin/compilervars.sh intel64
matstud@draco.a04.aist.go.jp:>source /opt/intel/impi/4.1.3/intel64/bin/mpivars.sh
matstud@draco.a04.aist.go.jp:>mpdallexit
matstud@draco.a04.aist.go.jp:>mpiexec.hydra -n 24 -env I_MPI_FABRICS shm:ofa vasp
bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory
bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory
^CCtrl-C caught... cleaning up processes
[mpiexec@draco.a04.aist.go.jp] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed
[mpiexec@draco.a04.aist.go.jp] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream
[mpiexec@draco.a04.aist.go.jp] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@draco.a04.aist.go.jp] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@draco.a04.aist.go.jp] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion
matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>ls -l /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy
-rwxr-xr-x 1 root root 1001113 Mar 3 17:51 /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy
matstud@draco.a04.aist.go.jp:>mpdboot
matstud@draco.a04.aist.go.jp:>mpiexec -n 24 vasp
running on 24 total cores
distrk: each k-point on 24 cores, 1 groups
distr: one band on 1 cores, 24 groups
using from now: INCAR
vasp.5.3.5 31Mar14 (build Apr 04 2014 15:18:05) complex
POSCAR found : 2 types and 128 ions
scaLAPACK will be used
Output showing the different behavior of mpirun and mpiexec:
matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>mpirun -n 24 hostname
bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory
bash: /opt/intel/impi/4.1.3.049/intel64/bin/pmi_proxy: No such file or directory
^CCtrl-C caught... cleaning up processes
[mpiexec@draco.a04.aist.go.jp] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed
[mpiexec@draco.a04.aist.go.jp] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream
[mpiexec@draco.a04.aist.go.jp] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@draco.a04.aist.go.jp] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@draco.a04.aist.go.jp] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion
matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>mpiexec -n 24 hostname
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
draco.a04.aist.go.jp
matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>which mpirun
/opt/intel/impi/4.1.3.049/intel64/bin/mpirun
matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>which mpiexec
/opt/intel/impi/4.1.3.049/intel64/bin/mpiexec
matstud@draco.a04.aist.go.jp:/usr/local/share/Vasp/Fons/GeTe/Phonons/phonopy/bulk/GeTe_ferwe/nelect.576-SPOSCAR>
What is the output from
env | grep I_MPI
The result from "env | grep I_MPI" is the same as that from the other machines in the cluster that do not have this problem:
I_MPI_FABRICS=shm:ofa
I_MPI_ROOT=/opt/intel/impi/4.1.3.049
Try running with I_MPI_HYDRA_DEBUG=1 and attach the output.
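For example, a minimal way to capture that (reusing the 24-rank hostname test from above; the log file name is just a suggestion):
export I_MPI_HYDRA_DEBUG=1
mpirun -n 24 hostname 2>&1 | tee hydra_debug.log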
Hi, I want to run a MIC executable from the host. I followed the procedure in the doc from the Intel site but got the following errors.
I have tried your procedure to run the MIC executable:
[kiran@compute012 mpi_program]$ mpirun -f mpi_host -n 4 ./hello_mic
pmi_proxy: line 0: exec: pmi_proxy: not found
Ctrl-C caught... cleaning up processes
[mpiexec@compute012] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed
[mpiexec@compute012] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream
[mpiexec@compute012] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@compute012] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec@compute012] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion
[kiran@compute012 mpi_program]$ cat mpi_host
compute012-mic0
Do you have /opt/intel available via NFS on the coprocessor? If not, you will need to ensure that pmi_proxy (along with whichever MPI libraries you have linked) is available in the path on the coprocessor.
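As a rough sketch of that check (the paths assume the default Intel MPI 4.1 layout, where the MIC-side binaries live under the mic/ subdirectory of the install; adjust them to your setup):
# If /opt/intel is NFS-mounted on the card, the MIC-side proxy should already be visible:
ssh compute012-mic0 ls -l /opt/intel/impi/4.1.3.049/mic/bin/pmi_proxy
# If it is not mounted, copy the MIC-side proxy and MPI libraries onto the card, for example:
scp /opt/intel/impi/4.1.3.049/mic/bin/pmi_proxy compute012-mic0:/bin/
scp /opt/intel/impi/4.1.3.049/mic/lib/libmpi.so* compute012-mic0:/lib64/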
I found a fix!
This page didn't really help me, but I think I found a solution so I thought I'd post it.
I found that setting the I_MPI_MIC_PROXY_PATH environment variable to the directory in which the pmi_proxy command for the MIC resides (on the MIC itself) corrects this issue!
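For example (a sketch only; the MIC-side path shown assumes the default install location for this version and may differ on your system):
export I_MPI_MIC=enable                                        # if MIC support is not already enabled
export I_MPI_MIC_PROXY_PATH=/opt/intel/impi/4.1.3.049/mic/bin  # directory containing the MIC pmi_proxy
mpirun -f mpi_host -n 4 ./hello_mic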
HTH,
Jim
