Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI run crashes on more than one node

arash_m_
Beginner

Hi everyone,

I'm using MPICH2 v1.5 to run the WRF model on Intel Xeon processors. I can run on a single node with as many cores as I want, but as soon as the process count exceeds the number of cores on one node (so the job spans more than one node), the run crashes with the following error:

*********************************************************************************************************************************************

[proxy:0:0@hpc1934] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
[proxy:0:0@hpc1934] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@hpc1934] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
[proxy:0:7@hpc1945] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
[proxy:0:7@hpc1945] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:7@hpc1945] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
[proxy:0:5@hpc1940] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
[proxy:0:5@hpc1940] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:5@hpc1940] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
[mpiexec@hpc1934] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@hpc1934] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@hpc1934] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion
[mpiexec@hpc1934] main (./ui/mpich/mpiexec.c:325): process manager error waiting for completion

**********************************************************************************************************************************************
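To narrow down whether the failure comes from WRF or from the launcher itself, a launcher-only test across the same nodes should reproduce it if the MPI setup is at fault. Something like the following should work; the node names are taken from the error output above, and the machinefile name is just a placeholder:

$ cat machinefile    # one hostname per line, using the nodes from the error output
hpc1934
hpc1940
hpc1945

$ mpiexec -f machinefile -np 3 hostname    # each rank should print the hostname of its node

If this also fails as soon as more than one node is involved, the problem is presumably in the launch/ssh setup rather than in WRF.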

and this is how I run the model:

$ ulimit -s unlimited

$ source ~/setup-intel.sh

$ mpiexec -np nproc ./wrf.exe >& benchmark#n.log
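For context, an explicit multi-node launch with Hydra's hostfile option and verbose launcher output would look roughly like this; the machinefile path, the process count, and the log name are placeholders:

$ ulimit -s unlimited                  # same stack-limit setting as the single-node runs
$ source ~/setup-intel.sh              # Intel compiler/MPI environment, as above
$ mpiexec -verbose -f ~/machinefile -np 32 ./wrf.exe >& benchmark32.log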

I would appreciate any help in this regard.

Best regards,

Arash
