Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI run crashes on more than one node

arash_m_

Hi everyone,

I'm using MPICH2 v1.5 to run my WRF model on Intel Xeon processors. I can run on one node with as many cores as I want, but if the number of processes exceeds the number of cores in a node (so the run spans more than one node), it crashes with the following error:

*********************************************************************************************************************************************

[proxy:0:0@hpc1934] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
[proxy:0:0@hpc1934] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@hpc1934] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
[proxy:0:7@hpc1945] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
[proxy:0:7@hpc1945] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:7@hpc1945] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
[proxy:0:5@hpc1940] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:883): assert (!closed) failed
[proxy:0:5@hpc1940] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:5@hpc1940] main (./pm/pmiserv/pmip.c:210): demux engine error waiting for event
[mpiexec@hpc1934] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@hpc1934] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@hpc1934] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:216): launcher returned error waiting for completion
[mpiexec@hpc1934] main (./ui/mpich/mpiexec.c:325): process manager error waiting for completion

**********************************************************************************************************************************************

This is how I run the model:

$ ulimit -s unlimited
$ source ~/setup-intel.sh
$ mpiexec -np nproc ./wrf.exe >& benchmark#n.log
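
For reference, the same launch can be sketched as a small script. This is a minimal sketch under stated assumptions, not a confirmed fix: the host file name `hosts.txt`, the default of 32 processes, and the log file name are hypothetical values that do not come from the post. The `-f` (host file) and `-np` options belong to the Hydra `mpiexec` shipped with MPICH2. Wrapping `wrf.exe` in `sh -c` is one way to raise the stack limit on the node where each rank actually starts, since `ulimit -s unlimited` typed in the launching shell only applies to that shell and its local children. Whether the stack limit is related to the crash above is not established.

```shell
#!/bin/sh
# Sketch only. NPROC, HOSTFILE and LOG are assumed values, not taken from the post.
NPROC="${NPROC:-32}"                # total number of MPI processes (assumed default)
HOSTFILE="${HOSTFILE:-hosts.txt}"   # hypothetical file with one hostname per line
LOG="benchmark_${NPROC}.log"        # assumed log name, one log per process count

# Build the launch command in the positional parameters so it can be
# printed for inspection before it is run.
# The inner shell raises the stack limit on the node where the rank starts,
# then replaces itself with wrf.exe. If the limit cannot be raised there,
# wrf.exe is not started and the ulimit error ends up in the log.
set -- mpiexec -f "$HOSTFILE" -np "$NPROC" \
  sh -c 'ulimit -s unlimited && exec ./wrf.exe'

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: only print the command (the quotes around the sh -c string
  # are not shown in this printout).
  printf '%s\n' "$*"
else
  # Load the compiler/MPI environment as in the original commands, then run.
  . ~/setup-intel.sh
  "$@" >"$LOG" 2>&1
fi
```

Running it as is only prints the command. Running it with `DRY_RUN=0 NPROC=16 HOSTFILE=myhosts` performs the launch and writes stdout and stderr to `benchmark_16.log`.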

I would appreciate any help in this regard.

Best,

Arash
