Intel® MPI Library

MPI Crash (Hydra) in FMS

Gilad_Berman
Beginner

Hello,

I'm running the FMS application (http://www.gfdl.noaa.gov/fms), and some of the runs fail with the following error:

[proxy:0:1@n04] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:70): assert (!(pollfds.revents & ~POLLIN & ~POLLOUT & ~POLLHUP)) failed
[proxy:0:1@n04] main (./pm/pmiserv/pmip.c:387): demux engine error waiting for event
[mpiexec@n01] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:101): one of the processes terminated badly; aborting
[mpiexec@n01] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error waiting for completion
[mpiexec@n01] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:521): bootstrap server returned error waiting for completion
[mpiexec@n01] main (./ui/mpich/mpiexec.c:548): process manager error waiting for completion
set date_name = `$time_stamp -eh

Please note that some of the runs succeed, so I'm aware that this might not be an MPI issue. Setting I_MPI_DEBUG to 3 does not provide any additional useful information. Any idea how I can find the reason for this failure? Any debugging tips, or environment variables that might help?
I also tried running with I_MPI_FABRICS set to "shm:tcp", with the same result.

Thanks in advance!
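
For readers trying to interpret the first line of the log: the failed assertion checks that the poll() result for a descriptor contains no event bits other than POLLIN, POLLOUT, and POLLHUP. The sketch below only restates that bit test in Python so the condition is easier to read. It is not Hydra code, the helper names are made up for illustration, and the numeric constants are the usual Linux <poll.h> values, which is an assumption about the platform. It shows what the assertion tests, not why an unexpected bit was set in this run.

```python
# Event bits as defined in Linux <poll.h> (assumed; other platforms may differ).
POLLIN = 0x001
POLLPRI = 0x002
POLLOUT = 0x004
POLLERR = 0x008
POLLHUP = 0x010
POLLNVAL = 0x020

# The only bits the assertion in demux_poll.c tolerates, per the log message.
ALLOWED = POLLIN | POLLOUT | POLLHUP


def unexpected_events(revents: int) -> int:
    """Hypothetical helper: return the bits that would trip the assertion.

    Mirrors `revents & ~POLLIN & ~POLLOUT & ~POLLHUP`; 0 means the assert holds.
    """
    return revents & ~ALLOWED


def assertion_holds(revents: int) -> bool:
    """Hypothetical helper: True when no unexpected event bit is set."""
    return unexpected_events(revents) == 0


if __name__ == "__main__":
    # Readable data plus a hang-up is tolerated by the assertion.
    print(assertion_holds(POLLIN | POLLHUP))
    # An error condition on the descriptor is not.
    print(assertion_holds(POLLIN | POLLERR))
```

In other words, the proxy on n04 saw some other event (for example an error or invalid-descriptor flag) on one of the descriptors it was polling, and mpiexec on n01 then reported the proxy as having terminated badly.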

James_T_Intel
Moderator
Hi Gilad,

You also submitted this issue to Intel® Premier Support, and it is being handled there. I'm noting this for others who see this thread.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools