I'm running the FMS application (http://www.gfdl.noaa.gov/fms), and some of the runs fail with the following error:
[proxy:0:1@n04] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:70): assert (!(pollfds.revents & ~POLLIN & ~POLLOUT & ~POLLHUP)) failed
[proxy:0:1@n04] main (./pm/pmiserv/pmip.c:387): demux engine error waiting for event
[mpiexec@n01] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:101): one of the processes terminated badly; aborting
[mpiexec@n01] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:18): bootstrap device returned error waiting for completion
[mpiexec@n01] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:521): bootstrap server returned error waiting for completion
[mpiexec@n01] main (./ui/mpich/mpiexec.c:548): process manager error waiting for completion
set date_name = `$time_stamp -eh
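The first log line shows the Hydra proxy on n04 asserting that poll() reported only POLLIN, POLLOUT or POLLHUP on its descriptors. The assertion fails as soon as revents carries any other bit. The most likely candidates are POLLERR (an error condition on the descriptor) and POLLNVAL (the descriptor is not open). The minimal Python sketch below assumes a Linux/POSIX host and uses the poll constants from the standard select module. It evaluates the same expression as the assert and names the offending bits. The log does not say which bit actually tripped on n04, so this only helps interpret the condition. It does not diagnose the run.

```python
import select

# Events the Hydra demux loop tolerates, per the assert in the log:
#   assert (!(pollfds.revents & ~POLLIN & ~POLLOUT & ~POLLHUP))
ALLOWED = select.POLLIN | select.POLLOUT | select.POLLHUP

# Standard poll(2) revents bits, used to name whatever tripped the assert.
FLAG_NAMES = {
    select.POLLIN: "POLLIN",
    select.POLLPRI: "POLLPRI",
    select.POLLOUT: "POLLOUT",
    select.POLLERR: "POLLERR",
    select.POLLHUP: "POLLHUP",
    select.POLLNVAL: "POLLNVAL",
}


def assert_passes(revents: int) -> bool:
    """Evaluate the same boolean expression as the Hydra assertion."""
    return not (revents & ~select.POLLIN & ~select.POLLOUT & ~select.POLLHUP)


def offending_flags(revents: int) -> list:
    """Return the names of revents bits outside POLLIN|POLLOUT|POLLHUP."""
    extra = revents & ~ALLOWED
    names = [name for bit, name in FLAG_NAMES.items() if extra & bit]
    # Report any remaining bits we have no name for as a hex value.
    leftover = extra
    for bit in FLAG_NAMES:
        leftover &= ~bit
    if leftover:
        names.append(hex(leftover))
    return names


if __name__ == "__main__":
    for revents in (select.POLLIN, select.POLLIN | select.POLLHUP,
                    select.POLLHUP | select.POLLERR, select.POLLNVAL):
        print(hex(revents), assert_passes(revents), offending_flags(revents))
```

A plain hangup (POLLHUP) passes the check. An error or invalid-descriptor bit does not, which is consistent with the proxy then reporting "demux engine error waiting for event".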
Please note that some of the runs succeed, so I'm aware that this might not be an MPI issue. Setting I_MPI_DEBUG to 3 does not provide any additional useful information. Any idea how I can find the reason for this failure? Any debugging tips, or environment variables that might help?
I also tried running with I_MPI_FABRICS set to "shm:tcp", with the same result.
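The failure is intermittent, so it may help to rerun the same job many times with the debug and fabric settings mentioned above. Keeping one log per attempt lets you compare failing runs side by side with successful ones. The minimal Python sketch below does this. Several details are hypothetical placeholders for whatever the FMS run script actually uses: the launch command, the process count (64) and the executable name fms.x. The I_MPI_DEBUG level of 5 is also an assumption; higher values print more detail in Intel MPI, but check the reference for your version.

```python
import os
import subprocess

# Hypothetical launch command; replace with the one from your FMS run script.
MPI_CMD = ["mpirun", "-n", "64", "./fms.x"]


def run_once(cmd, env_overrides, log_path):
    """Run cmd once with extra environment variables, saving stdout+stderr."""
    env = dict(os.environ, **env_overrides)
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, env=env, stdout=log,
                                stderr=subprocess.STDOUT)
    return result.returncode


def run_many(cmd, env_overrides, n_runs, log_prefix="run"):
    """Repeat an intermittently failing job and collect each exit code.

    Each attempt writes to its own <log_prefix>_NNN.log file, so the logs
    of failing runs can be diffed against those of successful runs.
    """
    codes = []
    for i in range(n_runs):
        log_path = f"{log_prefix}_{i:03d}.log"
        codes.append(run_once(cmd, env_overrides, log_path))
    return codes
```

For example, run_many(MPI_CMD, {"I_MPI_DEBUG": "5", "I_MPI_FABRICS": "shm:tcp"}, 10) runs the job ten times and returns the ten exit codes. The run_NNN.log files for the non-zero entries are the ones to inspect.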
Thanks in advance!
You also submitted this issue to Intel® Premier Support, and it is being handled there. I'm noting this for others who see this thread.
Technical Consulting Engineer
Intel® Cluster Tools