Melissa
Beginner

MPI_Finalize Error Present with mpiicpc.

I have been having trouble with the Intel-compiled version of a scientific software stack.

The stack uses both OpenMP and MPI. When I started working on the code, it had been compiled with gcc and a gcc-built OpenMPI. Prior to adding any MPI code, the software compiles with icpc and runs without error.

The versions I am working with are: Intel compiler 14.0.2, Intel MKL 11.1.2, and Intel MPI 4.1.3. I have tried raising the debug level (I_MPI_DEBUG) to get more informative messages, but what I always end up with is:
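For reference, this is roughly how I am setting the debug level and launching (the binary name and rank count below are placeholders, not my actual job):

```shell
# Raise Intel MPI debug verbosity; level 5 prints fabric/provider selection details
export I_MPI_DEBUG=5

# Launch under Intel MPI's Hydra process manager (./app and -n 4 are placeholders)
mpirun -n 4 ./app
```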

Could not find **ibv_destroy_qp in list of messages 
Could not find **ibv_destroy_qp in list of messages 
Could not find **ibv_destroy_qp in list of messages 
Could not find **ibv_destroy_qp in list of messages 
Could not find **ibv_destroy_qp in list of messages 
Could not find **vc_gen2_qp_finalize in list of messages 
Could not find **ibv_destroy_qp in list of messages 
Could not find **vc_gen2_qp_finalize in list of messages 
Could not find **vc_gen2_qp_finalize in list of messages 
Could not find **vc_gen2_qp_finalize in list of messages 
Could not find **vc_gen2_qp_finalize in list of messages 
Could not find **vc_gen2_qp_finalize in list of messages 
Fatal error in MPI_Finalize: 
Internal MPI error!, error stack: 
MPI_Finalize(311).................: 
MPI_Finalize failed 
MPI_Finalize(229).................: 
MPID_Finalize(140)................: 
MPIDI_CH3_Finalize(24)............: 
MPID_nem_finalize(63).............: 
MPID_nem_gen2_module_finalize(520):(unknown)(): Internal MPI error!

Someone suggested that this could be a strange optimization on the part of icpc, so I am now compiling with -O0 (and also running with -check_mpi). Someone else suggested that MPI_Finalize is being called before all of the messages return, so I tried having the program sleep for several minutes before calling MPI_Finalize in case it was a race condition of some sort.
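Concretely, the rebuild and checked run look like this (the source file and rank count are placeholders; -check_mpi requires the Intel Trace Analyzer and Collector correctness-checking library to be available):

```shell
# Rebuild with optimization off and debug symbols (main.cpp is a placeholder)
mpiicpc -O0 -g -o app main.cpp

# Run with Intel MPI's correctness-checking library preloaded
mpirun -check_mpi -n 4 ./app
```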

The project is fairly large; however, only two main files call MPI_Finalize. Without -O0, the program fails at the first file's MPI_Finalize call. With -O0, the program runs all the way to the second MPI_Finalize call (in file 2), but the error is the same.

I also suspected that some memory was being freed before MPI_Finalize that shouldn't have been; however, there is an MPI_Barrier just a few lines above the finalize, which indicates to me that all of the processes successfully reach the barrier call. After the barrier, some strings are printed, nothing is freed or deallocated, and then MPI_Finalize is called.
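The shutdown sequence in both files is essentially the following (a simplified sketch, not the real code; note that a barrier only guarantees every rank reached that line, not that all outstanding nonblocking requests have completed):

```cpp
#include <cstdio>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // ... computation and MPI communication ...

    // Every rank reaches this point before any rank proceeds past it.
    // It does NOT guarantee that pending nonblocking requests are complete.
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("done\n");  // only diagnostics between the barrier and finalize

    // Nothing is freed or deallocated between the barrier and this call.
    MPI_Finalize();  // this is where the internal error is raised
    return 0;
}
```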

In addition to the MPI_Finalize error above, I occasionally get the following error as well (the MPI_Finalize error appears on every run; this one only sometimes):

[mpiexec@ida3c03] control_cb (./pm/pmiserv/pmiserv_cb.c:717): assert (!closed) failed 
[mpiexec@ida3c03] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status 
[mpiexec@ida3c03] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event 
[mpiexec@ida3c03] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion

I have tried everything I can think of and am not sure where to go from here in terms of debugging.

James_T_Intel
Moderator

This could be an InfiniBand* problem.  Here are a few suggestions to try.  Set I_MPI_FABRICS=tcp to force using sockets instead of DAPL*.  What DAPL* provider are you using?  Or are you using OFED directly?  This should appear in the beginning of the debug output (fabric selection).  Let's start with this and see what can be found.
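For example (the binary name and rank count are placeholders):

```shell
# Force the sockets fabric instead of DAPL*/OFED for MPI communication
export I_MPI_FABRICS=tcp
mpirun -n 4 ./app
```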

Melissa
Beginner

Initially, I was getting libdat2 errors, which I thought were causing the MPI_Finalize errors (similar to https://software.intel.com/en-us/forums/topic/288990). I contacted the machine admin (I don't have root access), and he concluded that the machine uses OFED directly (no DAPL). I changed the runs to use shm:ofa, but I still get the error.

I did also try a tcp run. It did not fail, but it was so slow that it exceeded the maximum wall clock time for the queues I have access to (24 hours). It never even reached the first MPI_Finalize call, so it was hard to conclude anything concrete. I am putting together a smaller dataset to test the tcp fabric further and will get back to you. Thank you for your quick reply.

 
