Adrian_I_
Beginner

MPI_Request_free() hangs

Hi,

While trying to use MPI's persistent communication primitives in our application, I end up with the following sequence of calls:

MPI_Ssend_init(msg, msg_length, MPI_BYTE, 0, tag, comm, &req); // create persistent synchronous-send request
MPI_Start(&req);                                               // activate the request
MPI_Cancel(&req);                                              // ask for cancellation
MPI_Wait(&req, MPI_STATUS_IGNORE);                             // complete (or cancel) the operation
MPI_Request_free(&req);                                        // <-- HANGS

The only other node is blocked in an MPI_Barrier(comm);

I noticed that if I comment out the MPI_Barrier() call and let the other node proceed to free the communicator and then enter another MPI_Barrier() on a different communicator, the MPI_Request_free() call magically returns.

I tried reproducing this in a separate test program, but there everything works as expected. So I assume there is some (possibly unrelated) bug in my original application causing this behaviour, and that one would need more information to figure it out.
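For reference, the kind of standalone two-rank test I mean looks roughly like the following (buffer size, tag, and rank assignment are arbitrary; this is a sketch, not our actual application code):

```c
/* Minimal two-rank sketch of the pattern above: rank 1 starts and
 * cancels a persistent synchronous send while rank 0 sits in a
 * barrier.  Buffer size and tag are arbitrary. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        char msg[64] = {0};
        MPI_Request req;

        MPI_Ssend_init(msg, sizeof msg, MPI_BYTE, 0, 42,
                       MPI_COMM_WORLD, &req);
        MPI_Start(&req);
        MPI_Cancel(&req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Request_free(&req);   /* hangs in the full application */
        printf("rank 1: request freed\n");
    }

    MPI_Barrier(MPI_COMM_WORLD);  /* rank 0 is blocked here meanwhile */
    MPI_Finalize();
    return 0;
}
```

Run with two ranks (e.g. mpirun -n 2 ./a.out); as said, in isolation this completes fine.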

But what puzzles me is that MPI_Request_free() blocks at all, even though the standard says it is a local operation (i.e. its completion should not depend on any other process).

So my main questions are:

  1. What can MPI_Request_free() possibly be waiting for?
  2. Any ideas/suggestions for how best to debug this sort of issue?

Thanks in advance!
- Adrian

 

 

James_T_Intel
Moderator

Adrian,

I'm not able to easily reproduce the hang, and I don't think you should be encountering one here.  Can you run with the message checking library in Intel® Trace Analyzer and Collector?  That should show whether there is a problem in your usage of MPI.  Also, can you send me your code showing the hang?  Private message is fine.

James.

Adrian_I_
Beginner

Thanks for the quick reply, James!

I'm not allowed to share the source code of our full application, unfortunately, and we don't have a license for ITAC.

But I grabbed an evaluation copy of Intel Parallel Studio 2016 and noticed that with the bundled Intel MPI 5.1.1 our application (same binaries, not even recompiled) runs successfully, and mpirun --check-mpi reports no issues at all.

Could it be a bug in the library itself that got fixed in the meantime?

James_T_Intel
Moderator

Possibly.  I don't immediately see anything in our bug tracking that would explain this, but it could have been related to something else fixed in the latest release.
