Intel® MPI Library

MPI_Send not getting honored by MPI_Test (flag argument never gets set to 1/true)

shivc

Hello,

We have a product that uses the Intel MPI Library, and I am experiencing an issue with MPI_Send and MPI_Irecv + MPI_Test communication.
Upon investigation, we found that an MPI_Send call is not getting honored by MPI_Test, i.e., the 'flag' argument in MPI_Test never gets set to 1 (true). Because of this, our application gets stuck in an infinite do/while loop, resulting in a hang.

To elaborate, we are using a controller-worker system having code snippets that look somewhat like this:
==================================================================

Worker side code:

bool WorkerSideFunc(MPINewJobInfo *mpiNewJobInfo)
{
    MPI_Request requestHandler;
    int gotJob = 0;

    // Post a non-blocking receive for the next job from rank 0 on parentComm.
    if (MPI_Irecv(mpiNewJobInfo, 1, MPINewJobInfoType, 0, NewSSQJobTAG, parentComm, &requestHandler) != MPI_SUCCESS)
    {
        return false;
    }

    do {
        // Poll the pending receive; gotJob becomes non-zero once it has completed.
        MPI_Test(&requestHandler, &gotJob, MPI_STATUS_IGNORE);
        if (!gotJob)
        {
            if (WorkerCheckShutdownRequest())
            {
                // Shutdown requested: cancel and release the pending receive.
                if (requestHandler != MPI_REQUEST_NULL)
                {
                    MPI_Cancel(&requestHandler);
                    MPI_Request_free(&requestHandler);
                    requestHandler = MPI_REQUEST_NULL;
                }
                return false;
            }
            Sleep(50); // Receive still pending; wait 50 ms before polling again
        }
        else
            return true;
    } while (true);
} // END Of WorkerSideFunc

Controller side code:
bool ControllerSideFunc(MPINewJobInfo mpiNewJobInfo, int workerToSend)
{
    if (MPI_Send(&mpiNewJobInfo, 1, MPINewJobInfoType, 0, NewSSQJobTAG, workerInterComm[workerToSend]) != MPI_SUCCESS)
    {
        return false;
    }
    return true;
} // End of ControllerSideFunc
==================================================================

The message sent with "MPI_Send" on the controller side is never reported as received by "MPI_Test" on the worker side.

Please note that this behavior occurs only with Intel MPI Library version 2021.15. With version 2021.6.0, we do not see the hang and the MPI_Send gets honored (gotJob gets set to 1).

Please also note the following:
1) Setting I_MPI_WAIT_MODE=1 resolved the issue.
2) Setting I_MPI_ASYNC_PROGRESS=1 resolved the issue.
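
Separately from those settings, one way to make the hang easier to diagnose would be to bound the polling loop with a deadline, so that a receive that never completes is reported instead of spinning forever. The following is only a minimal sketch of that idea and is not part of our product code: PollWithDeadline and PollResult are hypothetical names, and the completion check is passed in as a callable so the sketch compiles without MPI.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical result type for the sketch below.
enum class PollResult { Completed, Cancelled, TimedOut };

// Hypothetical helper: calls `test` repeatedly until it returns true,
// `shouldStop` returns true, or `timeout` has elapsed.
// In the worker, `test` would wrap MPI_Test and `shouldStop` would wrap
// WorkerCheckShutdownRequest (both are assumptions of this sketch).
PollResult PollWithDeadline(const std::function<bool()>& test,
                            const std::function<bool()>& shouldStop,
                            std::chrono::milliseconds interval,
                            std::chrono::milliseconds timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        if (test()) {
            return PollResult::Completed;   // the operation finished
        }
        if (shouldStop()) {
            return PollResult::Cancelled;   // caller asked to stop waiting
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return PollResult::TimedOut;    // report instead of hanging
        }
        std::this_thread::sleep_for(interval);
    }
}
```

In WorkerSideFunc, the `test` callable could call MPI_Test(&requestHandler, &gotJob, MPI_STATUS_IGNORE) and return gotJob != 0. A TimedOut result would then give a place to log the state of the request before deciding whether to cancel it or keep waiting.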

This is a serious issue for our product and we would be extremely grateful for any possible resolution.

Thanks
Shivanshu
