Hello,
We have a product that uses the Intel MPI Library, and I am experiencing an issue with communication that pairs MPI_Send on one side with MPI_Irecv + MPI_Test on the other.
Upon investigation, we found that a message sent with MPI_Send is never reported as received by MPI_Test on the receiving side; that is, the 'flag' argument of MPI_Test is never set to 1 (true). As a result, our application stays in an infinite do/while loop and hangs.
To elaborate, we use a controller-worker design, and the relevant code looks roughly like this:
==================================================================
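Worker side code:
bool WorkerSideFunc(MPINewJobInfo *mpiNewJobInfo)
{
    MPI_Request requestHandler = MPI_REQUEST_NULL;
    int gotJob = 0;

    // Post a non-blocking receive for the next job from rank 0 of the parent (controller) group.
    if (MPI_Irecv(mpiNewJobInfo, 1, MPINewJobInfoType, 0, NewSSQJobTAG, parentComm, &requestHandler) != MPI_SUCCESS)
    {
        return false;
    }

    do {
        // Poll the pending receive; gotJob becomes non-zero once the job message has arrived.
        MPI_Test(&requestHandler, &gotJob, MPI_STATUS_IGNORE);
        if (!gotJob)
        {
            if (WorkerCheckShutdownRequest())
            {
                // Shutting down: cancel the pending receive and release the request.
                if (requestHandler == MPI_REQUEST_NULL)
                    return false;
                MPI_Cancel(&requestHandler);
                MPI_Request_free(&requestHandler); // also sets requestHandler to MPI_REQUEST_NULL
                return false;
            }
            Sleep(50); // No job yet: wait 50 ms (Windows Sleep) before polling again
        }
        else
            return true;
    } while (true);
} // END Of WorkerSideFunc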
Controller side code:
bool ControllerSideFunc(MPINewJobInfo mpiNewJobInfo, int workerToSend)
{
    // On an intercommunicator, destination rank 0 refers to rank 0 of the remote (worker) group.
    if (MPI_Send(&mpiNewJobInfo, 1, MPINewJobInfoType, 0, NewSSQJobTAG, workerInterComm[workerToSend]) != MPI_SUCCESS)
    {
        return false;
    }
    return true;
} // End of ControllerSideFunc
==================================================================
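The worker loop above exits only when MPI_Test reports the receive as complete or a shutdown is requested, so if the flag never becomes true it spins forever. Below is a minimal C sketch (not our production code) that models this control flow with stub callbacks in place of MPI, so it runs without an MPI installation. All names in it (poll_ops, poll_for_job, simulate, POLL_TIMEOUT, max_polls) are hypothetical, and the optional poll limit is an assumed diagnostic that turns a silent hang into a reportable timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome codes for the worker's polling loop. */
typedef enum { POLL_GOT_JOB, POLL_SHUTDOWN, POLL_TIMEOUT, POLL_ERROR } poll_result;

/* Hypothetical hooks standing in for the real calls:
   test               ~ MPI_Test(&requestHandler, &flag, MPI_STATUS_IGNORE), 0 on success
   shutdown_requested ~ WorkerCheckShutdownRequest()
   nap                ~ Sleep(ms) */
typedef struct {
    int  (*test)(void *ctx, int *flag);
    bool (*shutdown_requested)(void *ctx);
    void (*nap)(void *ctx, unsigned ms);
    void *ctx;
} poll_ops;

/* Same control flow as WorkerSideFunc's do/while loop, plus an assumed
   poll limit (max_polls == 0 means poll forever, as the original does). */
static poll_result poll_for_job(const poll_ops *ops, unsigned interval_ms,
                                unsigned max_polls) {
    for (unsigned n = 0; max_polls == 0 || n < max_polls; ++n) {
        int flag = 0;
        if (ops->test(ops->ctx, &flag) != 0)
            return POLL_ERROR;              /* test reported an error */
        if (flag)
            return POLL_GOT_JOB;            /* receive completed */
        if (ops->shutdown_requested(ops->ctx))
            return POLL_SHUTDOWN;           /* caller cancels and frees the request */
        ops->nap(ops->ctx, interval_ms);    /* back off before polling again */
    }
    return POLL_TIMEOUT;                    /* flag never became true */
}

/* Stub state: the flag turns true on test call number ready_after, and shutdown
   is requested on check number shutdown_after; a negative value means "never". */
typedef struct {
    int calls, checks, ready_after, shutdown_after;
} fake_state;

static int fake_test(void *ctx, int *flag) {
    fake_state *s = ctx;
    s->calls++;
    *flag = s->ready_after >= 0 && s->calls >= s->ready_after;
    return 0;
}

static bool fake_shutdown(void *ctx) {
    fake_state *s = ctx;
    s->checks++;
    return s->shutdown_after >= 0 && s->checks >= s->shutdown_after;
}

static void fake_nap(void *ctx, unsigned ms) { (void)ctx; (void)ms; } /* no real sleep */

/* Runs the loop against the stubs and reports how many times test was called. */
static poll_result simulate(int ready_after, int shutdown_after,
                            unsigned max_polls, int *polls) {
    assert(polls != NULL);
    fake_state s = { .ready_after = ready_after, .shutdown_after = shutdown_after };
    poll_ops ops = { fake_test, fake_shutdown, fake_nap, &s };
    poll_result r = poll_for_job(&ops, 50, max_polls);
    *polls = s.calls;
    return r;
}
```

In the real worker, test would wrap MPI_Test, shutdown_requested would wrap WorkerCheckShutdownRequest(), and nap would call Sleep(50); on POLL_SHUTDOWN or POLL_TIMEOUT the caller would cancel and free the pending request as WorkerSideFunc does. For example, simulate(-1, -1, 100, &polls) reproduces the reported symptom: the flag never becomes true, so the loop gives up after 100 polls instead of hanging.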
In short, the message sent with "MPI_Send" on the controller side is never reported as complete by "MPI_Test" on the worker side.
Please note that this behavior occurs only with Intel MPI version 2021.15. With version 2021.6.0 we do not see the hang: the message is received and gotJob is set to 1.
Please also note the following:
1) Setting I_MPI_WAIT_MODE=1 resolves the issue.
2) Setting I_MPI_ASYNC_PROGRESS=1 also resolves the issue.
This is a serious issue for our product, and we would be extremely grateful for any possible resolution.
Thanks
Shivanshu