Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Bug in Intel MPI when combining MPI_Intercomm_create and an async progress thread

Donners__John
Beginner
1,533 Views

Hello,

 

The Intel MPI Library (version 2021.9.0) fails when creating an intercommunicator while an asynchronous progress thread is enabled. I have included a test program that produces the following error:

 

$ I_MPI_ASYNC_PROGRESS=1 mpirun -n 10 ./a.out
Abort(204053775) on node 4 (rank 4 in comm 0): Fatal error in PMPI_Intercomm_create: Other MPI error, error stack:
PMPI_Intercomm_create(317)...........: MPI_Intercomm_create(comm=0x84000002, local_leader=0, MPI_COMM_WORLD, remote_leader=0, tag=1, newintercomm=0x7fff23cc61e4) failed
MPIR_Intercomm_create_impl(49).......:
MPID_Intercomm_exchange_map(645).....:
MPIDIU_Intercomm_map_bcast_intra(112):
MPIR_Bcast_intra_auto(85)............:
MPIR_Bcast_intra_binomial(131).......: message sizes do not match across processes in the collective routine: Received 4100 but expected 16

 The program runs fine without the asynchronous progress thread.

Note that this does not happen every time, and the probability increases with a higher number of MPI ranks. Also, the 'Received' and 'expected' values change between runs, so it looks like a race condition.
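
For reference, here is a minimal sketch of the kind of test program described above (the actual attachment is not reproduced here). It assumes the reproducer simply splits MPI_COMM_WORLD into two halves and joins them with MPI_Intercomm_create; the exact split, leader choice, and file names in the original program may differ.

/* Minimal reproducer sketch: split MPI_COMM_WORLD into two halves and
 * join them with MPI_Intercomm_create. Build and run, for example, with
 *   mpiicc repro.c -o a.out
 *   I_MPI_ASYNC_PROGRESS=1 mpirun -n 10 ./a.out
 * (requires at least 2 ranks). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Split the world into a lower and an upper half. */
    int color = (rank < size / 2) ? 0 : 1;
    MPI_Comm local_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &local_comm);

    /* Leader of the other half, expressed as a rank in the peer
     * communicator MPI_COMM_WORLD. */
    int remote_leader = (color == 0) ? size / 2 : 0;

    /* Local leader 0 in each half, tag 1 as in the error stack above. */
    MPI_Comm intercomm;
    MPI_Intercomm_create(local_comm, 0, MPI_COMM_WORLD, remote_leader,
                         1, &intercomm);

    if (rank == 0)
        printf("intercommunicator created\n");

    MPI_Comm_free(&intercomm);
    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}

Because the failure is intermittent, such a reproducer may need to be run several times (or with more ranks) before the error above appears.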

0 Kudos
3 Replies
AishwaryaCV_Intel
1,518 Views

Hi,

 

Thank you for posting in the Intel Communities.

 

We were able to reproduce the issue. We are working on it and will get back to you soon.

 

Thanks and regards,

Aishwarya

 

 

AishwaryaCV_Intel
1,410 Views

Hi,


We have informed the development team about the issue and will let you know as soon as there is an update.


Thanks and regards,

Aishwarya


VeenaJ_Intel
Moderator
1,200 Views

Hi,

 

Thank you for your patience. I wanted to let you know that a fix for the issue you've encountered will be included in Intel MPI Library version 2021.12.

 

Regards,

Veena

 
