Intel® MPI Library

Bug in Intel MPI when combining MPI_Intercomm_create and an async progress thread

Donners__John
Beginner

Hello,

 

The Intel MPI Library (version 2021.9.0) fails when creating an intercommunicator while the asynchronous progress thread is enabled. I included a test program (sketched below) that produces the following error:

 

$ I_MPI_ASYNC_PROGRESS=1 mpirun -n 10 ./a.out
Abort(204053775) on node 4 (rank 4 in comm 0): Fatal error in PMPI_Intercomm_create: Other MPI error, error stack:
PMPI_Intercomm_create(317)...........: MPI_Intercomm_create(comm=0x84000002, local_leader=0, MPI_COMM_WORLD, remote_leader=0, tag=1, newintercomm=0x7fff23cc61e4) failed
MPIR_Intercomm_create_impl(49).......:
MPID_Intercomm_exchange_map(645).....:
MPIDIU_Intercomm_map_bcast_intra(112):
MPIR_Bcast_intra_auto(85)............:
MPIR_Bcast_intra_binomial(131).......: message sizes do not match across processes in the collective routine: Received 4100 but expected 16

The program runs fine without the asynchronous progress thread.

Note that the failure does not occur on every run, and it becomes more likely as the number of MPI ranks grows. The 'Received' and 'expected' sizes also vary from run to run, so it looks like a race condition.
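
For reference, the attached test program boils down to something like the sketch below. This is a reconstruction, not the attachment itself: the rank-0-versus-the-rest split, the local leader, and the tag value are assumptions inferred from the error stack above.

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Comm local_comm, intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Group 0 holds world rank 0; group 1 holds everyone else
       (assumed split, matching remote_leader=0 in the error stack). */
    int color = (rank == 0) ? 0 : 1;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &local_comm);

    /* The remote leader, in MPI_COMM_WORLD numbering, is the lowest
       world rank of the other group. */
    int remote_leader = (color == 0) ? 1 : 0;
    MPI_Intercomm_create(local_comm, 0 /* local_leader */,
                         MPI_COMM_WORLD, remote_leader,
                         1 /* tag */, &intercomm);

    MPI_Comm_free(&intercomm);
    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}

Run as shown above; the same binary completes cleanly when I_MPI_ASYNC_PROGRESS is unset.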

AishwaryaCV_Intel
Moderator

Hi,

 

Thank you for posting in the Intel Communities.

 

We are able to reproduce the issue. We are working on it and will get back to you soon.

 

Thanks and regards,

Aishwarya

 

 

AishwaryaCV_Intel
Moderator

Hi,


We have informed the development team about the issue and will let you know as soon as there is an update.


Thanks and regards,

Aishwarya


VeenaJ_Intel
Moderator

Hi,

 

Thank you for your patience. A fix for the issue you've encountered will be included in the 2021.12 release.

 

Regards,

Veena

 
