Intel® MPI Library

SEGFAULT on repeated MPI_Comm_connect|MPI_Comm_accept/MPI_Comm_disconnect

menno
Beginner

(repost from incorrect category)

A simulation package I'm working on uses repeated communications between participants by connecting, sending data, then disconnecting. This is not something I can change.

 

When I run this with Intel MPI (I tried versions 2020 Update 1 and Update 2), it gives a segmentation fault after a number of connect/send/disconnect cycles. The exact number varies from several tens to a few thousand.

 

I have created a simple reproduction program (see attached).
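The attached reproducer is not shown here, but a minimal sketch of the connect/send/disconnect pattern described above could look like the following. This is an illustration only, not the actual attachment: the cycle count, file-based port exchange, and names are assumptions.

/* repro.c - repeated MPI_Comm_connect / MPI_Comm_accept / MPI_Comm_disconnect
 * Run as two separate jobs, e.g.:
 *   mpirun -n 1 ./repro server &
 *   mpirun -n 1 ./repro client
 * The port name is exchanged through a small text file (no locking; sketch only).
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CYCLES 5000
#define PORT_FILE "port.txt"

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int is_server = (argc > 1 && strcmp(argv[1], "server") == 0);
    char port[MPI_MAX_PORT_NAME] = {0};

    if (is_server) {
        MPI_Open_port(MPI_INFO_NULL, port);
        FILE *f = fopen(PORT_FILE, "w");        /* "publish" the port name */
        fprintf(f, "%s\n", port);
        fclose(f);

        for (int i = 0; i < CYCLES; ++i) {
            MPI_Comm client;
            int payload = -1;
            MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
            MPI_Recv(&payload, 1, MPI_INT, 0, 0, client, MPI_STATUS_IGNORE);
            MPI_Comm_disconnect(&client);       /* crash reported after many cycles */
        }
        MPI_Close_port(port);
    } else {
        FILE *f = NULL;
        while ((f = fopen(PORT_FILE, "r")) == NULL)   /* wait for the server */
            sleep(1);
        fgets(port, MPI_MAX_PORT_NAME, f);
        port[strcspn(port, "\n")] = '\0';
        fclose(f);

        for (int i = 0; i < CYCLES; ++i) {
            MPI_Comm server;
            MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
            MPI_Send(&i, 1, MPI_INT, 0, 0, server);
            MPI_Comm_disconnect(&server);
        }
    }

    MPI_Finalize();
    return 0;
}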

 

This works fine if I use OpenMPI 4.0.2 compiled with GCC 8.3.0 and compile the program with OpenMPI's mpicc, but it segfaults with the Intel compiler and Intel MPI.

 

It would be nice if this could be corroborated, and if so, to hear how to solve the problem.

ShivaniK_Intel
Moderator

Hi,


Thanks for reaching out to us.


We are able to reproduce the issue at our end. We are working on it and will get back to you soon.


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator

Hi,


Could you please tell us which OFI provider you have been using?


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator

Hi,


As we haven't heard back from you, is your issue resolved? If not, please provide the details requested in my previous post.


Thanks & Regards

Shivani


Jennifer_D_Intel
Moderator

If you're still getting this segfault, the developers have asked the following questions:


1) At what scale did you run, i.e. with which "-n" and "-ppn" values?

2) We would be glad to get a full log of the application run with the following debug knobs set (see the example invocation after the exports):

export I_MPI_DEBUG=1000

export FI_LOG_LEVEL=debug

export I_MPI_HYDRA_DEBUG=1
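For example, with the three exports above in place, the full output of the failing run could be captured along the lines of mpirun -n 2 -ppn 1 ./repro > repro_debug.log 2>&1 (the executable name, process counts, and log file name here are placeholders).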


Jennifer_D_Intel
Moderator

I have an update on a fix for this. Starting with the IMPI 2021.3 release, there is a knob for this issue:

I_MPI_SPAWN_EXPERIMENTAL=1


Setting that should resolve the problem.  


menno
Beginner

Sorry for not posting back earlier; I was not notified of any replies.

 

I will try the I_MPI_SPAWN_EXPERIMENTAL fix as soon as I have the 2021.3 release; currently the most recent version we have is 2020 Update 2.

 

In the meantime, can you confirm that this is also fixed in the Intel oneAPI HPC pack?

Jennifer_D_Intel
Moderator

Yes, this fix is included in the oneAPI HPC toolkit.


menno
Beginner

I can confirm that simulations now run with the proposed environment variable set.

A segfault still occurs at the end of the simulations though; see here.
