Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

SEGFAULT on repeated MPI_Comm_connect|MPI_Comm_accept/MPI_Comm_disconnect

menno
Beginner

(repost from incorrect category)

A simulation package I'm working on uses repeated communication between participants by connecting, sending data, and then disconnecting. This is not something I can change.

When I run this on Intel MPI (I tried versions 2020 Update 1 and Update 2), I get a segmentation fault after several connect/send/disconnect cycles. The exact number varies from several tens to a few thousand.

I have created a simple reproduction program (see attached).

It works fine if I use OpenMPI 4.0.2 compiled with GCC 8.3.0 and compile the program with the mpicc from OpenMPI, but it segfaults with the Intel compiler and Intel MPI.

It would be nice if this could be corroborated, and if so, to learn how to solve the problem.
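The attachment is not reproduced in the thread, but the pattern being described is a client that repeatedly connects to a server's port, sends data, and disconnects. A minimal client-side sketch of that loop (the port name, cycle count, and payload are illustrative, not the actual attached reproducer) might look like:

```c
/* Sketch of the repeated connect/send/disconnect pattern described above.
 * The matching server side would use MPI_Open_port / MPI_Comm_accept /
 * MPI_Comm_disconnect in a loop. Requires an MPI environment:
 * compile with `mpicc reproducer.c -o reproducer`.
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    if (argc < 2) {
        fprintf(stderr, "usage: %s <port-name>\n", argv[0]);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    const char *port = argv[1];  /* port name published by the server */
    int payload = 42;            /* illustrative data to send each cycle */

    for (int cycle = 0; cycle < 1000; ++cycle) {
        MPI_Comm server;
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server);
        MPI_Send(&payload, 1, MPI_INT, 0, 0, server);
        /* The reported segfault occurs after tens to thousands of these
         * disconnects under Intel MPI 2020 Update 1/2. */
        MPI_Comm_disconnect(&server);
    }

    MPI_Finalize();
    return 0;
}
```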

ShivaniK_Intel
Moderator

Hi,


Thanks for reaching out to us.


We are able to reproduce the issue at our end. We are working on it and will get back to you soon.


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator

Hi,


Could you please tell us which OFI provider you are using?


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator

Hi,


As we didn't hear back from you, is your issue resolved? If not, please provide the details requested in my previous post.


Thanks & Regards

Shivani


Jennifer_D_Intel
Employee

If you're still getting this segfault, the developers have asked the following questions:


1) At what scale did you run ("-n" and "-ppn")?

2) We would be glad to get the full log of the application run with these debug knobs set:

export I_MPI_DEBUG=1000

export FI_LOG_LEVEL=debug

export I_MPI_HYDRA_DEBUG=1
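One way to collect the requested log in a single file (the reproducer binary and output filename below are placeholders, not names from the thread) would be:

```shell
# Enable verbose Intel MPI, libfabric, and Hydra launcher debugging,
# then capture both stdout and stderr of the run into one file.
export I_MPI_DEBUG=1000
export FI_LOG_LEVEL=debug
export I_MPI_HYDRA_DEBUG=1
mpirun -n 2 -ppn 1 ./reproducer 2>&1 | tee run_debug.log  # './reproducer' is a placeholder
```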


Jennifer_D_Intel
Employee

I have an update on a fix for this. Starting in the IMPI 2021.3 release there is a knob for this issue:

I_MPI_SPAWN_EXPERIMENTAL=1


Setting that should resolve the problem.  
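Since the knob is an environment variable, applying it is just a matter of exporting it before launching (the binary name below is a placeholder):

```shell
# Available starting with the Intel MPI 2021.3 release; enables the
# experimental code path for dynamic process support (connect/accept/spawn).
export I_MPI_SPAWN_EXPERIMENTAL=1
mpirun -n 2 ./reproducer  # './reproducer' is a placeholder binary name
```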


menno
Beginner

Sorry for not posting back earlier, I was not notified of any replies.

 

I will try the I_MPI_SPAWN_EXPERIMENTAL fix as soon as I have the 2021.3 release; currently 2020 Update 2 is the most recent version we have.

 

In the meantime, can you confirm that this is also fixed in the Intel oneAPI HPC Toolkit?

Jennifer_D_Intel
Employee

Yes, this fix is included in the oneAPI HPC toolkit.


menno
Beginner

I can confirm that simulations now run with the proposed environment variable set.

A segfault still occurs at the end of the simulations, though; see here
