(repost from incorrect category)
A simulation package I'm working on communicates between participants by repeatedly connecting, sending data, and then disconnecting. This is not something I can change.
When I run this on Intel MPI (I tried versions 2020 Update 1 and Update 2), it crashes with a segmentation fault after a number of connect/send/disconnect cycles. The exact number varies from a few tens to a few thousand.
I have created a simple reproduction program (see attached).
This works fine if I use Open MPI 4.0.2 (built with GCC 8.3.0) and compile the program with Open MPI's mpicc, but it segfaults with the Intel compiler and Intel MPI.
It would be nice if someone could corroborate this and, if so, suggest how to solve the problem.
If you're still getting this segfault, the developers have asked the following questions:
1) At what scale did you run (the "-n" and "-ppn" values)?
2) We would be glad to get a full log of the running application with the following debug knobs:
I have an update on a fix for this. Starting with the Intel MPI (IMPI) 2021.3 release, there is a knob for this issue:
Setting that should resolve the problem.
Sorry for not posting back earlier; I was not notified of any replies.
I will try the I_MPI_SPAWN_EXPERIMENTAL fix as soon as we have the 2021.3 release; the most recent version we currently have is 2020 Update 2.
In the meantime, can you confirm that this is also fixed in the Intel oneAPI HPC Toolkit?