(repost from incorrect category)
A simulation package I'm working on communicates repeatedly between participants by connecting, sending data, and then disconnecting. I cannot change this behavior.
When I run it with Intel MPI (I tried 2020 Update 1 and Update 2), it crashes with a segmentation fault after a varying number of connect/send/disconnect cycles, anywhere from a few tens to a few thousand.
I have created a simple reproduction program (see attached).
It works fine with OpenMPI 4.0.2 (built with GCC 8.3.0) when I compile the program with OpenMPI's mpicc, but it segfaults with the Intel compiler and Intel MPI.
Could someone try to reproduce this and, if it is confirmed, suggest how to solve it?
Hi,
Thanks for reaching out to us.
We are able to reproduce the issue at our end. We are working on it and will get back to you soon.
Thanks & Regards
Shivani
Hi,
Could you please tell us which OFI provider you have been using?
Thanks & Regards
Shivani
Hi,
As we haven't heard back from you, is your issue resolved? If not, please provide the details requested in my previous post.
Thanks & Regards
Shivani
If you're still getting this segfault, the developers have asked the following questions:
1) At what scale did you run (the "-n" and "-ppn" values)?
2) Please send us the full log of the application run with these debug settings:
export I_MPI_DEBUG=1000
export FI_LOG_LEVEL=debug
export I_MPI_HYDRA_DEBUG=1
I have an update on a fix for this. Starting with the Intel MPI (IMPI) 2021.3 release, there is a setting that addresses this issue:
I_MPI_SPAWN_EXPERIMENTAL=1
Setting that should resolve the problem.
Sorry for not posting back earlier, I was not notified of any replies.
I will try the I_MPI_SPAWN_EXPERIMENTAL fix as soon as I have the 2021.3 release; the most recent version we currently have is 2020 Update 2.
In the meantime, can you confirm that this is also fixed in the Intel oneAPI HPC pack?
Yes, this fix is included in the oneAPI HPC toolkit.
I can confirm that simulations now run with the proposed environment variable set.
However, a segfault now occurs at the end of the simulations; see here.