When launching this on Fedora 40:
mpiexec -machinefile mfile -configure cfile someprogram
I sporadically encounter an error like this:
Abort(1614735) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Unknown error class, error stack:
MPIR_Init_thread(192)........:
MPID_Init(1665)..............:
MPIDI_OFI_mpi_init_hook(1665):
create_vni_context(2245).....: OFI EP enable failed (ofi_init.c:2245:create_vni_context:Address already in use)
This does not happen every time. When it does happen, relaunching the same command runs fine.
Is there any way to get rid of this problem permanently?
What is this "Address already in use" error?
I have already searched through the discussions, and none of them seem to apply directly.
@YaDev
Please provide at least your hardware and software environment and the output with I_MPI_DEBUG=10.
If you can reproduce the failure, please also set I_MPI_HYDRA_DEBUG=1 and I_MPI_DEBUG=120.
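A minimal sketch for collecting that output into one log file per run, assuming a POSIX shell. `run_with_mpi_debug` is a hypothetical helper name, not part of Intel MPI. It only sets the two environment variables requested for reproducing the failure (I_MPI_HYDRA_DEBUG=1, I_MPI_DEBUG=120) for a single command, sends both stdout and stderr to the given file, and passes the command's exit status through.

```shell
#!/bin/sh
# Hypothetical helper (not an Intel MPI tool).
# Usage: run_with_mpi_debug LOGFILE COMMAND [ARGS...]
run_with_mpi_debug() {
    log_file=$1
    shift
    # Prefix assignments place these variables in the environment of this one
    # command only; they do not leak into the rest of the script.
    # The function returns the command's exit status.
    I_MPI_HYDRA_DEBUG=1 I_MPI_DEBUG=120 "$@" >"$log_file" 2>&1
}

# Example with the command from the original post:
# run_with_mpi_debug run1.log mpiexec -machinefile mfile -configure cfile someprogram
```

Keeping one log per launch makes it easier to attach only the output of a run that actually failed.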
Setting those flags produces a lot of debug output, and it also seems to have stopped the original "OFI EP enable failed (ofi_init.c:2245:create_vni_context:Address already in use)" error.
I was running MPI many times in quick succession, one run after another, from a script. Maybe the "vni context address" was not released fast enough between runs, and all the debug output slowed things down enough for it to be released before the next mpiexec ... call?
Is that possible?
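If the timing guess is right, pausing between consecutive launches and retrying a launch that fails should make such a script tolerant of the error. A minimal sketch, assuming a POSIX shell: `run_mpi_with_retry` is a hypothetical wrapper, not an Intel MPI feature, and it works around the symptom rather than explaining why the address is still in use. The attempt count and pause length in the example are arbitrary placeholders. Note that it retries on any non-zero exit, not only on this particular error.

```shell
#!/bin/sh
# Hypothetical wrapper (not an Intel MPI feature).
# Usage: run_mpi_with_retry MAX_ATTEMPTS PAUSE_SECONDS COMMAND [ARGS...]
# Runs COMMAND; if it exits non-zero, waits PAUSE_SECONDS and tries again,
# up to MAX_ATTEMPTS times in total. Returns the last exit status.
run_mpi_with_retry() {
    max_attempts=$1
    pause=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        status=$?   # exit status of the failed command
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts (exit $status)" >&2
            return "$status"
        fi
        echo "attempt $attempt failed (exit $status); retrying in ${pause}s" >&2
        sleep "$pause"
        attempt=$((attempt + 1))
    done
}

# Example: up to 3 attempts with a 2 s pause, plus a 1 s gap between runs.
# for i in 1 2 3; do
#     run_mpi_with_retry 3 2 mpiexec -machinefile mfile -configure cfile someprogram
#     sleep 1
# done
```

If the failures disappear with only the fixed gap between runs, that is consistent with (though not proof of) the resource not being released quickly enough between launches.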