maartenb
New Contributor I

Access violation using MPI_Dist_graph_create() in spawned processes.


I have a python script that spawns two instances of an app using mpi4py (MPI.COMM_SELF.Spawn_multiple()). The app is coded in Fortran. Next, I'm setting up a graph for neighborhood communication between the two spawned processes. I'm getting an access violation in the Fortran child apps on the call to MPI_Dist_graph_create().

I'm linking the Fortran app against Intel MPI and using the Intel Distribution for Python on Windows 10. I also tried a standard Python distribution with mpi4py built manually against the Intel MPI library, with the same result.

A minimal example is attached; the error message is included below. This example runs fine with MS-MPI.

Note that I ran into a different problem when spawning the Fortran apps from Python, described in a different post. I solved that by creating a symbolic link to the appropriate directory in the Python installation directory.

Thanks,

Maarten

 

[proxy:1:0@T0147953] main (proxy.c:954): error launching_processes
[mpiexec@T0147953] Sending Ctrl-C to processes as requested
[mpiexec@T0147953] Press Ctrl-C again to force abort
[mpiexec@T0147953] HYD_sock_write (..\windows\src\hydra_sock.c:382): write error (errno = 2)
[mpiexec@T0147953] wmain (mpiexec.c:2096): assert (exitcodes != NULL) failed
forrtl: error (200): program aborting due to control-C event
Image              PC                Routine            Line        Source
impi.dll           00007FFCB6A691D8  Unknown               Unknown  Unknown
KERNELBASE.dll     00007FFD32A856FD  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFD36084034  Unknown               Unknown  Unknown
ntdll.dll          00007FFD363A3691  Unknown               Unknown  Unknown

(base) c:\intelpython3\symlink>python python_parent.py
forrtl: severe (157): Program Exception - access violation
Image              PC                Routine            Line        Source
impi.dll           00007FFCB6303A43  Unknown               Unknown  Unknown
impi.dll           00007FFCB62C8981  Unknown               Unknown  Unknown
impi.dll           00007FFCB6A194ED  Unknown               Unknown  Unknown
fortran_child.exe  00007FF6B03A1531  MAIN__                     27  fortran_child.f90
fortran_child.exe  00007FF6B03A16C2  Unknown               Unknown  Unknown
fortran_child.exe  00007FF6B03A4184  Unknown               Unknown  Unknown
fortran_child.exe  00007FF6B03A40AE  Unknown               Unknown  Unknown
fortran_child.exe  00007FF6B03A3F6E  Unknown               Unknown  Unknown
fortran_child.exe  00007FF6B03A41F9  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFD36084034  Unknown               Unknown  Unknown
ntdll.dll          00007FFD363A3691  Unknown               Unknown  Unknown

 

 


10 Replies
PrasanthD_intel
Moderator

Hi Maarten,

 

Could you please provide us the logs after setting I_MPI_DEBUG=10:

set I_MPI_DEBUG=10

Also, could you share the command you used to create the symbolic link?

 

Regards

Prasanth

 

maartenb
New Contributor I

The debug output is given below. I omitted the Fortran stack trace since it contains nothing new, as far as I can see.

I solved the problem with spawning the Fortran processes (solution posted in the thread of the post I referenced), so making a symbolic link is no longer necessary. It was a matter of passing a "path" info key to Spawn_multiple(). The updated code of the python_shell is given below.

cheers,

Maarten

python_shell.py:

from mpi4py import MPI
import numpy as np, os

# Pass the current working directory via the "path" info key so the
# spawned executable can be found (this replaces the symbolic link).
info = MPI.Info.Create()
info.Set('path', os.getcwd())
sub_comm = MPI.COMM_SELF.Spawn_multiple(['fortran_child.exe'] * 2, info=info)
common_comm = sub_comm.Merge(False)

# Rank 0 contributes zero outgoing edges, i.e. an empty distributed graph.
topocomm = common_comm.Create_dist_graph([0], [0], np.array([], dtype=int), MPI.UNWEIGHTED)

common_comm.Disconnect()
sub_comm.Disconnect()
topocomm.Disconnect()
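For reference, the three array arguments of Create_dist_graph() encode the edges of the graph in flattened form. The small sketch below shows how they map to (source, destination) pairs; expand_edges is a hypothetical helper for illustration only, not part of mpi4py or the MPI library.

```python
def expand_edges(sources, degrees, destinations):
    """Return the (source, destination) edges encoded by the flattened
    arrays that MPI_Dist_graph_create / Create_dist_graph expect:
    sources[i] has degrees[i] outgoing edges, whose targets are the
    next degrees[i] entries of destinations."""
    edges, pos = [], 0
    for src, deg in zip(sources, degrees):
        for dst in destinations[pos:pos + deg]:
            edges.append((src, dst))
        pos += deg
    return edges

# The parent script's call encodes an empty graph: rank 0 declares
# sources=[0] with degrees=[0], i.e. zero outgoing edges.
print(expand_edges([0], [0], []))             # -> []

# A two-edge ring over the spawned children (ranks 1 and 2 after Merge)
# would instead be encoded as:
print(expand_edges([1, 2], [1, 1], [2, 1]))   # -> [(1, 2), (2, 1)]
```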

 

Debug output:

[0] MPI startup(): libfabric version: 1.7.1a1-impi

[0] MPI startup(): libfabric provider: sockets

[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       22224    T0147953   {0,1,2,3,4,5,6,7}
[0] MPI startup(): I_MPI_ROOT=C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.2.254\windows\mpi
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_DEBUG=10
[0] MPI startup(): libfabric version: 1.7.1a1-impi

[0] MPI startup(): libfabric provider: sockets

[0] MPI startup(): Rank    Pid      Node name  Pin cpu
[0] MPI startup(): 0       25772    T0147953   {0,1,2,3}
[0] MPI startup(): 1       26700    T0147953   {4,5,6,7}
[0] MPI startup(): I_MPI_ROOT=C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.2.254\windows\mpi
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_DEBUG=10

 

maartenb
New Contributor I

Somebody incorrectly marked my post as the solution, presumably because I wrote that I had solved the problem from the other post.

@PrasanthD_intel: please note this problem is *not* yet solved.

thanks,

Maarten

PrasanthD_intel
Moderator

Hi Maarten,


We tried your code; it ran perfectly on Linux but gives an "access violation" error when executed on Windows, just as you reported.

We don't know the exact reason, so we are escalating your query to the Subject Matter Experts.


Thanks

Prasanth


maartenb
New Contributor I

Thanks for that. I hope it gets solved soon.

Best,

Maarten

James_T_Intel
Moderator

I've reproduced this internally and have passed it to our development team for analysis and a fix.


James_T_Intel
Moderator

The error you are encountering is actually the result of multiple internal issues:


  • An incorrect interface is being selected on your system. This can happen for multiple reasons, including the presence of a VPN. You can set FI_TCP_IFACE=eth0 to work around this issue.
  • An error in the path handling for spawned images. We are working to resolve this; there is currently no workaround.
  • An error in MPI_Probe indexing. We are working to resolve this.
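For the first issue, the workaround can be applied in the same command prompt before launching the parent script. This is a sketch assuming Windows cmd syntax; "eth0" follows the suggestion above, and the actual interface name on your machine may differ.

```shell
:: Workaround for the interface-selection issue (Windows cmd syntax).
:: Substitute the name of your actual network interface if it differs.
set FI_TCP_IFACE=eth0
python python_parent.py
```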

James_T_Intel
Moderator

I have received information from our development team that the internal issues are fixed for the next release, Intel® MPI Library 2021.2. Please watch for this release as part of the next update to Intel® oneAPI HPC Toolkit.



maartenb
New Contributor I

Great! I hope the new version will be released soon.

James_T_Intel
Moderator

This issue has been resolved and we will no longer monitor this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

