I have a Python script that spawns two instances of an app using mpi4py (MPI.COMM_SELF.Spawn_multiple()). The app is coded in Fortran. Next, I set up a graph for neighborhood communication between the two spawned processes. I am getting an access violation in the Fortran child apps on the call to MPI_Dist_graph_create().
I am linking the Fortran app against Intel MPI and use the Intel Distribution for Python on Windows 10. I also tried a standard Python distribution with mpi4py built manually against the Intel MPI library; same result.
A minimal example is attached; the error messages are included below. The example runs fine with MS-MPI.
Note that I ran into a different problem spawning the Fortran apps from Python, described in another post. I solved that by creating a symbolic link to the appropriate directory in the Python installation directory.
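The attached Fortran source is not reproduced inline. The child side has to mirror the parent: obtain the parent inter-communicator, merge it, and take part in the collective distributed-graph constructor (the MPI_Dist_graph_create call that crashes at line 27 of fortran_child.f90). Below is an mpi4py-style sketch of those steps for readers without the attachment; the real child is a Fortran executable, and the empty-neighborhood arguments are only an assumption mirroring the parent code posted further down.
# child_sketch.py -- illustrative mpi4py rendering of the steps the Fortran child
# performs; the actual child in the example is fortran_child.exe, not this script.
from mpi4py import MPI
import numpy as np

parent_comm = MPI.Comm.Get_parent()    # inter-communicator back to the spawning parent
common_comm = parent_comm.Merge(True)  # high=True so the parent keeps rank 0 (assumed convention)

# Collective over common_comm; each child contributes its own (here empty) edge list.
# This is the point where the Fortran child hits the access violation.
topocomm = common_comm.Create_dist_graph(
    [common_comm.Get_rank()], [0], np.array([], dtype=int), MPI.UNWEIGHTED)

# Tear down in the same order as the parent so the collective calls match.
common_comm.Disconnect()
parent_comm.Disconnect()
topocomm.Disconnect()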
Thanks,
Maarten
[proxy:1:0@T0147953] main (proxy.c:954): error launching_processes
[mpiexec@T0147953] Sending Ctrl-C to processes as requested
[mpiexec@T0147953] Press Ctrl-C again to force abort
[mpiexec@T0147953] HYD_sock_write (..\windows\src\hydra_sock.c:382): write error (errno = 2)
[mpiexec@T0147953] wmain (mpiexec.c:2096): assert (exitcodes != NULL) failed
forrtl: error (200): program aborting due to control-C event
Image PC Routine Line Source
impi.dll 00007FFCB6A691D8 Unknown Unknown Unknown
KERNELBASE.dll 00007FFD32A856FD Unknown Unknown Unknown
KERNEL32.DLL 00007FFD36084034 Unknown Unknown Unknown
ntdll.dll 00007FFD363A3691 Unknown Unknown Unknown
(base) c:\intelpython3\symlink>python python_parent.py
forrtl: severe (157): Program Exception - access violation
Image PC Routine Line Source
impi.dll 00007FFCB6303A43 Unknown Unknown Unknown
impi.dll 00007FFCB62C8981 Unknown Unknown Unknown
impi.dll 00007FFCB6A194ED Unknown Unknown Unknown
fortran_child.exe 00007FF6B03A1531 MAIN__ 27 fortran_child.f90
fortran_child.exe 00007FF6B03A16C2 Unknown Unknown Unknown
fortran_child.exe 00007FF6B03A4184 Unknown Unknown Unknown
fortran_child.exe 00007FF6B03A40AE Unknown Unknown Unknown
fortran_child.exe 00007FF6B03A3F6E Unknown Unknown Unknown
fortran_child.exe 00007FF6B03A41F9 Unknown Unknown Unknown
KERNEL32.DLL 00007FFD36084034 Unknown Unknown Unknown
ntdll.dll 00007FFD363A3691 Unknown Unknown Unknown
Hi Maarten,
Could you please provide us the logs after setting I_MPI_DEBUG=10?
set I_MPI_DEBUG=10
Also, could you share the command you used to create the symbolic link?
Regards
Prasanth
The debug output is given below. I omitted the Fortran stack trace since it contains nothing new, as far as I can see.
I solved the problem with spawning the Fortran processes (solution posted in the thread of the post I referenced), so creating a symbolic link is no longer necessary. It was a matter of passing a "path" key in the Info object to Spawn_multiple(). The updated code of python_shell.py is given below.
cheers,
Maarten
python_shell.py:
from mpi4py import MPI
import numpy as np, os

# Tell the spawner where to find the child executable (fixes the earlier spawn problem).
info = MPI.Info.Create()
info.Set('path', os.getcwd())

# Spawn two copies of the Fortran child and merge into a single intra-communicator.
sub_comm = MPI.COMM_SELF.Spawn_multiple(['fortran_child.exe'] * 2, info=info)
common_comm = sub_comm.Merge(False)

# Distributed graph constructor: the parent (rank 0) contributes no edges.
topocomm = common_comm.Create_dist_graph([0], [0], np.array([], dtype=int), MPI.UNWEIGHTED)

common_comm.Disconnect()
sub_comm.Disconnect()
topocomm.Disconnect()
Debug output:
[0] MPI startup(): libfabric version: 1.7.1a1-impi
[0] MPI startup(): libfabric provider: sockets
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 22224 T0147953 {0,1,2,3,4,5,6,7}
[0] MPI startup(): I_MPI_ROOT=C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.2.254\windows\mpi
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_DEBUG=10
[0] MPI startup(): libfabric version: 1.7.1a1-impi
[0] MPI startup(): libfabric provider: sockets
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 25772 T0147953 {0,1,2,3}
[0] MPI startup(): 1 26700 T0147953 {4,5,6,7}
[0] MPI startup(): I_MPI_ROOT=C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.2.254\windows\mpi
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_DEBUG=10
Somebody incorrectly marked my post as the solution, presumably because I wrote that I had solved the problem from the other post.
@PrasanthD_intel: please note that this problem is *not* yet solved.
thanks,
Maarten
Hi Maarten,
We tried your code; it ran perfectly on Linux but gives the "access violation" error when executed on Windows, just as you reported.
We don't know the exact reason, so we are escalating your query to the Subject Matter Experts.
Thanks
Prasanth
Thanks for that. I hope it gets solved soon.
Best,
Maarten
I've reproduced this internally and I have provided this to our development team for analysis to fix the issue.
The error you are encountering is actually the result of multiple internal issues:
- An incorrect interface is being selected on your system. This can happen for multiple reasons, including VPN use. You can set FI_TCP_IFACE=eth0 to work around this issue (see the sketch after this list).
- An error in the path handling for spawned images. We are working to resolve this; there is currently no workaround.
- An error in MPI_Probe indexing. We are working to resolve this.
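In a Python driver like the one above, one way to apply the FI_TCP_IFACE workaround is to set the variable before mpi4py initializes MPI; the snippet below is only a sketch, and the interface name ("eth0" in the suggested workaround) depends on the system. Setting it in the shell (set FI_TCP_IFACE=eth0) before launching the script works just as well.
# Sketch: pin the libfabric TCP provider to a specific interface.
# The variable must be set before MPI is initialized, i.e. before mpi4py is imported.
import os
os.environ["FI_TCP_IFACE"] = "eth0"   # interface name is system-dependent

from mpi4py import MPI                # MPI_Init runs here by default
print(MPI.Get_library_version())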
I have received information from our development team that the internal issues are fixed for the next release, Intel® MPI Library 2021.2. Please watch for this release as part of the next update to Intel® oneAPI HPC Toolkit.
This issue has been resolved and we will no longer monitor this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
