Hello,
We develop a scientific application in Python and C++ that relies heavily on MPI communication.
The current version works fine with IMPI 2018, but after upgrading to oneAPI's IMPI 2021.03 we ran into issues on Windows.
We use a C++ extension module for Python that exposes an initialization function, which effectively calls MPI_Init(). This initialization function also uses std::atexit to register a finalization function that runs at program exit and effectively calls MPI_Finalize().
Using IMPI 2018 this works correctly, but switching to IMPI 2021.03 we end up with a BAD TERMINATION exit status.
I attached the code of a minimal example that exhibits this behavior. The example can be used to build a very simple CPython extension module, which implements the module functions "initialize" and "testmpi". The "initialize" function calls MPI_Init and registers an MPI_Finalize call at program exit using std::atexit. The "testmpi" function requires MPI to be initialized and simply prints MPI ranks in a Hello-World fashion. The README.md details the steps to reproduce the included logs (e.g. test-impi2018.log).
I'd be glad for any hints on how to mitigate this, and please let me know if anything needs clarification (it's my first post after all :).
Thanks,
Benjamin
Hi,
Thanks for reaching out to us.
We are able to reproduce your issue at our end. We are working on your issue and we will get back to you soon.
Thanks & Regards,
Santosh
I have replicated the issue as well, but I am having difficulty setting up one of our analysis tools to help identify the root cause. How would I modify your example to correctly insert a library before impi.lib? I am trying to add VTmc.lib from Intel® Trace Analyzer and Collector, and when I recompile/relink with it, I get the following error at runtime:
import impiatexit
ImportError: DLL load failed while importing impiatexit: The specified module could not be found.
If you want to link other libraries with the extension module, you can specify them in the setup-win-impi20201-atexit.py distutils script; I assume you did that already?
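As a rough sketch of what that could look like, assuming setuptools is installed (its Extension accepts the same arguments as the distutils one): the source file name and library directories below are hypothetical placeholders, not taken from the attached example. As far as I know, the entries in `libraries` are handed to the linker in list order, so VTmc is listed ahead of impi.

```python
from setuptools import Extension

# Hypothetical placeholder paths: adjust to the local oneAPI installation.
ITAC_LIB_DIR = r"C:\path\to\itac\lib"
IMPI_LIB_DIR = r"C:\path\to\mpi\lib"

# The linker receives the libraries in list order, so the Trace Collector
# library (VTmc.lib) comes before the Intel MPI import library (impi.lib).
impiatexit_ext = Extension(
    "impiatexit",
    sources=["impiatexit.cpp"],  # hypothetical source file name
    library_dirs=[ITAC_LIB_DIR, IMPI_LIB_DIR],
    libraries=["VTmc", "impi"],
)
```

The extension object would then be passed as `setup(ext_modules=[impiatexit_ext])`. Note that linking only fixes the build; the DLLs behind VTmc.lib still have to be locatable when the module is imported.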
Regarding the runtime DLL load failure, it may be this: if you are using Python 3.8 or later, you need to specify where Python is allowed to load DLLs from using os.add_dll_directory(), or place the DLLs next to the extension module. Since Python 3.8, %PATH% is no longer searched for an extension module's DLL dependencies.
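A minimal sketch of the os.add_dll_directory() approach (available since Python 3.8, Windows only). The helper name `add_dll_dirs` and the directory paths in the usage comment are hypothetical; the real directories depend on where oneAPI (MPI, libfabric, ITAC) is installed. The helper must run before the extension module is imported.

```python
import os
import sys


def add_dll_dirs(dirs):
    """Register directories holding DLLs that an extension module depends on.

    On Windows with Python 3.8+, %PATH% is not searched for an extension
    module's DLL dependencies, so each directory is added explicitly.
    Missing directories are skipped; on other platforms this is a no-op.

    Returns the handles from os.add_dll_directory(); calling close() on a
    handle removes that directory from the search path again.
    """
    handles = []
    if sys.platform != "win32" or not hasattr(os, "add_dll_directory"):
        return handles
    for d in dirs:
        if os.path.isdir(d):
            # AddDllDirectory requires a fully qualified path.
            handles.append(os.add_dll_directory(os.path.abspath(d)))
    return handles


# Hypothetical usage (placeholder paths), before the import:
# handles = add_dll_dirs([r"C:\path\to\mpi\bin", r"C:\path\to\itac\bin"])
# import impiatexit
```

Keeping the returned handles around (rather than closing them) leaves the directories registered for the rest of the process.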
I hope that helps; ask away if I can supply more info.
I apologize for the delayed response. I have escalated this to our development team for investigation and resolution.
Our development team has investigated and identified the issue. Your code calls MPI_Finalize from a function registered with atexit, so that exit handler depends on the libfabric DLL. During the exit process, the libfabric DLL is unloaded before this handler runs, which leads to the error in this case. Per Microsoft's documentation for atexit, a function registered with atexit should not depend on any DLL that may already have been unloaded (see https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/atexit?view=msvc-160). As such, this is expected behavior, and you will need to update your code to call MPI_Finalize before the atexit handlers run in order to avoid this error.
I am closing the associated Intel support case with this. Any further replies on this thread will be considered community-only.
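On the Python side, calling MPI_Finalize before the atexit handlers run could look like the sketch below. It assumes a hypothetical `finalize` function added to the extension module that simply calls MPI_Finalize (the example module only exposes `initialize` and `testmpi`), and that the std::atexit registration is removed from `initialize`. Otherwise MPI_Finalize would be called a second time at C runtime exit, which is exactly the call that fails. The context manager `mpi_session` is likewise a hypothetical helper.

```python
import contextlib


@contextlib.contextmanager
def mpi_session(mod):
    """Call mod.initialize() on entry and mod.finalize() on exit.

    `mod.finalize` is hypothetical: a wrapper around MPI_Finalize that
    would need to be added to the C++ extension module.
    """
    mod.initialize()
    try:
        yield mod
    finally:
        # Runs during normal interpreter execution, i.e. before the C
        # runtime's atexit processing, even if the MPI work raised.
        mod.finalize()


# Hypothetical usage:
# import impiatexit
# with mpi_session(impiatexit) as m:
#     m.testmpi()
```

Since MPI cannot be initialized again after MPI_Finalize, the session should span the whole MPI part of the program. Registering `mod.finalize` with Python's own `atexit` module may also work, because Python's exit callbacks run during interpreter shutdown, before the C runtime exits; this has not been verified with IMPI 2021.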
