Intel® HPC Toolkit

IMPI 2021 vs 2018: BAD TERMINATION when using std::atexit in Python extension module

BenjaminAurich
Beginner

Hello,

We develop a scientific application using Python and C++ and rely heavily on MPI communication.
The current version works fine with IMPI 2018, but after upgrading to oneAPI's IMPI 2021.03 we ran into issues on Windows.

We use a C++ extension module for Python that exposes an initialization function, which in effect calls MPI_Init(). This initialization function also registers a finalization function with std::atexit, so that MPI_Finalize() is in effect called at program exit.

Using IMPI 2018 this works correctly, but switching to IMPI 2021.03 we end up with a BAD TERMINATION exit status.

I attached the code of a minimal example that exhibits this behavior. The example can be used to build a very simple CPython extension module that implements the module functions "initialize" and "testmpi". The "initialize" function calls MPI_Init and registers an MPI_Finalize call at program exit using std::atexit. The "testmpi" function needs MPI to be initialized and simply prints some MPI ranks in a Hello-World fashion. The README.md details the steps to reproduce the included logs (e.g. test-impi2018.log).

I'd be glad for any hint on how to mitigate this, and please let me know if anything needs clarification (it's my first post, after all :).

Thanks,
Benjamin

1 Solution
James_T_Intel
Moderator

Our development team has investigated and identified the issue. You have registered MPI_Finalize to run from an atexit handler. This creates a dependency on the libfabric DLL inside that handler. During process exit, the libfabric DLL is unloaded before the handler runs, which leads to the error in this case. Per Microsoft's documentation for atexit, the code in an atexit handler should not depend on any DLL (see https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/atexit?view=msvc-160). As such, this is an expected scenario, and you will need to update your code to call MPI_Finalize explicitly before the process exits, rather than from an atexit handler, in order to avoid this error.
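
One way to follow this advice from the Python side is sketched below, under some assumptions. The extension module is assumed to export a `finalize` function that wraps MPI_Finalize() (the attached example only describes `initialize` and `testmpi`), and `mpi_session` and `FakeMpiModule` are hypothetical names used for illustration. The context manager calls `finalize` while the interpreter is still running, so it does not rely on std::atexit at all.

```python
from contextlib import contextmanager


@contextmanager
def mpi_session(mpi_module):
    """Initialize MPI on entry and finalize it explicitly on exit.

    `mpi_module` is any object with `initialize()` and `finalize()`
    callables. `finalize` is an assumption: the extension module would
    need to export it as a wrapper around MPI_Finalize().
    """
    mpi_module.initialize()
    try:
        yield mpi_module
    finally:
        # Runs while the interpreter is still alive, instead of from a
        # handler registered with std::atexit.
        mpi_module.finalize()


class FakeMpiModule:
    """Stand-in for the real extension module so the sketch runs anywhere."""

    def __init__(self):
        self.calls = []

    def initialize(self):
        self.calls.append("initialize")

    def testmpi(self):
        self.calls.append("testmpi")

    def finalize(self):
        self.calls.append("finalize")


if __name__ == "__main__":
    fake = FakeMpiModule()
    with mpi_session(fake) as mpi:
        mpi.testmpi()
    print(fake.calls)
```

With the real module, the usage would look like `with mpi_session(impiatexit): impiatexit.testmpi()`. Registering `finalize` with Python's own `atexit` module may be another option, since those handlers run during interpreter shutdown, but whether that is early enough for IMPI 2021 is not confirmed in this thread.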


I am closing the associated Intel support case with this. Any further replies on this thread will be considered community-only.


5 Replies
SantoshY_Intel
Moderator

Hi,

Thanks for reaching out to us.

We are able to reproduce your issue on our end. We are working on it and will get back to you soon.

Thanks & Regards,
Santosh


James_T_Intel
Moderator

I have replicated the issue as well, but I am having difficulty setting up one of our analysis tools to help identify the root cause. How would I modify your example to correctly link a library ahead of impi.lib? I am trying to add VTmc.lib from Intel® Trace Analyzer and Collector, and when I recompile/relink with it, I get the following error at runtime:


  import impiatexit

ImportError: DLL load failed while importing impiatexit: The specified module could not be found.


BenjaminAurich
Beginner

If you want to link other libraries with the extension module, you can specify them in the setup-win-impi20201-atexit.py distutils script; I assume you did that already?
Regarding the runtime DLL load failure, it may be this: if you are using Python >= 3.8, you need to tell Python where it is allowed to load DLLs from using os.add_dll_directory(), or place the DLLs next to the extension module. Python >= 3.8 no longer searches %PATH% for the DLLs an extension module depends on.
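
A minimal sketch of the os.add_dll_directory() approach follows. The helper name `register_dll_dirs` and the example paths are hypothetical; substitute the directories that actually hold the Intel MPI and Trace Analyzer DLLs on your system. The function is a no-op where os.add_dll_directory is unavailable (it is Windows-only and was added in Python 3.8).

```python
import os


def register_dll_dirs(directories):
    """Add existing directories to the DLL search path on Windows.

    Returns the list of directories that were actually registered.
    Where os.add_dll_directory does not exist (non-Windows platforms,
    or Python older than 3.8), nothing is registered.
    """
    registered = []
    add_dir = getattr(os, "add_dll_directory", None)
    if add_dir is None:
        return registered
    for directory in directories:
        path = os.path.abspath(directory)
        if os.path.isdir(path):
            # add_dll_directory requires an absolute path to an
            # existing directory.
            add_dir(path)
            registered.append(path)
    return registered


if __name__ == "__main__":
    # Hypothetical locations; replace with the real install paths.
    candidates = [r"C:\path\to\impi\bin", r"C:\path\to\itac\dll"]
    print(register_dll_dirs(candidates))
    # Import the extension module only after the directories are
    # registered, e.g.:
    # import impiatexit
```

The call has to happen before the `import` of the extension module, because the dependent DLLs are resolved at import time.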

I hope that helps, ask away if I can supply more info.

James_T_Intel
Moderator

I apologize for the delayed response. I have escalated this to our development team for investigation and resolution.

