Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI applications which embed Python do not work on AMD systems when using Intel MPI >=2019 runtime

ESI_Group
Partner
582 Views

In short, calling Py_Initialize() within an MPI program while using the Intel MPI 2019 or later runtime on an AMD machine results in the error "Python error: <stdin> is a directory, cannot continue". This does not occur when using Intel MPI 2018.3 on AMD, or when using an Intel machine with IMPI 2019.9. It does not matter where the executables were built (AMD or Intel); the behavior at runtime is the same.

The attached proof of concept contains a C source file with a simple MPI initialization, hello world, Python initialization, and printing of the system time from Python. It also contains a sequential version.

"bash build_poc.gsm" will build a sequential and MPI version of the POC.

"bash run_poc.gsm" will run the sequential version, the MPI version with the 2018.3 runtime, and the MPI version again with the 2019.9 runtime.

To run these scripts, please update your environment to point to your local compiler, MPI runtime, and Python installation.  You may also need to update the compilation/linking flags based on the output of your local "python3-config --cflags" and "python3-config --ldflags".
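For reference, a minimal version of the POC described above might look roughly like this. This is a sketch rather than the attached code: the exact Python snippet used to print the time is our assumption, and error handling is omitted.

```c
/* poc_mpi.c -- minimal MPI + embedded Python proof of concept (sketch).
   Build flags will vary; take them from your local mpicc and
   "python3-config --cflags" / "python3-config --ldflags". */
#include <mpi.h>
#include <Python.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &name_len);
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           name, rank, size);

    /* This is the call that fails on AMD with the IMPI >= 2019 runtime. */
    Py_Initialize();
    PyRun_SimpleString("import time\n"
                       "print('Today is ' + time.ctime())\n");
    Py_Finalize();

    MPI_Finalize();
    return 0;
}
```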

We used an EPYC machine with Linux kernel version 2.6.32-754.el6.x86_64 and a Skylake-X machine with Linux kernel version 3.10.0-1062.el7.x86_64.  Our colleagues have been able to reproduce this issue on AMD machines with EL7, other Linux distributions entirely, and newer EPYC chips than our test machine, so this does not seem specific to one version of Linux, or a specific generation of hardware.

Here is the output from the Intel machine:

[gsm@bruser018 POC]$ bash run_poc.gsm
Running sequential
Today is Tue May 11 23:03:18 2021
Running MPI 2018.3
Hello world from processor bruser018.esi-internal.esi-group.com, rank 0 out of 2 processors
Hello world from processor bruser018.esi-internal.esi-group.com, rank 1 out of 2 processors
Today is Tue May 11 23:03:19 2021
Today is Tue May 11 23:03:19 2021
Running MPI 2019.9
Hello world from processor bruser018.esi-internal.esi-group.com, rank 0 out of 2 processors
Hello world from processor bruser018.esi-internal.esi-group.com, rank 1 out of 2 processors
Today is Tue May 11 23:03:20 2021
Today is Tue May 11 23:03:20 2021

Here is the output from the AMD machine:

[gsm@bruser033 POC]$ bash run_poc.gsm
Running sequential
Today is Tue May 11 23:03:06 2021
Running MPI 2018.3
Hello world from processor bruser033.esi-internal.esi-group.com, rank 0 out of 2 processors
Hello world from processor bruser033.esi-internal.esi-group.com, rank 1 out of 2 processors
Today is Tue May 11 23:03:07 2021
Today is Tue May 11 23:03:07 2021
Running MPI 2019.9
Hello world from processor bruser033.esi-internal.esi-group.com, rank 1 out of 2 processors
Hello world from processor bruser033.esi-internal.esi-group.com, rank 0 out of 2 processors
Python error: <stdin> is a directory, cannot continue

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 63349 RUNNING AT bruser033.esi-internal.esi-group.com
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

As you can see, all the tests succeed on Intel. The sequential and IMPI 2018.3 tests succeed on AMD, but the 2019.9 test fails.

We would greatly appreciate knowledge of any potential workarounds for this issue.  Thanks!

6 Replies
ShivaniK_Intel
Moderator
557 Views

Hi,


Thanks for providing the source code. We have tested the code on the Intel platform and it worked fine. We are working on the issue and will get back to you soon.


Thanks & Regards

Shivani


ESI_Group
Partner
502 Views

We were able to find a workaround: redirecting stdin to all procs by passing "-s all" to mpirun.
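For anyone hitting the same error, the invocation looks like this (the binary name and rank count are ours; adjust to your setup):

```shell
# Redirect stdin to all ranks instead of only rank 0 (the default),
# which avoids the "<stdin> is a directory" failure with IMPI >= 2019.
mpirun -s all -n 2 ./poc_mpi
```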

Mark_L_Intel
Employee
484 Views

Thank you for your detailed report. It is great that you found a workaround. The engineering team is working on how to address this issue in a more systematic way.


Mark_L_Intel
Employee
468 Views

Our engineering team confirmed that this is a known issue. Unfortunately, the fix is missing in 2021.2 and all 2019 versions, but 2021.3 should include the fix.


The current workaround is still ‘-s all’. The issue is not related to non-Intel CPUs but to gcc; it is assumed that you used icc on IA.


Our engineering team saw the same behavior on IA with gfortran. The issue is that mpiexec closes fd 0 by default for all ranks but rank 0, and the gcc runtime does not always handle this correctly.


‘-s all’ leaves fd 0 open, even though we do not need it. IMPI 2021.3 will have a fix: it will no longer close fd 0 by default.


Mark_L_Intel
Employee
426 Views

Hello,


Did we answer your question? Is there anything else we can help you with?


Mark_L_Intel
Employee
408 Views

Hello,


This question will no longer be handled by Intel support due to inactivity.

