Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Various issues with IMPI 2021.3

ThermoAnalytics
Beginner
2,599 Views

We are trying to upgrade our product from Intel MPI version 2018.1 to 2021.3 and have run into a few issues.

1. On Windows, child processes are now being spawned by `hydra_pmi_proxy.exe` instead of `mpiexec.exe` which I understand from the documentation is intended. However, `hydra_pmi_proxy.exe` does not exit when all of its children exit; it sticks around and a subsequent run of the application results in a second one, and so on. If we skip MPI_Finalize(), we get a warning message from the library, but `hydra_pmi_proxy.exe` *does* quit as expected.

2. On Linux, it seems our application crashes in `MPI_Init` on machines with less than 2 GB available in /dev/shm. Is this the expected behavior? Is there a recommended way to avoid this?

3. Also on Linux, on a machine from roughly 2010 that works with the 2018 version of IMPI, we now get a crash with the message "Illegal instruction". Are there new hardware requirements for the 2021 version of IMPI, or is there some way we can handle this condition instead of crashing?

8 Replies
ShivaniK_Intel
Moderator
2,563 Views

Hi,


Thanks for reaching out to us.


>>>On Windows, child processes are now being spawned by `hydra_pmi_proxy.exe` instead of `mpiexec.exe` which I understand from the documentation is intended. However, `hydra_pmi_proxy.exe` does not exit when all of its children exit; it sticks around and a subsequent run of the application results in a second one, and so on. If we skip MPI_Finalize(), we get a warning message from the library, but `hydra_pmi_proxy.exe` *does* quit as expected.


Thanks for posting. This is a known issue. Please refer to the thread below, which addresses a similar problem. If you still face any issues, please let us know.


https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/InteloneAPI-MPI-2021-2-0-behavior-on-Linux-and-Windows-differ/m-p/1294825#M8517


>>>On Linux, it seems our application crashes in `MPI_Init` on machines with less than 2 GB available in /dev/shm. Is this the expected behavior? Is there a recommended way to avoid this?


Regarding this issue, we are working on it and will get back to you soon.


>>> Also on Linux, on a machine from roughly 2010 that works with the 2018 version of IMPI, we now get a crash with the message "Illegal instruction". Are there new hardware requirements for the 2021 version of IMPI, or is there some way we can handle this condition instead of crashing?


1. Could you please provide the system environment details?


2. Could you also provide the complete error log and a sample reproducer?


Meanwhile, please refer to the links below for the hardware requirements of the Base Toolkit and the HPC Toolkit, version 2021.3.


Base Toolkit: https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-base-toolkit-system-requirements.html


HPC Toolkit: https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-hpc-toolkit-system-requirements.html


Thanks & Regards

Shivani


ThermoAnalytics
Beginner
2,551 Views

I'm confused about the reply in the thread you linked.  Are you saying that it is intended both that hydra_pmi_proxy instances stick around forever if we call MPI_Finalize(), and that they exit cleanly when we don't call MPI_Finalize()?

ShivaniK_Intel
Moderator
2,515 Views

Hi,


>>>On Windows, child processes are now being spawned by `hydra_pmi_proxy.exe` instead of `mpiexec.exe` which I understand from the documentation is intended. However, `hydra_pmi_proxy.exe` does not exit when all of its children exit; it sticks around and a subsequent run of the application results in a second one, and so on. If we skip MPI_Finalize(), we get a warning message from the library, but `hydra_pmi_proxy.exe` *does* quit as expected.


Skipping MPI_Finalize() is not a recommended way to avoid this issue. This is a known issue; our team is working on it, and it is likely to be fixed in a future release.


>>>On Linux, it seems our application crashes in `MPI_Init` on machines with less than 2 GB available in /dev/shm. Is this the expected behavior? Is there a recommended way to avoid this?



Regarding the /dev/shm limitation for MPI, please refer to the documentation below. Although the page is titled for Docker, the information applies generally.


https://software.intel.com/content/www/us/en/develop/documentation/mpi-developer-guide-linux/top/troubleshooting/problem-mpi-limitation-for-docker.html


The limit may also depend on how much shared memory the application itself uses. You can find out the shared memory usage of your application by setting I_MPI_DEBUG=10 when invoking mpirun:


I_MPI_DEBUG=10 mpirun -n <number of processes> ./a.out
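To handle the low-/dev/shm case before `MPI_Init` is reached, a launcher script could check the free space first. The following is a minimal sketch, not an Intel-provided tool. The helper names are hypothetical, and the 2 GiB threshold is an assumption taken from the behavior reported in this thread, not a documented requirement.

```python
import shutil

# Assumption: 2 GiB mirrors the threshold observed in this thread; it is not
# an official Intel MPI requirement and may differ per application.
ASSUMED_MIN_SHM_BYTES = 2 * 1024**3


def shm_free_bytes(path="/dev/shm"):
    """Return the free space, in bytes, of the filesystem holding `path`."""
    return shutil.disk_usage(path).free


def has_enough_shm(min_bytes=ASSUMED_MIN_SHM_BYTES, path="/dev/shm"):
    """Return True if at least `min_bytes` are free on the filesystem holding `path`."""
    return shm_free_bytes(path) >= min_bytes


if __name__ == "__main__":
    free = shm_free_bytes()
    print(f"/dev/shm free: {free / 1024**3:.2f} GiB")
    if not has_enough_shm():
        print("Warning: less free shared memory than the assumed threshold.")
```

Running this before `mpirun` lets the application report a clear message instead of crashing; the threshold should be adjusted to whatever the I_MPI_DEBUG=10 output suggests for the actual workload.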


>>> Also on Linux, on a machine from roughly 2010 that works with the 2018 version of IMPI, we now get a crash with the message "Illegal instruction". Are there new hardware requirements for the 2021 version of IMPI, or is there some way we can handle this condition instead of crashing?


1. Could you please provide the system environment details?


2. Could you also provide the complete error log and a sample reproducer?
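For the "Illegal instruction" report, the system details requested above could be gathered with a short script like the one below. This is only a sketch: `parse_cpu_flags` is a hypothetical helper, and the list of instruction-set flags it prints is an illustrative assumption. This thread does not state which instruction sets Intel MPI 2021.3 actually requires.

```python
import platform


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text.

    Only the first "flags" line is read; an empty set is returned if none exists.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    print("Kernel: ", platform.release())
    print("Machine:", platform.machine())
    try:
        with open("/proc/cpuinfo") as f:
            flags = parse_cpu_flags(f.read())
    except OSError:
        flags = set()  # /proc/cpuinfo is unavailable on non-Linux systems
    # Illustrative selection only; not a list of Intel MPI requirements.
    for name in ("sse4_2", "avx", "avx2", "avx512f"):
        print(f"{name}: {'yes' if name in flags else 'no'}")
```

Attaching this output together with the full error log gives the support team the CPU generation and available instruction sets of the failing machine.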


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator
2,469 Views

Hi,


We haven't heard back from you. Is your issue resolved? If not, please provide the details requested in my previous post.


Thanks & Regards

Shivani


ThermoAnalytics
Beginner
2,455 Views

We will skip this release for now, given the various issues.  Would it be possible for you to let us know when the hydra_pmi_proxy.exe issue is resolved?

ShivaniK_Intel
Moderator
2,423 Views

Hi,


Our engineering team is working on the fix. However, we don't have visibility into which version will contain the fix or when it will be released.


We will keep this thread open and update you once the issue is fixed.


Thanks & Regards

Shivani  


Frank_Illenseer
Beginner
2,010 Views

Hi Shivani,

Is there any news on this issue or a possible fix?

Thanks and best regards,

Frank

SantoshY_Intel
Moderator
1,980 Views

Hi,


Thanks for your feedback. We have passed it on to the relevant team. At this moment, we have no visibility into when the fix will be implemented and available. Please let me know if we can go ahead and close this case.



Thanks & Regards,

Santosh


