Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.
Announcements
This community is designed for sharing of public information. Please do not share Intel or third-party confidential information here.

Various issues with IMPI 2021.3

ThermoAnalytics
Beginner
698 Views

We are trying to upgrade our product from version 2018.1 to 2021.3 and have run into a couple of issues.

1. On Windows, child processes are now being spawned by `hydra_pmi_proxy.exe` instead of `mpiexec.exe` which I understand from the documentation is intended. However, `hydra_pmi_proxy.exe` does not exit when all of its children exit; it sticks around and a subsequent run of the application results in a second one, and so on. If we skip MPI_Finalize(), we get a warning message from the library, but `hydra_pmi_proxy.exe` *does* quit as expected.

2. On Linux, it seems our application crashes in `MPI_Init` on machines with less than 2 GB available in /dev/shm. Is this the expected behavior? Is there a recommended way to avoid this?

3. Also on Linux, on a machine from roughly 2010 that works with the 2018 version of IMPI, we now get a crash with the message "Illegal instruction". Are there new hardware requirements for the 2021 version of IMPI, or is there some way we can handle this condition instead of crashing?

8 Replies
ShivaniK_Intel
Moderator
662 Views

Hi,


Thanks for reaching out to us.


>>>On Windows, child processes are now being spawned by `hydra_pmi_proxy.exe` instead of `mpiexec.exe` which I understand from the documentation is intended. However, `hydra_pmi_proxy.exe` does not exit when all of its children exit; it sticks around and a subsequent run of the application results in a second one, and so on. If we skip MPI_Finalize(), we get a warning message from the library, but `hydra_pmi_proxy.exe` *does* quit as expected.


Thanks for posting. This is a known issue; please refer to the thread below, which addresses a similar problem. If you still face any issues, please let us know.


https://community.intel.com/t5/Intel-oneAPI-HPC-Toolkit/InteloneAPI-MPI-2021-2-0-behavior-on-Linux-a...


>>>On Linux, it seems our application crashes in `MPI_Init` on machines with less than 2 GB available in /dev/shm. Is this the expected behavior? Is there a recommended way to avoid this?


Regarding this issue, we are working on it and will get back to you soon.


>>> Also on Linux, on a machine from roughly 2010 that works with the 2018 version of IMPI, we now get a crash with the message "Illegal instruction". Are there new hardware requirements for the 2021 version of IMPI, or is there some way we can handle this condition instead of crashing?


1. Could you please provide us with the system environment details?


2. Could you also provide us with the complete error log and a sample reproducer code?


Meanwhile, please refer to the links below for the hardware requirements of the 2021.3 versions of the Base and HPC toolkits.


Base toolkit: https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-base-toolkit-system-requi...


HPC toolkit: https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-hpc-toolkit-system-requir...
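An "Illegal instruction" crash on older hardware typically means the library uses instruction-set extensions the CPU does not support. While gathering the system details requested above, a quick check of the CPU's reported flags may help (a minimal sketch for Linux; the flags listed here are common examples, not the official requirements list):

```shell
# List which of a few newer instruction-set extensions the CPU reports.
# A machine from ~2010 may well lack avx/avx2/fma.
grep -o -E 'sse4_2|avx2|avx|fma' /proc/cpuinfo | sort -u
```

Comparing this output against the system-requirements pages linked above should show whether the machine meets the 2021.3 hardware baseline.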


Thanks & Regards

Shivani


ThermoAnalytics
Beginner
650 Views

I'm confused about the reply in the thread you linked. Are you saying that it is intended both that hydra_pmi_proxy instances stick around forever if we call MPI_Finalize(), and that they exit cleanly when we don't call MPI_Finalize()?

ShivaniK_Intel
Moderator
614 Views

Hi,


>>>On Windows, child processes are now being spawned by `hydra_pmi_proxy.exe` instead of `mpiexec.exe` which I understand from the documentation is intended. However, `hydra_pmi_proxy.exe` does not exit when all of its children exit; it sticks around and a subsequent run of the application results in a second one, and so on. If we skip MPI_Finalize(), we get a warning message from the library, but `hydra_pmi_proxy.exe` *does* quit as expected.


Skipping MPI_Finalize() is not the recommended way to avoid this issue. This is a known issue; our team is working on it, and a fix is likely in a future release.


>>>On Linux, it seems our application crashes in `MPI_Init` on machines with less than 2 GB available in /dev/shm. Is this the expected behavior? Is there a recommended way to avoid this?



Regarding the /dev/shm limitation for MPI, please refer to the documentation below. Although it appears in the Docker context, the information is general.


https://software.intel.com/content/www/us/en/develop/documentation/mpi-developer-guide-linux/top/tro...
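Before launching, it may also help to confirm how much space is actually free in /dev/shm on the target machine (a minimal sketch for Linux; the 2 GB threshold comes from the behavior described in the original report, not from the documentation):

```shell
# Report free space in /dev/shm in megabytes and warn below 2 GB.
avail_mb=$(df --output=avail -BM /dev/shm | tail -1 | tr -dc '0-9')
echo "Available in /dev/shm: ${avail_mb} MB"
if [ "${avail_mb:-0}" -lt 2048 ]; then
    echo "Warning: less than 2 GB free in /dev/shm" >&2
fi
```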


The limitation may also depend on the shared memory usage of the application. You can determine the shared memory usage of your application by setting I_MPI_DEBUG=10 when running the mpirun command:


I_MPI_DEBUG=10 mpirun -n <no. of processes> ./a.out


>>> Also on Linux, on a machine from roughly 2010 that works with the 2018 version of IMPI, we now get a crash with the message "Illegal instruction". Are there new hardware requirements for the 2021 version of IMPI, or is there some way we can handle this condition instead of crashing?


1. Could you please provide us with the system environment details?


2. Could you also provide us with the complete error log and a sample reproducer code?


Thanks & Regards

Shivani


ShivaniK_Intel
Moderator
568 Views

Hi,


As we didn't hear back from you, is your issue resolved? If not, please provide the details requested in my previous post.


Thanks & Regards

Shivani


ThermoAnalytics
Beginner
554 Views

We will skip this release for now, given the various issues. Would it be possible for you to let us know when the hydra_pmi_proxy.exe issue is resolved?

ShivaniK_Intel
Moderator
522 Views

Hi,


Our engineering team is working on a fix. However, we cannot comment on the release version or timeline at this point.


We can keep this thread open, and we will update you once the issue is fixed.


Thanks & Regards

Shivani  


Frank_Illenseer
Beginner
109 Views

Hi Shivani,

 

is there any news on this issue and a possible fix?

 

Thanks and best regards,

Frank

SantoshY_Intel
Moderator
79 Views

Hi,


Thanks for your feedback. We have passed it along to the relevant team. At this moment there is no visibility into when a fix will be implemented and available. Please let me know if we can go ahead and close this case.



Thanks & Regards,

Santosh


