Intel® oneAPI HPC Toolkit

hydra_bstrap_pmi crash

msafdari
Novice

I have two programs, A and B, both MPI-enabled. A is started with mpiexec.exe, and B is launched by A through a system call that simply runs "mpiexec.exe B args ...". Once the system call is issued, hydra_bstrap_proxy (started by mpiexec) launches and quickly crashes. This only happens on Windows, and only if A itself is started with mpiexec. Unfortunately, the bootstrap code does not print any debug messages, even with the relevant environment variables set. Is there any way to get some error information so that I can debug the issue? I am using Intel MPI Library 2021.3.
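A minimal C sketch of the nested launch described above may help anyone trying to build a reproducer. It is hypothetical: it is neither the poster's program A nor Intel's a.c, and the helper names `build_nested_command` and `launch_nested_job`, the `-n` process count, the executable name and the argument string are all assumptions. Program A would call `launch_nested_job` from rank 0 only, after `MPI_Init` and `MPI_Comm_rank`, so that the other ranks do not each start their own copy of B.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes "mpiexec.exe -n <nprocs> <exe> <args>"
 * into buf. The -n option and all argument values are assumptions.
 * Returns 0 on success, -1 on bad input or if the command was truncated. */
int build_nested_command(char *buf, size_t size, const char *exe,
                         int nprocs, const char *args)
{
    if (exe == NULL || exe[0] == '\0' || nprocs < 1)
        return -1;
    int n = snprintf(buf, size, "mpiexec.exe -n %d %s %s",
                     nprocs, exe, args ? args : "");
    if (n < 0 || (size_t)n >= size)
        return -1; /* encoding error, or buf too small for the full command */
    return 0;
}

/* Hypothetical helper: starts the nested MPI job through system(), the
 * pattern described in this thread. In program A it would be called from
 * rank 0 only, for example:
 *
 *     int rank;
 *     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 *     if (rank == 0)
 *         status = launch_nested_job("B.exe", 2, "input.dat");
 *
 * system() blocks until the nested mpiexec.exe returns. Its return value is
 * implementation-defined; with the Microsoft C runtime it is the value
 * returned by the command interpreter. */
int launch_nested_job(const char *exe, int nprocs, const char *args)
{
    char cmd[1024];
    if (build_nested_command(cmd, sizeof cmd, exe, nprocs, args) != 0) {
        fprintf(stderr, "could not build nested launch command\n");
        return -1;
    }
    fflush(NULL); /* flush A's buffered output before the child writes */
    return system(cmd);
}
```

Because system() blocks, rank 0 of A does not continue until the nested mpiexec.exe has exited, whether B's job finished normally or the launch failed as described in this thread.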

SantoshY_Intel
Moderator

Hi,

Thanks for reaching out to us.

Could you please provide us with a sample reproducer for programs A and B so that we can investigate the issue on our end?

>>"Is there any way to output some error information so that I can debug the issue"

You can use the following command to get debug information:

mpiexec -v -genv I_MPI_DEBUG=5 -genv FI_LOG_LEVEL=debug sample.exe

Thanks & Regards,

Santosh

msafdari
Novice

Thanks, Santosh, for the response.

I need to see whether I can reproduce this with minimal code, but it will be hard. Regarding the environment variables you suggested: mpiexec, hydra_bstrap_proxy (the crashing binary), and hydra_pmi_proxy still generate no messages or output. Is there any other environment variable I should set? These are the ones currently set (all with -genv):

I_MPI_DEBUG=10

I_MPI_DEBUG_OUTPUT=stdout

I_MPI_HYDRA_DEBUG=on

In addition, I am passing "-v -genv FI_LOG_LEVEL=debug" to mpiexec.exe at launch.

SantoshY_Intel
Moderator

Hi,

From your previous response, we can see that you have used all the available options for getting debug information when launching an MPI job with the mpiexec launcher.

There are no other options or environment variables for getting debug/error information.

It would be helpful if you could provide a sample reproducer so that we can investigate further on our end.

Thanks & regards,

Santosh

msafdari
Novice

Thanks, Santosh, for confirming the environment variables.

I was wondering whether a multi-level MPI launch of this type (i.e., the master rank of program A starts program B through a system call using "mpiexec.exe args ...") is guaranteed to work with Intel MPI 2021 on Windows, or whether it is considered non-standard.

Thanks

SantoshY_Intel
Moderator

Hi,

We tried to reproduce your scenario, in which the master rank of program A starts program B through a system call using "mpiexec.exe args", with the attached sample codes (a.c and b.c).

Please let us know if our sample code does not reflect your use case.

Below are the steps we followed using the latest Intel MPI Library 2021.5 on a Windows machine:

  1. mpiicc a.c 
  2. mpiicc b.c
  3. mpiexec -n 4 a.exe

We encountered an error, as shown in the screenshot below:

[Screenshot: err_syscall.png]

A similar issue has already been reported to the relevant development team, and it will be fixed in a future Intel oneAPI release.

However, the same scenario works fine on a Linux machine, as shown in the screenshot below:

[Screenshot: worked_syscall.png]

>>"is guaranteed to work with Intel MPI 2021 on Windows or it is considered non-standard?"

Once the issue is fixed in a future release, you will be able to use the same scenario on Windows as well.

Thanks & Regards,

Santosh

SantoshY_Intel
Moderator

Hi,


We haven't heard back from you. Could you please provide an update on your issue?


Thanks & Regards,

Santosh


SantoshY_Intel
Moderator

Hi,

Thank you for your patience. The issue you raised has been fixed in Intel MPI Library 2021.6 (Intel oneAPI HPC Toolkit 2022.2). We tried the latest Intel MPI 2021.6, and it worked fine, as shown in the screenshot below.

[Screenshot: SantoshY_Intel_0-1655093848700.png]

Please let us know if this resolves your issue.

Thanks & Regards,

Santosh

SantoshY_Intel
Moderator

Hi,


We assume that your issue is resolved. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.


Thanks & Regards,

Santosh

