Intel® MPI Library

hydra_bstrap_proxy crash

msafdari
Novice

I have two programs, A and B, both MPI-enabled. A is started with mpiexec.exe, and B is launched by A through a system call that simply runs "mpiexec.exe B args ...". Once the system call is issued, hydra_bstrap_proxy (started by mpiexec) launches and quickly crashes. This happens only on Windows, and only if A itself is started with mpiexec. Unfortunately, the bootstrap code does not output any debug messages, even with the relevant environment variables set. Is there any way to output some error information so that I can debug the issue? I am using Intel MPI Library 2021.3.

SantoshY_Intel
Moderator

Hi,

Thanks for reaching out to us.

Could you please provide a sample reproducer for programs A and B so that we can investigate the issue on our end?

>>"Is there any way to output some error information so that I can debug the issue"

You can use the following command to get debug information:

mpiexec -v -genv I_MPI_DEBUG=5 -genv FI_LOG_LEVEL=debug sample.exe

Thanks & Regards,

Santosh

msafdari
Novice

Thanks, Santosh, for the response.

I will see whether I can reproduce this with minimal code, but it will be hard. With the environment variables you suggested, mpiexec, hydra_bstrap_proxy (the crashing binary), and hydra_pmi_proxy still produce no messages or output. Is there any other environment variable to set? These are the ones currently set (all with -genv):

I_MPI_DEBUG=10

I_MPI_DEBUG_OUTPUT=stdout

I_MPI_HYDRA_DEBUG=on

In addition, I am passing "-v -genv FI_LOG_LEVEL=debug" to mpiexec.exe at launch.
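
Because program A issues the nested launch itself, one more thing that might be worth trying is to redirect the nested launcher's stdout and stderr to a file, in case its output is simply not reaching the console. The following is a minimal sketch in C, not something confirmed in this thread: build_nested_cmd, the "B.exe arg1" command, and the nested.log file name are hypothetical, while the -v and -genv settings are the ones listed in this post. There is no guarantee that hydra_bstrap_proxy writes anything before it crashes.

```c
#include <stdio.h>

/*
 * Build the command line for the nested launch, with the debug settings
 * listed in this thread, and send stdout and stderr to a log file.
 *
 * exe_and_args and log_path are hypothetical placeholders supplied by
 * the caller. Returns 0 on success, -1 if the buffer is too small or
 * formatting fails.
 */
int build_nested_cmd(char *buf, size_t size,
                     const char *exe_and_args, const char *log_path)
{
    int n = snprintf(buf, size,
                     "mpiexec -v"
                     " -genv I_MPI_DEBUG=10"
                     " -genv I_MPI_DEBUG_OUTPUT=stdout"
                     " -genv I_MPI_HYDRA_DEBUG=on"
                     " -genv FI_LOG_LEVEL=debug"
                     " %s > \"%s\" 2>&1",
                     exe_and_args, log_path);

    /* snprintf reports the length it wanted; treat truncation as an error. */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Program A would fill a buffer with build_nested_cmd(buf, sizeof buf, "B.exe arg1", "nested.log") and pass buf to system(). The "> file 2>&1" redirect is understood by both cmd.exe and POSIX shells.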

SantoshY_Intel
Moderator

Hi,

From your previous response, we can see that you have already used all the available options for getting debug information when launching an MPI job with mpiexec.

There are no other options or environment variables for getting debug or error information.
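
Independently of the launcher's own options, the calling side can at least record the value returned by the system call that starts the nested mpiexec. The sketch below is a generic C approach and not an Intel MPI feature; run_and_report is a hypothetical helper name. On Windows, printing the code in hexadecimal can help, since crash codes such as 0xC0000005 (access violation) are easier to recognize that way. Whether the nested launcher's crash code actually propagates through the command interpreter is an assumption that would need to be verified.

```c
#include <stdio.h>
#include <stdlib.h>
#ifndef _WIN32
#include <sys/wait.h>
#endif

/*
 * Run cmd through the command interpreter and report how it ended.
 * Returns the command's exit code, or -1 if it could not be started
 * or (on POSIX) did not exit normally.
 */
int run_and_report(const char *cmd)
{
    int status = system(cmd);
    if (status == -1) {
        perror("system");
        return -1;
    }
#ifdef _WIN32
    /* On Windows, system() returns the value handed back by the
     * command interpreter, normally the command's exit code. */
    fprintf(stderr, "'%s' exited with code %d (0x%08X)\n",
            cmd, status, (unsigned)status);
    return status;
#else
    /* On POSIX, system() returns a wait status that must be decoded. */
    if (WIFEXITED(status)) {
        fprintf(stderr, "'%s' exited with code %d\n",
                cmd, WEXITSTATUS(status));
        return WEXITSTATUS(status);
    }
    if (WIFSIGNALED(status)) {
        fprintf(stderr, "'%s' was killed by signal %d\n",
                cmd, WTERMSIG(status));
    }
    return -1;
#endif
}
```

In program A, the rank that issues the nested launch would call run_and_report with the same "mpiexec.exe B args ..." string it currently passes to system().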

 

It would be helpful if you could provide a sample reproducer so that we can investigate further on our end.

Thanks & Regards,

Santosh

msafdari
Novice

Thanks, Santosh, for confirming the environment variables.

I was wondering whether multi-level MPI of this type (i.e., the master rank of program A starts program B through a system call using "mpiexec.exe args ...") is guaranteed to work with Intel MPI 2021 on Windows, or whether it is considered non-standard.

Thanks

SantoshY_Intel
Moderator

Hi,

We tried to reproduce your scenario, in which the master rank of program A starts program B through a system call using "mpiexec.exe args", with the attached sample codes (a.c and b.c).

Please let us know if our sample code does not reflect your use case.

Below are the steps that we followed using the latest Intel MPI Library 2021.5 on a Windows machine:

  1. mpiicc a.c
  2. mpiicc b.c
  3. mpiexec -n 4 a.exe

We encountered an error, as shown in the screenshot below:

err_syscall.png

A similar issue has already been reported to the development team, and it will be fixed in a future Intel oneAPI release.

However, the same scenario works fine on a Linux machine, as shown in the screenshot below:

worked_syscall.png

>>" is guaranteed to work with Intel MPI 2021 on Windows or it is considered non-standard? "

Once the issue is fixed in a future release, you will be able to use the same scenario on Windows too.

Thanks & Regards,

Santosh

SantoshY_Intel
Moderator

Hi,

We haven't heard back from you. Could you please provide an update on your issue?

Thanks & Regards,

Santosh

SantoshY_Intel
Moderator

Hi,

Thank you for your patience. The issue you raised has been fixed in Intel MPI Library 2021.6 (Intel HPC Toolkit 2022.2). We tried the latest Intel MPI 2021.6, and it worked fine, as shown in the screenshot below.

SantoshY_Intel_0-1655093848700.png

Please let us know if this resolves your issue.

Thanks & Regards,

Santosh

SantoshY_Intel
Moderator

Hi,

We assume that your issue is resolved. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.

Thanks & Regards,

Santosh