Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

mpiexec.hydra of Intel MPI 2018/2021 not starting on RHEL 8.6

B_J
Beginner

After upgrading the OS to RHEL 8.6, we found that mpiexec.hydra from older Intel MPI versions is not working.
Actually, it doesn't start at all: `mpiexec.hydra -n 2 hostname` just hangs. Note that no MPI application is involved, only `hostname`.

We see this hang with Intel MPI 2018 and 2021, while Intel MPI 2022 works fine.

We need to stay on the older Intel compilers for legacy support, so is there any way to make the older versions of mpiexec.hydra work?

 

Any comments are appreciated. 

 

B.

SantoshY_Intel
Moderator

Hi,


Thank you for posting in Intel Communities.


Thanks for reporting this issue. We were able to reproduce it and have informed the development team.


Thanks & Regards,

Santosh


B_J
Beginner

Hi Santosh,

 

Have you received any update from the development team?

We are still trying to make mpiexec.hydra work, but have not been successful yet.

Any comments are appreciated.

 

Best regards,

 

B.

CyfronetHPCTeam
Beginner

Hello,

We ran into the same problem after updating the EL8 kernels on our cluster.

The problem is related to a change in how the file size is reported for:
/sys/devices/system/node/node*/cpulist
It used to be reported as 4096; after the update that landed in kernel 4.18.0-359.el8 it reports 0.

This causes mpiexec (hydra_proxy) to go into an infinite loop; a quick check is sketched below.
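
For illustration only (a minimal check, not part of the original report), the size the kernel reports for this sysfs file can be compared with its actual content; on affected kernels stat shows 0 while the file still reads back a CPU range:

# Reported file size: 4096 on older kernels, 0 after the cpumap ABI change
stat -c %s /sys/devices/system/node/node0/cpulist
# The actual content is still present, e.g. "0-31" depending on the node topology
cat /sys/devices/system/node/node0/cpulist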

The problematic change is this entry from the RHEL kernel changelog:
* Mon Jan 10 2022 Augusto Caringi <acaringi@redhat.com> [4.18.0-359.el8] - drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI (Phil Auld) [1920645]

We applied the kernel patch that reverts to the old behavior:

https://lore.kernel.org/lkml/CAGsJ_4yb5Z3msMgXRZpSXLFiysQdJq-n_p9B6d-p2t_-_UHhVQ@mail.gmail.com/T/#u

and rebuilt the kernel RPMs. Reverting to an older kernel is also an option if security updates are less of a concern (e.g., a well-isolated cluster); a rough sketch of pinning an older kernel follows below.
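
If patching and rebuilding the kernel is not practical, pinning an older, unaffected kernel on RHEL 8 could look like the following sketch; the version string below is only an example and must be replaced with a kernel that is actually installed and predates 4.18.0-359.el8:

# Show the running kernel and the installed boot entries
uname -r
grubby --info=ALL | grep ^kernel
# Pin an older kernel as the boot default (example version string, adjust to your system)
grubby --set-default /boot/vmlinuz-4.18.0-348.23.1.el8_5.x86_64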

I hope that helps

Best Regards


B_J
Beginner

Hi,

 

Thank you for sharing the detailed information.

Using your information, we contacted RHEL support, but it looks like there is no clear solution at the moment; running a customized kernel might not be allowed for us.

We are planning to re-install RHEL 8.4 from scratch for now.
We will update the community if there is any patch from the RHEL side.

 

Best regards,

 

B.

B_J
Beginner

Hi,

 

We reached out to RHEL support and were told that it will be fixed in RHEL 8.7.

 

Today we updated the kernel to the RHEL 8.7 one and confirmed that Intel MPI 2018 works fine now.

/sys/devices/system/node/node*/cpulist reports its size correctly again (the file lists the CPUs of each NUMA node), and Intel MPI works as expected.

 

Just FYI.

 

B.

SantoshY_Intel
Moderator

Hi,


The Intel developers are still working on your issue. I will update you in the community forum when there is any news.


Thanks & Regards,

Santosh


SantoshY_Intel
Moderator

Hi,


Thanks for your patience. For the root cause and workaround for your issue, please refer to the article below:

Intel® MPI Library Hang Issue when Using RHEL 8.6

Moreover, the issue is fixed in Intel MPI 2021.7. If you want to use prior versions, please use the workaround described in the above article.
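
As a side note not taken from the article: one mitigation sometimes suggested for Hydra topology-detection hangs in older Intel MPI versions is to switch the topology library Hydra uses. Whether this matches the workaround in the linked article is not confirmed here, so treat the sketch below as an assumption and verify it against the article:

# Assumption: force Hydra to use Intel's internal topology detection instead of hwloc
# (I_MPI_HYDRA_TOPOLIB is a documented Intel MPI environment variable)
export I_MPI_HYDRA_TOPOLIB=ipl
mpiexec.hydra -n 2 hostname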


If this resolves your issue, make sure to accept this as a solution. This would help others with similar issues. Thank you! 


Thanks & Regards,

Santosh


SantoshY_Intel
Moderator

Hi,


I assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.


Thanks & Regards,

Santosh

