Intel® MPI Library

hpl_pairwise failure using clck

Frank_Fu_
Beginner

Hi,

 

I have 8 nodes, split into two groups.

I first ran the health_user test on all 8 nodes, and no issues were detected.

So I assume the 8 nodes have identical environments.

 

Then I ran hpl_cluster_performance on each group.

The 1st group reports no issues, while the 2nd group reports "hpl-pairwise-failed".

 

I have attached the databases and logs from the health_user test and from the HPL tests on group 1 and group 2.

Could you give some insight into what configuration differs between the two groups?

How can I resolve the hpl-pairwise-failed error?
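
For readers reproducing this setup, the per-group runs described above could be scripted roughly as follows. This is only a sketch: the host names `node01` to `node08` and the nodefile names are hypothetical placeholders, and the script prints the `clck` commands rather than running them. The `-f` (nodefile) and `-F` (framework) options are the ones used elsewhere in this thread.

```shell
#!/bin/sh
# Sketch only: host names and file names are hypothetical placeholders.

# Write one host name per line into a nodefile.
write_nodefile() {  # usage: write_nodefile FILE HOST...
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}

# Print (do not run) the clck command for one group.
clck_cmd() {  # usage: clck_cmd NODEFILE FRAMEWORK
    printf 'clck -f %s -F %s\n' "$1" "$2"
}

workdir=$(mktemp -d)
write_nodefile "$workdir/group1.nodes" node01 node02 node03 node04
write_nodefile "$workdir/group2.nodes" node05 node06 node07 node08

# One command per group; copy the printed lines into a shell to run them.
for group in group1 group2; do
    clck_cmd "$workdir/$group.nodes" hpl_cluster_performance
done
```

Keeping one nodefile per group makes it easy to rerun exactly the same node set when comparing the passing and failing groups.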

AishwaryaCV_Intel
Moderator

Hi,


Thanks for providing your observations. We are working on your issue and will get back to you soon.


Thanks And Regards,

Aishwarya


AishwaryaCV_Intel
Moderator

Hi,

 

Apologies for the delay in my response.

 

Could you please try running the following command to get more information out of the db file:

clckdb -D hpl_cluster_performance.db --provider hpl_pairwise

 

 

Also, the issue seems to be numerical rather than performance-related. For more information, run the extended health framework:

 

$ clck -F health_extended_user

 

 

Are you using any virtual nodes? Could you also provide us with the OS and processor details?

NOTE: I would also recommend upgrading to the latest oneAPI version. MPI may be failing because it uses a KNL definition.
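
Since the question is about what differs between the two groups, the `clckdb` query above could be repeated for each group's database and the results saved side by side. The following is a minimal sketch under assumptions: the database names `group1.db` and `group2.db` and the output file names are hypothetical, and only the `-D` and `--provider` options shown in this thread are used.

```shell
#!/bin/sh
# Sketch only: group1.db and group2.db are hypothetical database names.

# Save the rows collected by one provider from one database to a text file.
dump_provider() {  # usage: dump_provider DBFILE PROVIDER OUTFILE
    clckdb -D "$1" --provider "$2" > "$3" 2>&1
}

if command -v clckdb >/dev/null 2>&1; then
    for group in group1 group2; do
        dump_provider "$group.db" hpl_pairwise "${group}_hpl_pairwise.txt" \
            || echo "clckdb failed for $group.db" >&2
    done
else
    echo "clckdb not found in PATH" >&2
fi
```

The two text files can then be read next to each other. Host names and timings will naturally differ, so the interesting part is any residual or pass/fail information reported for the failing group.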

 

Thanks And Regards,

Aishwarya

 

AishwaryaCV_Intel
Moderator

Hi,  


We haven't heard back from you. Could you please provide the details requested in my previous response?


Thank you and best regards, 

Aishwarya



Frank_Fu_
Beginner

Hi Aishwarya,

 

Sorry for the late reply.

I am running RHEL 8.1 on vSphere 8b.

The processor is an Intel(R) Xeon(R) Gold 6248R CPU.

 

Here is my current version:

$ clck -v
Intel(R) Cluster Checker 2021 Update 7 (build 20230112)

 

I have attached the log of the clckdb command; the numerical errors can be seen there.

 

I have also run the extended health check, and it reports the same `hpl-pairwise-failed` error.

AishwaryaCV_Intel
Moderator

Hi,

 

Could you also please provide the output of the following command:

 

 

$ clck -F health_extended_user -f nodefile

 

 

NOTE: Could you please also try it with the latest Intel oneAPI version (2023.1.0)?
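
One way to capture the requested output for attaching to a reply is to redirect both stdout and stderr to a file. This is a rough sketch: the helper `run_and_log` and the log file name are hypothetical, and `nodefile` stands for whatever nodefile is used for the failing group.

```shell
#!/bin/sh
# Sketch only: run_and_log and the log file name are illustrative.

# Run a command, store stdout and stderr in LOGFILE, and record the exit status.
run_and_log() {  # usage: run_and_log LOGFILE COMMAND [ARGS...]
    log=$1
    shift
    "$@" > "$log" 2>&1
    status=$?
    printf 'exit status: %s\n' "$status" >> "$log"
    return "$status"
}

if command -v clck >/dev/null 2>&1; then
    run_and_log clck_health_extended_user.log \
        clck -F health_extended_user -f nodefile \
        || echo "clck exited with a non-zero status; see the log" >&2
else
    echo "clck not found in PATH; set up the Cluster Checker environment first" >&2
fi
```

Recording the exit status alongside the output makes it clear whether the run itself completed, independent of any issues the framework reports.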

 

Thanks And Regards,

Aishwarya

 

AishwaryaCV_Intel
Moderator

Hi,  


We haven't heard back from you. Could you please provide the details requested in my previous response?


Thank you and best regards, 

Aishwarya


AishwaryaCV_Intel
Moderator

Hi,

 

We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks And Regards,

Aishwarya

