Hi,
I have 8 nodes, and they are in two groups.
I first ran the health_user test on all 8 nodes, and no issues were detected.
I therefore assume the 8 nodes have identical environments.
Then I ran hpl_cluster_performance on each group.
The 1st group reports no issues, while the 2nd group reports "hpl-pairwise-failed".
I have attached the databases and logs of the health_user test and of the HPL tests on group 1 and group 2.
Could you give some insight into how the configuration differs between the two groups?
How can I get rid of the hpl-pairwise-failed error?
Hi,
Thanks for providing your observations. We are working on your issue and will get back to you soon.
Thanks And Regards,
Aishwarya
Hi,
Apologies for the delay in my response.
Could you please try running the following command to get more information out of the db file:
$ clckdb -D hpl_cluster_performance.db --provider hpl_pairwise
The issue appears to be numerical rather than performance-related. For more information, run the extended health framework:
$ clck -F health_extended_user
Are you using any virtual nodes? Could you also provide us with the OS and processor details?
NOTE: I would also recommend upgrading to the latest oneAPI version. MPI may be failing because it uses a KNL definition.
Thanks And Regards,
Aishwarya
Hi,
We haven't heard back from you. Could you please provide the details requested in my previous response?
Thank you and best regards,
Aishwarya
Hi Aishwarya,
Sorry for the late reply.
I am running RHEL 8.1 on vSphere 8b.
The processor is an Intel(R) Xeon(R) Gold 6248R CPU.
Here is my current version:
$ clck -v
Intel(R) Cluster Checker 2021 Update 7 (build 20230112)
I have attached the log of the clckdb command; the numerical errors are visible there.
I have also run the extended health check, and it reports the `hpl-pairwise-failed` error as well.
Hi,
Could you also please provide the output of the following command:
$ clck -F health_extended_user -f nodefile
NOTE: Please also try it with the latest Intel oneAPI version (2023.1.0).
Thanks And Regards,
Aishwarya
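The per-group runs discussed in this thread could be lined up with a dry-run sketch like the one below. It assumes one nodefile per group; the names `group1.nodes` and `group2.nodes` are hypothetical. The script only prints the `clck` commands, built from the `-f` and `-F` options quoted above, and does not execute them.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the per-group commands instead of running them.
# The nodefile names (group1.nodes, group2.nodes) are hypothetical.

# Print the clck invocation for one nodefile ($1) and one framework definition ($2).
clck_cmd() {
    printf 'clck -f %s -F %s\n' "$1" "$2"
}

# Emit the same sequence of checks for each group so the results are comparable.
for nodefile in group1.nodes group2.nodes; do
    clck_cmd "$nodefile" health_extended_user
    clck_cmd "$nodefile" hpl_cluster_performance
done
```

Running the printed commands one group at a time keeps each group's output separate, which makes it easier to see where the failing group diverges from the passing one.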
Hi,
We haven't heard back from you. Could you please provide the details requested in my previous response?
Thank you and best regards,
Aishwarya
Hi,
We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.
Thanks And Regards,
Aishwarya
