Hi,
I have been testing an electronic structure code on a supercomputer with AMD EPYC 773 CPUs and an HPE Slingshot interconnect.
We have noticed that calculations using the default tuning file fail reproducibly when they run across multiple nodes. The MPI debug output shows:
[0] MPI startup(): Intel(R) MPI Library, Version 2021.6 Build 20220227 (id: 28877f3f32)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.13.2rc1-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.6.0/etc/tuning_generic_shm-ofi_tcp-ofi-rxm.dat" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.6.0/etc/tuning_generic_shm-ofi.dat"
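When comparing runs with different tuning files, it helps to pull the relevant startup lines out of each job log. A minimal sketch in Python, assuming only that the log contains startup lines formatted like the ones above (printed when I_MPI_DEBUG is enabled). The function and field names are hypothetical, not part of Intel MPI:

```python
import re
from typing import NamedTuple, Optional

class StartupInfo(NamedTuple):
    version: Optional[str]      # e.g. "2021.6"
    provider: Optional[str]     # libfabric provider actually selected
    tuning_file: Optional[str]  # tuning file Intel MPI reports as loaded

# Patterns match the "MPI startup()" lines shown in this thread.
_VERSION_RE = re.compile(r"Intel\(R\) MPI Library, Version (\S+)")
_PROVIDER_RE = re.compile(r"libfabric provider: (\S+)")
_TUNING_RE = re.compile(r'Load tuning file: "([^"]+)"')

def parse_startup(log: str) -> StartupInfo:
    """Extract version, libfabric provider and loaded tuning file from a job log."""
    def first(pattern: re.Pattern) -> Optional[str]:
        match = pattern.search(log)
        return match.group(1) if match else None
    return StartupInfo(first(_VERSION_RE), first(_PROVIDER_RE), first(_TUNING_RE))
```

For the log above this reports the provider as `tcp;ofi_rxm` and the loaded file as `tuning_generic_shm-ofi.dat`, which makes it easy to see at a glance which provider and tuning file each failing or passing run actually used.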
We suspected that the underlying problem might be related to the interconnect and how Intel MPI behaves on AMD platforms.
After some testing, we found a workaround: selecting a different tuning file. The ones that worked for us are:
tuning_generic_ofi_mlx_hcoll.dat
tuning_generic_shm-ofi_mlx_hcoll.dat
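One way to apply this workaround per job is through the environment. A minimal sketch in Python, assuming your Intel MPI release honors the I_MPI_TUNING_BIN variable for selecting a binary tuning file (check the Developer Reference for your version) and that the files live under the install prefix seen in the log above. The helper name `tuning_env` is hypothetical:

```python
import os
from pathlib import Path, PurePosixPath
from typing import Dict, Mapping, Optional

# Assumption: install prefix copied from the debug log in this thread.
DEFAULT_ETC = PurePosixPath("/opt/intel/oneapi/mpi/2021.6.0/etc")

def tuning_env(tuning_name: str,
               base: Optional[Mapping[str, str]] = None,
               etc_dir: PurePosixPath = DEFAULT_ETC,
               check_exists: bool = True) -> Dict[str, str]:
    """Return a copy of the environment with I_MPI_TUNING_BIN set to the chosen file."""
    path = etc_dir / tuning_name
    # Fail early instead of letting the job silently fall back to another file.
    if check_exists and not Path(str(path)).is_file():
        raise FileNotFoundError(f"tuning file not found: {path}")
    env = dict(os.environ if base is None else base)
    env["I_MPI_TUNING_BIN"] = str(path)
    return env
```

The result can be passed straight to a launcher, e.g. `subprocess.run(["mpirun", "-n", "128", "./my_app"], env=tuning_env("tuning_generic_shm-ofi_mlx_hcoll.dat"), check=True)`, where `./my_app` and the rank count are placeholders. Building a copy of the environment keeps the override scoped to that one launch.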
I am puzzled why these work on this platform; all the other generic tuning files lead to erroneous results.
I know you don't officially support AMD machines, but do you have an idea what the differences between the tuning files are?
Thanks!
@Lion__Konstantin
Please use the latest version, 2021.12.1. On a Slingshot system you should use the CXI provider, but it should be selected automatically.
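If the CXI provider is not picked up automatically (the log in the question reports `tcp;ofi_rxm`), it can be requested explicitly. A minimal sketch in Python, assuming the I_MPI_OFI_PROVIDER variable is honored by your release and that a CXI-enabled libfabric is available on the nodes. The helper name and the default debug level of 5 are assumptions; any level that prints the "libfabric provider" startup line is enough to confirm the choice:

```python
import os
from typing import Dict, Mapping, Optional

def cxi_env(base: Optional[Mapping[str, str]] = None,
            debug_level: int = 5) -> Dict[str, str]:
    """Return a copy of the environment that requests the CXI provider (assumed setup)."""
    env = dict(os.environ if base is None else base)
    env["I_MPI_OFI_PROVIDER"] = "cxi"       # ask Intel MPI to load libfabric's CXI provider
    env["I_MPI_DEBUG"] = str(debug_level)   # keep startup output so the provider can be verified
    return env
```

After launching with this environment, the "libfabric provider:" startup line should read `cxi`; if the job aborts instead, the CXI provider is likely not available to this Intel MPI build.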
The contents of the binary tuning files are not publicly available. We have made a couple of fixes for AMD-based platforms in our latest releases.
This might be useful:
Thanks!
I'll try the new version sometime next week.