Intel® MPI Library

Differences between MPI tuning binaries on AMD EPYC 773

Lion__Konstantin
Beginner

Hi,

 

I have been testing an electronic structure code on a supercomputer with AMD EPYC 773 CPUs and an HPE Slingshot interconnect.

 

We have noticed that multi-node calculations fail reproducibly when they run with the tuning binary file that is loaded by default. The MPI debug output states:

[0] MPI startup(): Intel(R) MPI Library, Version 2021.6  Build 20220227 (id: 28877f3f32)
[0] MPI startup(): Copyright (C) 2003-2022 Intel Corporation.  All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.13.2rc1-impi
[0] MPI startup(): max number of MPI_Request per vci: 67108864 (pools: 1)
[0] MPI startup(): libfabric provider: tcp;ofi_rxm
[0] MPI startup(): File "/opt/intel/oneapi/mpi/2021.6.0/etc/tuning_generic_shm-ofi_tcp-ofi-rxm.dat" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/mpi/2021.6.0/etc/tuning_generic_shm-ofi.dat"
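When comparing runs, it can help to pull the selected provider and the loaded tuning file out of the startup output automatically. The following is a minimal sketch, assuming the log lines keep the format shown above; the function name `summarize_startup` is hypothetical and not part of Intel MPI.

```python
import re

# Patterns for the "[rank] MPI startup(): ..." lines in the debug output.
PROVIDER_RE = re.compile(r"MPI startup\(\): libfabric provider: (\S+)")
TUNING_RE = re.compile(r'MPI startup\(\): Load tuning file: "([^"]+)"')
MISSING_RE = re.compile(r'MPI startup\(\): File "([^"]+)" not found')


def summarize_startup(log_text):
    """Return the provider, the loaded tuning file, and any tuning files
    that were reported as not found (hypothetical helper)."""
    provider = PROVIDER_RE.search(log_text)
    tuning = TUNING_RE.search(log_text)
    return {
        "provider": provider.group(1) if provider else None,
        "tuning_file": tuning.group(1) if tuning else None,
        "missing_files": MISSING_RE.findall(log_text),
    }
```

Running it on the captured output of each job makes it easy to see which runs fell back to the generic tuning file.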

We suspected that the underlying problem might be related to the interconnect and to how Intel MPI behaves on AMD platforms.

 

After some testing, we found a workaround: selecting a different tuning binary file. The ones that worked for us are:


tuning_generic_ofi_mlx_hcoll.dat
tuning_generic_shm-ofi_mlx_hcoll.dat
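For readers who want to reproduce this kind of test, here is a rough sketch of how a specific tuning file could be selected for a launch. It assumes the file is chosen through the `I_MPI_TUNING_BIN` environment variable and that the files sit in the `etc` directory shown in the debug output; the post does not state how the file was actually switched, and the helper name `env_with_tuning_file` is hypothetical.

```python
import os
import posixpath

# Assumption: the tuning files live in the directory seen in the debug output.
DEFAULT_ETC_DIR = "/opt/intel/oneapi/mpi/2021.6.0/etc"


def env_with_tuning_file(tuning_file, etc_dir=DEFAULT_ETC_DIR, base_env=None):
    """Return a copy of the environment with I_MPI_TUNING_BIN pointing
    at the chosen tuning file (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    env["I_MPI_TUNING_BIN"] = posixpath.join(etc_dir, tuning_file)
    return env


# Example use (the launch command and "./app" are placeholders):
#   import subprocess
#   env = env_with_tuning_file("tuning_generic_shm-ofi_mlx_hcoll.dat")
#   subprocess.run(["mpirun", "-n", "4", "./app"], env=env)
```

The "Load tuning file" line in the debug output should then show whether the requested file was actually picked up.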


I am puzzled as to why these work on this platform. All the other generic tuning files lead to erroneous calculations.

I know you don't officially support AMD machines, but do you have any idea how the tuning files differ from one another?

 

Thanks!

 

2 Replies
TobiasK
Moderator

@Lion__Konstantin 

Please use the latest version, 2021.12.1. On a Slingshot system you should use the CXI provider, though it should be selected automatically.
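If the provider is not picked up automatically, one possible way to request it and check the result is sketched below. This assumes libfabric's standard `FI_PROVIDER` variable with the provider name `cxi`, and an `I_MPI_DEBUG` level of 5 to print the startup lines; neither setting is given in this reply, and both helper names are hypothetical.

```python
import os
import re


def env_requesting_cxi(base_env=None, debug_level="5"):
    """Return a copy of the environment that asks libfabric for the CXI
    provider and enables Intel MPI startup output (assumed settings)."""
    env = dict(os.environ if base_env is None else base_env)
    env["FI_PROVIDER"] = "cxi"        # assumption: provider name is "cxi"
    env["I_MPI_DEBUG"] = debug_level  # assumption: level 5 prints startup lines
    return env


def reports_cxi(log_text):
    """Check whether the 'libfabric provider:' startup line names cxi."""
    match = re.search(r"libfabric provider: (\S+)", log_text)
    return bool(match) and match.group(1).split(";")[0] == "cxi"
```

A run whose startup output still reports `tcp;ofi_rxm`, as in the original post, would fail this check and point to a provider selection problem rather than a tuning file problem.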

The contents of the binary file are not publicly available. Our latest releases include a couple of fixes for AMD-based platforms.


This might be useful:

https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-12/other-environment-variables.html#GUID-6B9D4E5C-8582-42E6-B7DA-72C87622357D

Lion__Konstantin
Beginner

Thanks!

I'll try the new version sometime next week.
