Hello all,
We are preparing a switch from Intel MPI 2018 to version 2021.* (currently testing 2021.8), for use with a Mellanox InfiniBand interconnect.
Although our test cases run well, mpirun displays the following message at startup:
[0#36769:36769@plgrocalv905] MPI startup(): File "/test/mpi/intel-2021.8.0/mpi/2021.8.0/etc/tuning_skx_shm-ofi_mlx_56.dat" not found
[0#36769:36769@plgrocalv905] MPI startup(): Load tuning file: "/test/mpi/intel-2021.8.0/mpi/2021.8.0/etc/tuning_skx_shm-ofi.dat"
I assume this is linked to InfiniBand support. Do you have any idea what impact this warning might have on the computations?
As it does not appear to be part of the standard Intel MPI packaging, does Intel otherwise maintain any "tuning_skx_shm-ofi_mlx*.dat" file, or should this be checked with the IB vendor?
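For reference, a quick way to check which mlx tuning files ship with a given installation (assuming I_MPI_ROOT has been set by the Intel environment scripts):

# List any mlx-specific tuning files bundled with the active Intel MPI installation.
ls "$I_MPI_ROOT"/etc/tuning_*mlx*.dat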
Thanks!
Hi,
Thanks for posting in the Intel communities.
We are able to reproduce your issue on our end. We are working on it and will get back to you soon.
Thanks & Regards,
Santosh
Hi,
Could you please follow the steps below and get back to us if you face any issues?
- Initialize the oneAPI environment:
source /opt/intel/oneAPI/setvars.sh
- List the available FI_PROVIDERS:
fi_info -l
- Workaround 1: If "psm3" appears in the output of "fi_info -l", run the following command (see also the sketch after this list):
FI_PROVIDER=psm3 I_MPI_DEBUG=10 mpirun -bootstrap ssh -n 2 IMB-MPI1 allreduce
- Workaround 2: Set $FI_PROVIDER_PATH to use the providers from another IMPI version with a known working libfabric. For example, I am currently using IMPI 2021.8, in which the issue is reproducible. I found that the IMPI 2021.4 libfabric works, so I used the following command to get the FI_PROVIDER_PATH of IMPI 2021.4:
echo $FI_PROVIDER_PATH
/global/panfs01/admin/opt/intel/oneAPI/2021.4.0.3347/mpi/2021.4.0//libfabric/lib/prov:/usr/lib64/libfabric
Then export this path to FI_PROVIDER_PATH of IMPI 2021.8, as shown below:
export FI_PROVIDER_PATH=/global/panfs01/admin/opt/intel/oneAPI/2021.4.0.3347/mpi/2021.4.0//libfabric/lib/prov:/usr/lib64/libfabric
I_MPI_DEBUG=10 mpirun -bootstrap ssh -n 2 IMB-MPI1 allreduce
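Here is a minimal sketch tying Workaround 1 together (paths are examples; IMB-MPI1 is assumed to be on PATH after sourcing setvars.sh):

# Run the psm3 workaround only if fi_info actually reports the psm3 provider.
source /opt/intel/oneAPI/setvars.sh
if fi_info -l | grep -qw psm3; then
    FI_PROVIDER=psm3 I_MPI_DEBUG=10 mpirun -bootstrap ssh -n 2 IMB-MPI1 allreduce
else
    echo "psm3 provider not available - try Workaround 2"
fi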
Please find the complete debug logs in the attachments.
Thanks,
Santosh
Hello Santosh,
Many thanks for your feedback!
Actually it seems that psm3 is not supported on our cluster (only psm2)...
Moreover, it seems that only the latest version of Intel MPI can be downloaded.
Would there be a way to recover the offline installer for Intel MPI 2021.4?
I would be very interested in testing workaround #2.
Thanks again - Have a nice day
Olivier
By the way, it seems that there are indeed some mlx tuning files:
tuning_icx_shm-ofi_mlx_100.dat
tuning_icx_shm-ofi_mlx.dat
What is the difference between "icx" and "skx" tuning files?
Is there a way (an environment variable?) to tell IMPI to consider the icx tuning files instead of the skx files taken as default?
One additional detail: I have been able to recover 2021.3 and 2021.6, but I still face the same issue.
So I am still interested in giving version 2021.4 a chance!
Thanks.
Hi,
>>>"Actually it seems that psm3 is not supported on our cluster (only psm2)..."
You can try experimenting with the other fabric providers listed by the "fi_info -l" command and check whether you can resolve the issue by setting FI_PROVIDER=psm2/tcp/verbs, etc.
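As a rough sketch (the provider names below are examples; use whichever providers "fi_info -l" reports on your cluster), you could run a short benchmark per provider and grep the debug output for the provider that libfabric actually selected:

# Try each candidate provider in turn and record which one libfabric selects.
for prov in psm2 verbs tcp; do
    echo "=== FI_PROVIDER=$prov ==="
    FI_PROVIDER=$prov I_MPI_DEBUG=10 mpirun -bootstrap ssh -n 2 IMB-MPI1 allreduce 2>&1 | grep -E "libfabric provider|tuning"
done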
>>>"What is the difference between "icx" and "skx" tuning files?"
"icx" is specific to the Ice Lake machines whereas "skx" is specific to Skylake machines. If you have an Icelake machine, then by default it will try to load tuning_icx*.dat tuning file. similarly, if you have a skylake machine, then by default it will try to load tuning_skx*.dat tuning file.
>>>"Is there a way (environment variable?) to tell IMPI to consider icx tuning files instead of skx taken as default?"
We can use the environment variable "I_MPI_TUNING_BIN" to change the default tuning file.
For example:
export I_MPI_TUNING_BIN=/opt/intel/oneAPI/2023.0.0.25537/mpi/2021.8.0/etc/tuning_icx_shm-ofi_mlx_100.dat
For more information, please refer to the Intel® MPI Library Tuning Files article.
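On a cluster that mixes CPU generations, one possibility would be to select the tuning file per node, e.g. in a job prologue. A minimal sketch, assuming common Xeon SKU names (Platinum 81xx/Gold 61xx for Skylake, Platinum 83xx/Gold 63xx for Ice Lake) and tuning file names actually present in your etc directory:

# Pick a tuning file matching the local CPU generation (SKU patterns are assumptions).
model=$(grep -m1 'model name' /proc/cpuinfo)
case "$model" in
    *"Platinum 83"*|*"Gold 63"*) f=tuning_icx_shm-ofi_mlx.dat ;;  # Ice Lake SP
    *"Platinum 81"*|*"Gold 61"*) f=tuning_skx_shm-ofi.dat ;;      # Skylake SP
    *) f="" ;;                                                    # otherwise keep the default
esac
[ -n "$f" ] && export I_MPI_TUNING_BIN="$I_MPI_ROOT/etc/$f"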
>>>"Additional precision: I have been able to recover 2021.3 and 2021.6, but still face the same issue"
Since workaround 2 worked on my end, could you please check whether you followed the steps below (illustrated in the sketch after this list)?
- With Intel MPI 2021.3, check in the I_MPI_DEBUG log whether the tuning file is found while running the MPI program.
- If it works (i.e., the issue is not reproduced with IMPI 2021.3), run the command: echo $FI_PROVIDER_PATH
- Now initialize IMPI 2021.8 and export the output of the above command to FI_PROVIDER_PATH of the Intel MPI 2021.8 version.
- If you still face the same issue, please provide us with the complete steps and the debug log from your end. We followed the above steps and were able to get the desired output.
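Concretely, the check could look like this (the setvars.sh path is an example, and the placeholder must be replaced by the actual echo output):

# Step 1: in an IMPI 2021.3 environment, verify the tuning file loads and note the provider path.
I_MPI_DEBUG=10 mpirun -bootstrap ssh -n 2 IMB-MPI1 allreduce 2>&1 | grep -i tuning
echo $FI_PROVIDER_PATH
# Step 2: initialize the IMPI 2021.8 environment and reuse the 2021.3 providers.
source /opt/intel/oneAPI/setvars.sh
export FI_PROVIDER_PATH=<output of the echo above>
I_MPI_DEBUG=10 mpirun -bootstrap ssh -n 2 IMB-MPI1 allreduce 2>&1 | grep -i tuning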
Thanks & Regards,
Santosh
Hello Santosh,
Thanks for your feedback. Indeed, the issue remains on my end with version 2021.3. Here is the output:
[0] MPI startup(): Intel(R) MPI Library, Version 2021.3 Build 20210601 (id: 6f90181f1)
[0] MPI startup(): Copyright (C) 2003-2021 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[2] MPI startup(): shm segment size (1612 MB per rank) * (2 local ranks) = 3224 MB total
[0] MPI startup(): shm segment size (1883 MB per rank) * (2 local ranks) = 3767 MB total
[0] MPI startup(): libfabric version: 1.12.1-impi
[0] MPI startup(): libfabric provider: mlx
[0] MPI startup(): File "/test/mpi/intel-2021.3.0/mpi/2021.3.0/etc/tuning_skx_shm-ofi_mlx.dat" not found
[0] MPI startup(): Load tuning file: "/test/mpi/intel-2021.3.0/mpi/2021.3.0/etc/tuning_skx_shm-ofi.dat"
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 20982 hpcv905 {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}
[0] MPI startup(): 1 20983 hpcv905 {16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}
[0] MPI startup(): 2 27147 hpcv904 {0,1,2,3,4,5,6,7,8,9,10,11}
[0] MPI startup(): 3 0 v904
[0] MPI startup(): I_MPI_ROOT=/test/mpi/intel-2021.3.0/mpi/2021.3.0
[0] MPI startup(): I_MPI_MPIRUN=mpirun
[0] MPI startup(): I_MPI_HYDRA_RMK=lsf
[0] MPI startup(): I_MPI_HYDRA_TOPOLIB=hwloc
[0] MPI startup(): I_MPI_INTERNAL_MEM_POLICY=default
[0] MPI startup(): I_MPI_DEBUG=10
I am a bit reluctant to test all providers manually; I would like to make sure that the InfiniBand device is taken into account in a reliable way...
I will give I_MPI_TUNING_BIN a try. Actually, I am wondering how this would behave on a cluster with different types of compute hosts (Ice Lake, Cascade Lake, Skylake...).
Thanks again - Have a nice day.
Hi,
>>>"I will give a try with I_MPI_TUNING_BIN "
Could you please provide us with an update on your issue?
Thanks & Regards,
Santosh
Hello Santosh,
I have performed some tests with different tuning files:
- tuning_skx_shm-ofi.dat (default configuration)
- tuning_generic_shm-ofi_mlx_hcoll.dat
- tuning_icx_shm-ofi_mlx.dat
- tuning_skx_shm-ofi_psm3.dat
Unfortunately, none of these showed any significant performance difference...
In the meantime, we have also run performance tests with different versions of Intel MPI 2021 and found that 2021.6 was almost 10% faster than 2021.8, with similar behavior across tuning files though...
So, no really convincing results at this stage, unfortunately. Following up on your previous statement, do you know if there would be any way to recover the version 2021.4 standalone package?
Any ideas for further testing and tuning are more than appreciated. Overall, we still cannot reach the same level of performance with IMPI 2021 as with 2018 (which unfortunately does not seem to support the latest hardware configurations...).
Thanks - Have a nice day.
Olivier
Hi,
>>>"do you know if there would be any way to recover the version 2021.4 standalone package?"
Please see "Download an older version of oneAPI toolkits" for more details on how to get older versions of oneAPI.
Thanks for reporting this issue. We have forwarded it to the relevant development team, and they are working on it.
Thanks & Regards,
Santosh