With Intel MPI 2019.6.166 (IPSXE 2020.0.166, Mellanox HDR, MLNX_OFED_LINUX-4.7-220.127.116.11), I am getting 2.5x slower performance compared to another cluster running Intel MPI 2019.1 (Mellanox EDR, MLNX_OFED_LINUX-5.0-18.104.22.168).
I suspect that Intel MPI 2019.6.166 is not picking the right IB transport. Which of the values below should be set in the UCX_TLS environment variable for mpiexec?
$ ucx_info -d | grep Transport
# Transport: posix
# Transport: sysv
# Transport: self
# Transport: tcp
# Transport: tcp
# Transport: rc
# Transport: rc_mlx5
# Transport: dc_mlx5
# Transport: ud
# Transport: ud_mlx5
# Transport: cm
# Transport: cma
# Transport: knem
Also, with OpenMPI the environment variable UCX_NET_DEVICES=mlx5_0:1 selects which IB interface to use. Please let me know the equivalent variable for Intel MPI 2020.
CA type: MT4123
Could you please share the benchmark program/code that you are using to compare Intel MPI 2019.6.166 with Intel MPI 2019.1?
For selecting the transport we suggest you go through the following link
Regarding UCX_NET_DEVICES in Intel MPI, we will get back to you.
Intel MPI uses UCX as its backend for InfiniBand, so the UCX environment variables are not specific to OpenMPI and can be used with Intel MPI as well.
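As a rough sketch of how the UCX variables might be passed through mpiexec, the snippet below sets UCX_NET_DEVICES to the device named in the question and builds a launch line using the -genv option. The UCX_TLS value rc,ud,sm,self is only an assumed candidate list (it is not confirmed anywhere in this thread), and ./my_benchmark is a placeholder for your own binary. The script only prints the command, so it can be inspected before running.

```shell
#!/bin/sh
# Assumed values, not confirmed in this thread: adjust to your fabric.
UCX_NET_DEVICES="mlx5_0:1"   # HCA:port, same syntax as with OpenMPI
UCX_TLS="rc,ud,sm,self"      # candidate transport list (assumption)
export UCX_NET_DEVICES UCX_TLS

# ./my_benchmark is a placeholder for your own MPI binary.
CMD="mpiexec -n 2 -genv UCX_NET_DEVICES $UCX_NET_DEVICES -genv UCX_TLS $UCX_TLS ./my_benchmark"

# Dry run: print the launch line instead of executing it.
echo "$CMD"
```

To actually launch, run the printed line on a node where mpiexec is available, and compare timings with and without the UCX_TLS setting.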
Also, regarding the slower performance of IMPI 2019u6, could you check the performance after changing the provider to verbs?
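One way to try the verbs provider is sketched below, assuming the libfabric variable FI_PROVIDER is honored by your Intel MPI installation. I_MPI_DEBUG is set so that the startup output can be checked to see which provider was actually selected. ./my_benchmark is again a placeholder, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Ask libfabric for the verbs provider, as suggested in the reply above.
export FI_PROVIDER=verbs
# Raise Intel MPI's debug level so the startup output reports the provider in use.
export I_MPI_DEBUG=5

# ./my_benchmark is a placeholder for your own MPI binary.
CMD="mpiexec -n 2 ./my_benchmark"

# Dry run: show the environment and the launch line.
echo "FI_PROVIDER=$FI_PROVIDER I_MPI_DEBUG=$I_MPI_DEBUG $CMD"
```

Running the same benchmark once with FI_PROVIDER=verbs and once with the default provider should show whether the slowdown follows the provider choice.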
We have also made some improvements to the mlx provider in 2019u7. If possible, could you upgrade to the latest version and check whether the performance improves?
The performance issue is resolved now. The issue may have been with the firmware or the InfiniBand drivers. We tested performance with IntelMPI-2018u5 and IntelMPI-2019u6; IntelMPI-2018u5 is slightly faster than IntelMPI-2019u6.
Glad to hear that your issue has been resolved.
We suggest using the latest version of Intel MPI (2019u7) instead of IMPI 2018u5.
Shall we close this thread, considering that your issue has been resolved?