Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Performance issue with multi-rail support in Intel MPI 5.0

Simon_H_2
Beginner

Hi,

I am experiencing a severe performance loss when using multiple rails with Intel MPI 5.0 on the KNC with an mlx5 adapter (which has 2 ports). With Intel MPI 4.1 the performance was much better.

Let me give an example of the performance of our application (per KNC):

  • Intel MPI 4.1, single-rail (I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx5_0-1u): 220 Gflop/s
  • Intel MPI 4.1, dual-rail (-IB I_MPI_OFA_ADAPTER_NAME=mlx5_0  I_MPI_OFA_NUM_PORTS=2): 270 Gflop/s
  • Intel MPI 5.0, single-rail (I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx5_0-1u): 220 Gflop/s
  • Intel MPI 5.0, dual-rail (-IB I_MPI_OFA_ADAPTER_NAME=mlx5_0  I_MPI_OFA_NUM_PORTS=2): 150 Gflop/s
  • Intel MPI 5.0, single-rail (-IB I_MPI_OFA_ADAPTER_NAME=mlx5_0  I_MPI_OFA_NUM_PORTS=1): 150 Gflop/s

With DAPL the performance is unchanged between versions, but apparently there is no way to use it with dual-rail support. With OFA I got the best performance in v4.1, but with v5.0 it is extremely low; in particular, it is the same whether 1 or 2 ports are used.
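
For reference, the launch lines look roughly like this; the hostnames, rank counts, and application binary are placeholders, and I am writing the settings as -genv options rather than exported variables:

  # Single-rail over DAPL (mlx5_0, port 1) -- same line for 4.1 and 5.0
  mpirun -n 2 -hosts node0,node1 \
         -genv I_MPI_FABRICS shm:dapl \
         -genv I_MPI_DAPL_PROVIDER_LIST ofa-v2-mlx5_0-1u \
         ./app.exe

  # Dual-rail over OFA (both ports of mlx5_0); -IB selects the OFA fabric
  mpirun -n 2 -hosts node0,node1 -IB \
         -genv I_MPI_OFA_ADAPTER_NAME mlx5_0 \
         -genv I_MPI_OFA_NUM_PORTS 2 \
         ./app.exe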

Is there anything I am overlooking in the documentation?

Thanks,

Simon

Loc_N_Intel
Employee

The HCA card in my system has only one port, so I cannot reproduce the dual-rail issue you saw. But let me ask the experts around here and get back to you. Thank you.

Loc_N_Intel
Employee

Hi Simon,

I just contacted an expert. Could you please describe in detail how to reproduce the issue? Thank you.

Simon_H_2
Beginner

Hi,

Thanks for your reply. To reproduce, you can use, for example, the OSU bandwidth benchmark: http://mvapich.cse.ohio-state.edu/benchmarks/. My original tests were done on the KNC, but the same problem shows up on the Xeon (Haswell) host.
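
For the dual-rail case, a run along these lines should reproduce it (hostnames and the path to the osu_bw binary are placeholders; the OSU benchmarks need to be built against the Intel MPI version under test first):

  # OSU point-to-point bandwidth test between two hosts, using the
  # dual-rail OFA settings from my first post.
  mpirun -n 2 -hosts host0,host1 -IB \
         -genv I_MPI_OFA_ADAPTER_NAME mlx5_0 \
         -genv I_MPI_OFA_NUM_PORTS 2 \
         ./osu_bw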

The results are shown in the attached figure: for message sizes of roughly 100 kB and above, Intel MPI 4.1 with "dual rail" is by far the best (blue solid squares), while Intel MPI 5.0 is much worse.

[Attached figure: OSU bandwidth vs. message size for Intel MPI 4.1 and 5.0, single- and dual-rail]

Simon_H_2
Beginner
Are there any new developments on this? Can you confirm the issue now?

Thanks,
Simon
Loc_N_Intel
Employee

Hi Simon,

Sorry for the delayed answer. This issue was forwarded to the development team for investigation. I will let you know when I have an update.

Thank you. 

Simon_H_2
Beginner
Hi,

Are there any updates regarding this issue?

Thanks,
Simon
Artem_R_Intel1
Employee

Hi Simon,

Could you please specify the exact versions of the Intel MPI Library (4.x, 5.x) and of the OS/MPSS/OFED/DAPL?
Also, could you please describe the test scenarios you used? Which compute nodes were involved in each run (MPI ranks only on the HOST, only on the KNC, or on both HOST and KNC)?

Regarding DAPL: please try running the same scenarios with the default DAPL provider (without I_MPI_DAPL_PROVIDER_LIST).
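
For example, something along these lines (node names and the benchmark binary are placeholders; I_MPI_DEBUG is only there so that the startup output shows which provider is actually selected):

  # Same DAPL run, but without forcing a provider, so Intel MPI picks the
  # default one from the system's dat.conf.  The debug output shows the
  # fabric and provider that were actually chosen.
  mpirun -n 2 -hosts node0,node1 \
         -genv I_MPI_FABRICS shm:dapl \
         -genv I_MPI_DEBUG 4 \
         ./osu_bw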

Simon_H_2
Beginner

Hi Artem,

I used two scenarios; the issue shows up in both cases:

  1. HOST <-> HOST
  2. KNC <-> KNC

Versions:

  • Intel MPI 4.1.3.045 and 5.0.2.044
  • OS is Linux (CentOS)
  • OFED 3.5.2
  • DAPL 2.1.2
  • MPSS 3.3.3 (I guess this is irrelevant, since the issue also shows up when only hosts are involved)

I think I tried running without I_MPI_DAPL_PROVIDER_LIST in the past, but Intel MPI defaulted to an mlx4 device (which does not exist on our system) and would not use the mlx5 device, so setting I_MPI_DAPL_PROVIDER_LIST was mandatory. I will try again.
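
For completeness, this is roughly how I would check what the default resolves to (hostnames and the benchmark binary are placeholders; /etc/dat.conf is the usual OFED location on our systems):

  # List the DAPL providers configured on the system (OFED's dat.conf).
  grep -v '^#' /etc/dat.conf

  # Re-run without I_MPI_DAPL_PROVIDER_LIST and let the debug output show
  # which provider Intel MPI actually selects.
  mpirun -n 2 -hosts host0,host1 \
         -genv I_MPI_FABRICS shm:dapl \
         -genv I_MPI_DEBUG 4 \
         ./osu_bw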

Remark: Should this topic be moved to the general forum, since by now we know that it is not MIC-specific?

Simon

Loc_N_Intel
Employee

Hi Simon,

FYI, we submitted an internal bug report (DPD200368369) a while ago but there is no update yet. 

Gergana_S_Intel
Employee

Hi Simon,

Loc and I have been communicating internally about this since you initially submitted it.

Just as an FYI, I'm moving this issue over to the regular Intel® Clusters and HPC Technology forum since it's not Phi-specific. That way I can keep track of the internal bug I submitted and update you on its current status.

Thanks,
~Gergana

Gergana_S_Intel
Employee

Hey Simon,

We've made several fixes to the Intel MPI Library with regard to multi-rail support. Just wondering if you've tried the latest Intel MPI 5.1.2 and whether you're seeing better performance?
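
If it helps, verifying which installation is active and repeating the dual-rail case would look something like this (hostnames and the benchmark binary are placeholders):

  # After sourcing the mpivars.sh of the 5.1.2 installation, confirm it is
  # the active one, then repeat the dual-rail OFA run.
  echo $I_MPI_ROOT
  mpirun -n 2 -hosts host0,host1 -IB \
         -genv I_MPI_OFA_ADAPTER_NAME mlx5_0 \
         -genv I_MPI_OFA_NUM_PORTS 2 \
         ./osu_bw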

~Gergana
