I am experiencing a severe performance loss when using multiple rails with Intel MPI 5.0 on the KNC (Xeon Phi coprocessor) with an mlx5 adapter (which has two ports). With Intel MPI 4.1 the performance was much better.
Let me give an example of the performance of our application (per KNC):
- Intel MPI 4.1, single-rail (I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx5_0-1u): 220 Gflop/s
- Intel MPI 4.1, dual-rail (-IB I_MPI_OFA_ADAPTER_NAME=mlx5_0 I_MPI_OFA_NUM_PORTS=2): 270 Gflop/s
- Intel MPI 5.0, single-rail (I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx5_0-1u): 220 Gflop/s
- Intel MPI 5.0, dual-rail (-IB I_MPI_OFA_ADAPTER_NAME=mlx5_0 I_MPI_OFA_NUM_PORTS=2): 150 Gflop/s
- Intel MPI 5.0, single-rail (-IB I_MPI_OFA_ADAPTER_NAME=mlx5_0 I_MPI_OFA_NUM_PORTS=1): 150 Gflop/s
With DAPL the performance is unchanged between versions, but apparently there is no way to use DAPL with dual-rail support. With OFA I got the best performance in v4.1, but with v5.0 it is extremely low; in particular, it is the same whether 1 or 2 ports are used.
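For reference, the runs above were launched roughly along these lines (a sketch reconstructed from the variables listed; `./app`, the rank count, and host names are placeholders):

```shell
# Single-rail over DAPL: pin the provider explicitly to the mlx5 device
# (ofa-v2-mlx5_0-1u is the user-space DAPL provider for port 1 of mlx5_0).
mpirun -n 16 -genv I_MPI_DAPL_PROVIDER_LIST ofa-v2-mlx5_0-1u ./app

# Dual-rail over OFA: select the OFA fabric (-IB) and use both ports
# of the mlx5_0 adapter.
mpirun -n 16 -IB \
       -genv I_MPI_OFA_ADAPTER_NAME mlx5_0 \
       -genv I_MPI_OFA_NUM_PORTS 2 \
       ./app
```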
Is there anything I am overlooking in the documentation?
Thanks for your reply. To reproduce, you can use, for example, the OSU bandwidth benchmark: http://mvapich.cse.ohio-state.edu/benchmarks/. My original tests were done on the KNC, but the same problem shows up on the Xeon (Haswell) host.
You can see the results in the attached figure: for message sizes of roughly 100 kB and above, Intel MPI 4.1 with dual rail is by far the best (blue solid squares), while Intel MPI 5.0 is much worse.
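For completeness, the reproduction steps look roughly like this (a sketch; the tarball name, host names, and build paths are placeholders):

```shell
# Build the OSU micro-benchmarks against Intel MPI.
tar xzf osu-micro-benchmarks.tar.gz && cd osu-micro-benchmarks
./configure CC=mpiicc && make

# Point-to-point bandwidth between two nodes, one rank per node,
# using the dual-rail OFA settings from the original report.
mpirun -hosts node1,node2 -n 2 -ppn 1 -IB \
       -genv I_MPI_OFA_ADAPTER_NAME mlx5_0 \
       -genv I_MPI_OFA_NUM_PORTS 2 \
       ./mpi/pt2pt/osu_bw
```

Comparing the reported bandwidth for message sizes above ~100 kB across the four configurations should show the regression directly.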
Could you please specify the exact versions of the Intel MPI Library (4.x, 5.x) and of the OS, MPSS, OFED, and DAPL?
Also, could you please describe the test scenarios you used: which compute nodes were involved in each run (MPI ranks only on the HOST, only on the KNC, or on both HOST and KNC)?
Regarding DAPL: try running the same scenarios with the default DAPL provider (i.e., without I_MPI_DAPL_PROVIDER_LIST).
I used two scenarios; the issue shows up in both cases:
- HOST <-> HOST
- KNC <-> KNC
- Intel MPI 4.1.3.045 and 5.0.2.044
- OS is Linux (CentOS)
- OFED 3.5.2
- DAPL 2.1.2
- MPSS 3.3.3 (I guess this is irrelevant, since the issue also shows up when only the hosts are involved)
I believe I tried running without I_MPI_DAPL_PROVIDER_LIST in the past, but Intel MPI defaulted to an mlx4 device (which does not exist on our system) and would not use the mlx5 device, so setting I_MPI_DAPL_PROVIDER_LIST was mandatory. I will try again.
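To check which devices and DAPL providers a node actually exposes, and which fabric Intel MPI ends up selecting, something like the following can help (`/etc/dat.conf` is the usual provider registry location but may differ on your system; `./app` is a placeholder):

```shell
# List the RDMA devices present on the node (mlx4 vs. mlx5).
ibv_devinfo | grep hca_id

# List the DAPL provider names configured on this node; without
# I_MPI_DAPL_PROVIDER_LIST, Intel MPI picks from these entries.
grep -v '^#' /etc/dat.conf | awk '{print $1}'

# Run with debug output to see which provider/fabric was chosen.
mpirun -n 2 -genv I_MPI_DEBUG 2 ./app
```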
Remark: should this topic be moved to the general forum, since we now know that it is not MIC-specific?
Loc and I have been communicating internally about this since you initially submitted it.
Just as an FYI, I'm moving this issue over to the regular Intel® Clusters and HPC Technology forum since it's not Phi-specific. That way I can keep track of the internal bug I submitted and update you on current status.
We've made several fixes to the Intel MPI Library with regard to multi-rail support. Have you tried the latest Intel MPI 5.1.2, and if so, did you see better performance?