Hi,
I am experiencing a severe performance loss when using multiple rails with Intel MPI 5.0 on the KNC with an mlx5 adapter (which has 2 ports). With Intel MPI 4.1 the performance was much better.
Let me give an example of the performance of our application (per KNC):
- Intel MPI 4.1, single-rail (I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx5_0-1u): 220 Gflop/s
- Intel MPI 4.1, dual-rail (-IB I_MPI_OFA_ADAPTER_NAME=mlx5_0 I_MPI_OFA_NUM_PORTS=2): 270 Gflop/s
- Intel MPI 5.0, single-rail (I_MPI_DAPL_PROVIDER_LIST=ofa-v2-mlx5_0-1u): 220 Gflop/s
- Intel MPI 5.0, dual-rail (-IB I_MPI_OFA_ADAPTER_NAME=mlx5_0 I_MPI_OFA_NUM_PORTS=2): 150 Gflop/s
- Intel MPI 5.0, single-rail (-IB I_MPI_OFA_ADAPTER_NAME=mlx5_0 I_MPI_OFA_NUM_PORTS=1): 150 Gflop/s
With DAPL the performance is unchanged between versions, but apparently there is no way to use it with dual-rail support. With OFA I got the best performance in v4.1, but with v5.0 it is extremely low; in particular, it is the same whether 1 or 2 ports are used.
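For reference, the runs are launched roughly as follows (./app and the rank count are just placeholders for our application):

# single-rail: DAPL provider pinned to the mlx5 HCA
mpirun -np 16 -genv I_MPI_DAPL_PROVIDER_LIST ofa-v2-mlx5_0-1u ./app

# dual-rail: OFA fabric (-IB) using both ports of mlx5_0
mpirun -np 16 -IB -genv I_MPI_OFA_ADAPTER_NAME mlx5_0 -genv I_MPI_OFA_NUM_PORTS 2 ./app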
Is there anything I am overlooking in the documentation?
Thanks,
Simon
The HCA card in my system has only one port, so I cannot reproduce the dual-rail issue that you saw. But let me ask the experts around here and get back to you. Thank you.
Hi Simon,
I just contacted an expert. Could you please describe in detail how to reproduce the issue? Thank you.
Hi,
Thanks for your reply. To reproduce, you can use, for example, the OSU bandwidth benchmark: http://mvapich.cse.ohio-state.edu/benchmarks/. My original tests were done on the KNC, but the same problem shows up on the Xeon (Haswell) host.
The results are in the attached figure: for message sizes of around 100 kB and above, Intel MPI 4.1 with "dual rail" is by far the best (blue solid squares), while Intel MPI 5.0 is much worse.
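In case it helps, a minimal reproducer with osu_bw looks roughly like this (the host names are placeholders, and the osu_bw path follows the layout of the OSU package on our system):

# single-rail over DAPL
mpirun -np 2 -hosts node1,node2 -genv I_MPI_DAPL_PROVIDER_LIST ofa-v2-mlx5_0-1u ./osu_bw

# dual-rail over OFA
mpirun -np 2 -hosts node1,node2 -IB -genv I_MPI_OFA_ADAPTER_NAME mlx5_0 -genv I_MPI_OFA_NUM_PORTS 2 ./osu_bw

The difference only becomes visible for the large message sizes reported towards the end of the osu_bw output.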
Hi Simon,
Sorry for the delayed answer. This issue was forwarded to the development team for investigation. I will let you know when I have an update.
Thank you.
Hi Simon,
Could you please specify the exact versions of the Intel MPI Library (4.x, 5.x) and of the OS/MPSS/OFED/DAPL?
Also, could you please describe the test scenarios you used? Which compute nodes were involved in each run (MPI ranks only on the HOST, only on the KNC, or on both HOST and KNC)?
Regarding DAPL: try running the same scenarios with the default DAPL provider (i.e. without I_MPI_DAPL_PROVIDER_LIST).
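For example (osu_bw and the host names below are only placeholders; the I_MPI_DEBUG output can be used to confirm which provider is actually selected):

unset I_MPI_DAPL_PROVIDER_LIST
export I_MPI_DEBUG=5    # prints the fabric/provider chosen at startup
mpirun -np 2 -hosts node1,node2 ./osu_bw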
Hi Artem,
I used two scenarios; the issue shows up in both cases:
- HOST <-> HOST
- KNC <-> KNC
Versions:
- Intel MPI 4.1.3.045 and 5.0.2.044
- OS is Linux (CentOS)
- OFED 3.5.2
- DAPL 2.1.2
- MPSS 3.3.3 (I guess this is irrelevant, since the issue also shows up when only hosts are involved)
I think I tried running without I_MPI_DAPL_PROVIDER_LIST in the past, but Intel MPI defaulted to an mlx4 device (which does not exist on our system) and would not use the mlx5 device, so setting I_MPI_DAPL_PROVIDER_LIST was mandatory. I will try again.
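For what it's worth, the DAPL providers available on a node can be checked in /etc/dat.conf (this is just how I looked it up; output is of course system-specific):

# list the mlx-related DAPL providers configured on the node
grep mlx /etc/dat.conf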
Remark: Should this topic be moved to the general forum, since by now we know that it is not MIC-specific?
Simon
Hi Simon,
FYI, we submitted an internal bug report (DPD200368369) a while ago but there is no update yet.
Hi Simon,
Loc and I have been communicating internally about this since you initially submitted it.
Just as an FYI, I'm moving this issue over to the regular Intel® Clusters and HPC Technology forum since it's not Phi-specific. That way I can keep track of the internal bug I submitted and update you on current status.
Thanks,
~Gergana
Hey Simon,
We've made several fixes to the Intel MPI Library with regard to multi-rail support. Just wondering if you've tried the latest Intel MPI 5.1.2 and seen better performance?
~Gergana
