MPI Bi-band benchmark shows MPI cannot exploit full duplex transmission mode

seongyun_k_ · ‎01-16-2017

I am running performance benchmarks using C++ and Intel MPI on CentOS 7.

I use two types of benchmarks:

(1) Uni-band: https://software.intel.com/en-us/node/561907

The first half of the ranks communicates with the second half using MPI_Isend/MPI_Recv/MPI_Wait calls. With an odd number of processes, one of them does not participate in the message exchange. Each rank in the first half issues a batch of MPI_Isend calls to its counterpart in the second half.

(2) Bi-band: https://software.intel.com/en-us/node/561908

The first half of the ranks communicates with the second half using MPI_Isend/MPI_Recv/MPI_Wait calls. With an odd number of processes, one of them does not participate in the message exchange. Each rank in the first half issues a batch of MPI_Isend calls to its counterpart in the second half, and vice versa.
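The rank pairing described for both benchmarks can be sketched without any MPI calls. This is only an illustration: the helper names `peer_of` and `sends_data` are hypothetical, and the exact pairing (rank i with rank i + half) is an assumption about what "counterpart" means, not something taken from the benchmark source.

```cpp
#include <cassert>

// Hypothetical helper: partner of `rank` among `nprocs` processes,
// or -1 if the rank sits out (odd process count).
// Assumed pairing: rank i in the first half <-> rank i + half.
int peer_of(int rank, int nprocs) {
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    const int half = nprocs / 2;
    if (rank < half) return rank + half;      // first half talks to second half
    if (rank < 2 * half) return rank - half;  // second half talks back to first half
    return -1;                                // leftover rank does not participate
}

// Hypothetical helper: does `rank` issue MPI_Isend calls?
// Uni-band: only the first half sends. Bi-band: both halves send.
bool sends_data(int rank, int nprocs, bool bidirectional) {
    if (peer_of(rank, nprocs) < 0) return false;
    return bidirectional || rank < nprocs / 2;
}
```

With 4 processes, rank 0 pairs with rank 2 and rank 1 with rank 3. In the uni-band case only ranks 0 and 1 send, while in the bi-band case all four do, so each link carries data in both directions at once.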

Since the Ethernet on my machines supports full-duplex transmission, I expected the bi-band results to show almost twice the maximum bandwidth of the uni-band results.

(Full-duplex operation doubles the theoretical bandwidth of the connection. If a link normally runs at 1 Mbps but can work in full-duplex mode, it really has 2 Mbps of bandwidth (1 Mbps in each direction).)

However, what I am observing is that even though the Ethernet supports full-duplex transmission, the bandwidth drops by half when there is communication in both directions (Rank-A <----> Rank-B) at the same time.

It seems Intel MPI does not exploit full-duplex transmission. Is there a way to resolve this?

Jennifer_D_Intel · ‎02-16-2017

I've moved your question to the Clusters forum so the appropriate experts will see it.

Regards,

Jennifer

McCalpinJohn · ‎02-16-2017

You don't tell us anything about the specific hardware you are using, or the actual performance levels you are seeing, so it is hard to provide specific responses....

Recent (Gbit or faster) Ethernet hardware always runs in full-duplex mode to avoid collisions.

For unidirectional transfers, it should be possible to exceed 90% of the peak unidirectional bandwidth without requiring any magic. Using the iperf3 benchmark, I can easily generate 900 Mb/s of traffic between two clients equipped with Gbit Ethernet adapters, even with two inexpensive switches between the two clients. With 10 Gbit Ethernet, you may need to enable jumbo frames to get close to full bandwidth, especially if your CPUs are not very fast. Using MPI over IP adds more potential for performance mistakes, but I would expect Intel MPI to be well implemented on Ethernet.

The protocol overhead of unidirectional transfers with standard frame sizes is only a few percent, but that cannot be directly applied to the bidirectional case. For example, in a unidirectional transfer from A to B, any traffic from B to A (such as TCP acknowledgments, MPI flow control, etc.) does not interfere with the data transfer, since it is on the wires going in the other direction. When performing simultaneous transfers in both directions, this extra traffic competes with data transfers and must be included in the analysis. I don't work much with Ethernet, but for InfiniBand we typically see bidirectional traffic running at a throughput of about 1.5-1.6 times the maximum unidirectional rates.
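To put rough numbers on the framing overhead, here is a back-of-the-envelope sketch. It assumes TCP over IPv4 with no header options, standard Ethernet framing, and a simplified model in which every `segments_per_ack` data segments in one direction cost one minimum-size ACK frame on the same wire. The function names and the ACK model are illustrative assumptions, not measurements, and the model ignores MPI-level flow control and host-side costs.

```cpp
#include <cassert>

// Per-frame bytes on the wire beyond the MTU:
// preamble 7 + start delimiter 1 + MAC header 14 + FCS 4 + inter-frame gap 12.
constexpr int kWireOverhead = 38;
// IPv4 header (20) + TCP header (20), assuming no options.
constexpr int kIpTcpHeaders = 40;
// A pure ACK fits in a minimum 64-byte frame, plus 8 preamble/delimiter + 12 gap.
constexpr int kAckOnWire = 84;

// Fraction of the line rate that carries payload when ACKs travel
// on the otherwise idle reverse direction.
double unidirectional_efficiency(int mtu) {
    assert(mtu > kIpTcpHeaders);
    return static_cast<double>(mtu - kIpTcpHeaders) / (mtu + kWireOverhead);
}

// Same fraction per direction when ACKs for the opposite stream share the wire.
// Assumption: one standalone ACK per `segments_per_ack` data segments.
double bidirectional_efficiency(int mtu, int segments_per_ack) {
    assert(mtu > kIpTcpHeaders && segments_per_ack > 0);
    const double payload = static_cast<double>(segments_per_ack) * (mtu - kIpTcpHeaders);
    const double wire = static_cast<double>(segments_per_ack) * (mtu + kWireOverhead) + kAckOnWire;
    return payload / wire;
}
```

Under these assumptions, `unidirectional_efficiency(1500)` is about 0.949 and `unidirectional_efficiency(9000)` about 0.991, while `bidirectional_efficiency(1500, 2)` is about 0.924 per direction. TCP can also piggyback ACKs on data segments flowing the other way, so the bidirectional figure is closer to a pessimistic bound for ACK traffic alone.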

I just looked for published bidirectional Ethernet results and did not find any recent publications with convincing methodologies for MPI, but it is certainly possible that I did not look hard enough.

Mikhail_S_Intel · ‎02-14-2018

Hello Seongyun,

Could you please provide more information about your hardware and software (CPU, IMPI version) and the command lines you use to launch uniband and biband?