Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

MPI performance issue with bi-directional communication



(I attached the performance measurement program written in C++)

I am experiencing a performance issue with bi-directional MPI_Send/MPI_Recv operations.

The program runs two threads: one for MPI_Send and the other for MPI_Recv.
- MPI_Recv receives data from any source.
- MPI_Send sends data to every rank, one at a time, starting with its own rank (rank, rank+1, ..., 0, ..., rank-1).

You can compile the attached file as follows:
$ mpiicpc -O3 -m64 -std=c++11 -mt_mpi -qopenmp ./mpi-test.cpp -o mpi-test

You can test it as follows:
$ mpiexec.hydra -genv I_MPI_PERHOST 1 -genv I_MPI_FABRICS tcp -n 2 -machinefile ./machine_list /home/TESTER/mpi-test
rank[0] --> rank[0]     BW=2060.27 [MB/sec]
rank[0] --> rank[1]     BW=56.38 [MB/sec]
rank[0] BW=219.21 [MB/sec]
rank[1] BW=217.20 [MB/sec]

$ mpiexec.hydra -genv I_MPI_PERHOST 1 -genv I_MPI_FABRICS tcp -n 4 -machinefile ./machine_list /home/TESTER/mpi-test
rank[0] --> rank[0]     BW=2050.59 [MB/sec]
rank[0] --> rank[1]     BW=112.35 [MB/sec]
rank[0] --> rank[2]     BW=57.19 [MB/sec]
rank[0] --> rank[3]     BW=109.64 [MB/sec]
rank[0] BW=218.28 [MB/sec]
rank[1] BW=219.17 [MB/sec]
rank[2] BW=220.75 [MB/sec]
rank[3] BW=221.17 [MB/sec]

What I observe is that when transfers from rank A to rank B and from rank B to rank A occur simultaneously, the bandwidth of each drops significantly, to almost half.
The cluster machines run CentOS 7 and are connected by 1 Gbps Ethernet, which supports full-duplex transmission.

How can I resolve this issue?

- Does Intel MPI support full-duplex transmission between two ranks?



The related benchmark is the Multi-Biband benchmark (

Why can't it fully exploit the available network bandwidth (why only half of it?), even in full-duplex mode?