I am benchmarking how much network bandwidth MPI can exploit for my use cases.
I tested the bandwidth between two machines using plain TCP sockets and confirmed that they can send and receive 2 to 3 GB of data per second.
However, I do not see those numbers in MPI benchmark results (Intel MPI Benchmarks, MPIVCH Benchmark ...) or in my application.
In my use case, each machine has one MPI process dedicated to network communication, and that process performs only inter-machine communication.
How can I fully exploit the available network bandwidth for inter-machine communication with MPI?
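
For reference, a TCP socket throughput test of the kind mentioned above can be sketched roughly as follows. This is only an illustration, not the exact test that produced the 2 to 3 GB per second figure: it assumes a single connection, runs both ends on the loopback interface so that it is self-contained, and the name `measure_tcp_throughput` and its parameters are made up for this example. For a real measurement the sender and receiver would run on the two different machines.

```python
import socket
import threading
import time


def measure_tcp_throughput(total_bytes=64 * 1024 * 1024,
                           chunk_size=1024 * 1024,
                           host="127.0.0.1"):
    """Send total_bytes over one TCP connection.

    Returns (bytes_received, throughput in GB/s as seen by the receiver).
    The host defaults to loopback, which is an assumption made only to keep
    this sketch runnable on a single machine.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def sender():
        # Push the payload in fixed-size chunks until total_bytes are sent.
        payload = b"\0" * chunk_size
        with socket.create_connection((host, port)) as conn:
            sent = 0
            while sent < total_bytes:
                n = min(chunk_size, total_bytes - sent)
                conn.sendall(payload[:n])
                sent += n

    thread = threading.Thread(target=sender)
    thread.start()

    conn, _ = server.accept()
    received = 0
    buf = bytearray(chunk_size)
    # Timing starts after accept(), so a little data may already be buffered;
    # with a large total_bytes this has little effect on the result.
    start = time.perf_counter()
    with conn:
        while received < total_bytes:
            n = conn.recv_into(buf)
            if n == 0:  # the sender closed the connection early
                break
            received += n
    elapsed = time.perf_counter() - start

    thread.join()
    server.close()
    return received, received / elapsed / 1e9


if __name__ == "__main__":
    nbytes, gb_per_s = measure_tcp_throughput()
    print(f"received {nbytes} bytes at {gb_per_s:.2f} GB/s")
```

The throughput is computed on the receiving side because that is where the data has actually arrived; a number taken on the sending side can be inflated by kernel socket buffers.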