seongyun_k_
Beginner
51 Views

MPI performance problem on inter-switch connection

Hi,

I have a cluster of 32 machines. The first 25 machines are in the first rack and the remaining 7 machines are in the second rack.
Each rack has its own 1 Gbps Ethernet switch.

I run an MPI application that uses all 32 machines (1 process per host).
When I measure the network speed between the machines with a benchmark tool such as 'iperf', there is no problem (every point-to-point connection among the 32 machines can use the full bandwidth).

In my application (MPI_Send/MPI_Recv), each MPI process sends several 4 MB messages to the other machines, so message size is not the problem.
I found that communication between the first 25 machines and the other 7 machines is very slow (about 10 to 20 MB/s).
(Communication within the first 25 machines, and within the other 7 machines, is fast: 100 to 110 MB/s.)

 

What is the possible cause here? Is the latency killing it?
What can I do here to improve the performance?

Is there any suggested optimization?

1 Reply
seongyun_k_
Beginner
51 Views

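It turns out that the problem was link contention between the two switches: all traffic between the racks shares the inter-switch link, so simultaneous cross-rack transfers divide its bandwidth among themselves.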
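
For anyone hitting the same thing, here is a rough back-of-the-envelope sketch in Python of why contention on the inter-switch link produces numbers like the ones I saw. It rests on assumptions that are not confirmed above: that the two switches are joined by a single 1 Gbps link, that the usable rate of that link is about 110 MB/s (the best figure I measured inside a rack), and that concurrent flows share it evenly, which real TCP traffic only approximates. The function names are made up for illustration.

```python
# Idealized model of inter-switch link contention.
# ASSUMPTIONS (not confirmed in the thread): one 1 Gbps uplink between the
# two switches, ~110 MB/s usable on it, and perfectly fair sharing.

USABLE_LINK_MB_S = 110.0  # best in-rack rate reported in the question


def per_flow_throughput(concurrent_flows, link_mb_s=USABLE_LINK_MB_S):
    """Throughput each flow gets when flows share one link evenly."""
    if concurrent_flows < 1:
        raise ValueError("need at least one flow")
    return link_mb_s / concurrent_flows


def flows_implied_by(observed_mb_s, link_mb_s=USABLE_LINK_MB_S):
    """Number of evenly sharing flows that would yield the observed rate."""
    if observed_mb_s <= 0:
        raise ValueError("observed rate must be positive")
    return link_mb_s / observed_mb_s


def cross_rack_pairs(rack_a_hosts, rack_b_hosts):
    """Sender/receiver pairs whose traffic crosses the link in one direction."""
    return rack_a_hosts * rack_b_hosts


if __name__ == "__main__":
    # A single flow (e.g. one iperf pair tested alone) sees the whole link.
    print("1 flow:", per_flow_throughput(1), "MB/s")

    # The slow rates from the question, translated into concurrent flows.
    for observed in (10.0, 20.0):
        print(observed, "MB/s per flow implies about",
              round(flows_implied_by(observed), 1), "concurrent flows")

    # Upper bound on one-directional cross-rack pairs with 25 + 7 hosts.
    print("cross-rack pairs (25 x 7):", cross_rack_pairs(25, 7))
```

Under these assumptions, 10 to 20 MB/s per transfer corresponds to roughly 5 to 11 transfers crossing the link at the same time, which is plausible when 32 processes exchange data concurrently. It would also explain why a benchmark run on one pair of hosts at a time reports full bandwidth.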
