Beginner

32 cores per node much slower than 30 cores per node at the same total core count

I have an MPI job running on Xeon Gold 6142 nodes under Slurm. It uses 7680 cores in total.

MPI Version: intelmpi-2017.2.174

TEST A: 256 nodes, using 30 cores per node. Every node has 32 cores. The job takes 7245.24463 seconds to compute.

TEST B: 240 nodes, using 32 cores per node. Every node has 32 cores. The job takes 11760.76172 seconds to compute.

I am confused about why TEST B is so much slower than TEST A.
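
To put numbers on the comparison, here is a minimal Python sketch of the arithmetic. The variable names are only illustrative; the node counts, cores per node, and timings are the ones reported for TEST A and TEST B.

```python
# Figures taken from TEST A and TEST B in this post.
CORES_PER_NODE = 32  # physical cores available on every node

nodes_a, used_a, secs_a = 256, 30, 7245.24463
nodes_b, used_b, secs_b = 240, 32, 11760.76172

# Both layouts use the same number of MPI cores in total.
total_a = nodes_a * used_a
total_b = nodes_b * used_b

# Cores left unused across the allocation in each test.
idle_a = nodes_a * (CORES_PER_NODE - used_a)
idle_b = nodes_b * (CORES_PER_NODE - used_b)

# How much longer TEST B takes relative to TEST A.
slowdown = secs_b / secs_a

print(f"total cores: A={total_a}, B={total_b}")
print(f"idle cores:  A={idle_a}, B={idle_b}")
print(f"TEST B / TEST A wall time: {slowdown:.2f}x")
```

So with the same 7680 cores, the fully packed layout takes roughly 1.6 times as long, while TEST A leaves 2 cores per node (512 in total) unused.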

 

=========

Infiniband Version:

perfquery BUILD VERSION: 1.6.7.MLNX20171022.b127f52 Build date: Oct 29 2017 13:03:53

[root@ ~]#ibstatus 
Infiniband device 'mlx5_0' port 1 status:
default gid: fe80:0000:0000:0000:46e3:e861:1f20:ba60
base lid: 0x40a
sm lid: 0x3a9
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 100 Gb/sec (4X EDR)
link_layer: InfiniBand
[root@ ~]#ibstat
CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.18.0226
Hardware version: 0
Node GUID: 0x46e3e8611f20ba60
System image GUID: 0x46e3e8611f20ba60
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 1034
LMC: 0
SM lid: 937
Capability mask: 0x2651e848
Port GUID: 0x46e3e8611f20ba60
Link layer: InfiniBand
[root@ ~]#
 
 
 
3 Replies
Black Belt

I would disagree with your implication that you have drawn a general conclusion which doesn't depend on any details of your application or CPU choices.
Beginner

I found that the IB speed in TEST B is much slower than the IB speed in TEST A.

Beginner

The application I am testing is grapes_globale.

time_step_max = 5760
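
For reference, a rough per-step cost can be derived from the timings in the original post. This is only a sketch: it assumes the run executes all 5760 steps and that the reported wall time is dominated by time stepping, neither of which is confirmed here.

```python
# Wall-clock times (seconds) reported for TEST A and TEST B.
secs_a = 7245.24463
secs_b = 11760.76172

# Assumption: every one of the time_step_max steps is executed.
time_step_max = 5760

# Average wall time per model step under that assumption.
per_step_a = secs_a / time_step_max
per_step_b = secs_b / time_step_max

print(f"TEST A: {per_step_a:.3f} s/step")
print(f"TEST B: {per_step_b:.3f} s/step")
```

Under those assumptions, each step costs about 1.26 s in TEST A and about 2.04 s in TEST B.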
