Software Tuning, Performance Optimization & Platform Monitoring
Discussion around monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform monitoring

32 cores per node much slower than 30 cores per node at the same total core count

GHui
Novice

I have an MPI job running on Xeon Gold 6142 nodes under Slurm. It uses 7680 cores in total.

MPI Version: intelmpi-2017.2.174

TEST A: 256 nodes, 30 cores used per node (each node has 32 cores). The job takes 7245.24463 seconds.

TEST B: 240 nodes, 32 cores used per node (each node has 32 cores). The job takes 11760.76172 seconds.

I am confused about why TEST B is so much slower than TEST A.
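
For reference, the arithmetic behind the two runs can be checked with a short Python sketch. The numbers are the ones quoted above; the dictionary layout and function names are only illustrative and do not come from the job scripts.

```python
# Node counts, cores used per node, and wall times as quoted in the post.
CORES_AVAILABLE_PER_NODE = 32

runs = {
    "A": {"nodes": 256, "cores_per_node": 30, "seconds": 7245.24463},
    "B": {"nodes": 240, "cores_per_node": 32, "seconds": 11760.76172},
}


def total_cores(run):
    """Total MPI cores used by a run."""
    return run["nodes"] * run["cores_per_node"]


def idle_cores(run):
    """Cores left unused across all nodes of a run."""
    return run["nodes"] * (CORES_AVAILABLE_PER_NODE - run["cores_per_node"])


def slowdown(slow, fast):
    """Wall-time ratio of the slower run to the faster run."""
    return slow["seconds"] / fast["seconds"]


for name, run in runs.items():
    print(f"TEST {name}: {total_cores(run)} cores used, "
          f"{idle_cores(run)} idle, {run['seconds']:.2f} s")

ratio = slowdown(runs["B"], runs["A"])
print(f"TEST B / TEST A wall time: {ratio:.2f}x")
```

Both runs use the same 7680 cores, so the difference is not in the rank count. TEST A leaves 2 cores per node free, and TEST B takes roughly 1.6 times as long.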

 

=========

InfiniBand version:

perfquery BUILD VERSION: 1.6.7.MLNX20171022.b127f52 Build date: Oct 29 2017 13:03:53

[root@ ~]#ibstatus 
Infiniband device 'mlx5_0' port 1 status:
default gid: fe80:0000:0000:0000:46e3:e861:1f20:ba60
base lid: 0x40a
sm lid: 0x3a9
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 100 Gb/sec (4X EDR)
link_layer: InfiniBand
[root@ ~]#ibstat
CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.18.0226
Hardware version: 0
Node GUID: 0x46e3e8611f20ba60
System image GUID: 0x46e3e8611f20ba60
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 1034
LMC: 0
SM lid: 937
Capability mask: 0x2651e848
Port GUID: 0x46e3e8611f20ba60
Link layer: InfiniBand
[root@ ~]#
 
 
 
3 Replies
TimP
Black Belt
I would disagree with your implication that you have drawn a general conclusion which doesn't depend on any details of your application or CPU choices.
GHui
Novice

I found that the IB speed in TEST B is much slower than in TEST A.
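
One way to put a number on that comparison, offered only as a sketch and not as the method used above: sample a port's transmit data counter twice during each run and convert the difference to a rate. The sketch assumes the counter follows the InfiniBand PortXmitData convention of counting 4-byte words and ignores counter wraparound. The function name and the sample values are hypothetical, not measurements from these runs.

```python
def xmit_gbit_per_s(count_start, count_end, interval_s, bytes_per_count=4):
    """Average transmit rate in Gbit/s between two counter samples.

    Assumes each counter increment represents `bytes_per_count` bytes
    (4 for PortXmitData-style counters) and that the counter did not wrap.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta_bytes = (count_end - count_start) * bytes_per_count
    return delta_bytes * 8 / interval_s / 1e9


# Hypothetical samples taken 10 s apart (placeholder values, not measured data).
rate = xmit_gbit_per_s(1_000_000_000, 3_500_000_000, 10.0)
print(f"{rate:.1f} Gbit/s")
```

Computing the same figure on a few nodes in TEST A and TEST B would show whether per-node traffic really differs, or whether ranks in TEST B are mostly waiting.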

GHui
Novice
202 Views

The application I am testing is grapes_globale.

time_step_max = 5760
