Slower memory bandwidth on identical nodes reported by STREAM benchmark

Londhe__Ashutosh · ‎12-11-2019

I am running the STREAM benchmark on two identical nodes, but one node reports almost 5X lower performance than the other.

Following is the node configuration

Processor

2 X Intel(R) Xeon(R) CPU E5-2698 v4

2 X 20 Cores, 2.20GHz, L1d cache: 32 K, L1i cache: 32 K, L2 cache: 256 K, L3 cache: 51200 K

Memory

128 GB, 2400 MHz, 4 memory channels (32 GB each)

Please help me to identify the issue.

I have checked the BIOS settings and the installed drivers; they are identical on both nodes.

McCalpinJohn · ‎12-12-2019

The specific performance numbers might help narrow down the possible mechanisms....

What compiler and compilation options were used? What is the OS?

I would start by comparing the two systems with a set of smaller tests:

Single-thread performance bound to each socket on each node
- export OMP_NUM_THREADS=1; numactl --membind=0 --cpunodebind=0 ./stream
- export OMP_NUM_THREADS=1; numactl --membind=1 --cpunodebind=1 ./stream
Multi-thread performance bound to each socket on each node, using 2..20 cores.
- If HyperThreading is enabled, set OMP_PROC_BIND=spread
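The tests above could be scripted roughly as follows. This is only a sketch under a few assumptions that are not from the thread: the binary is named ./stream, the NUMA nodes are numbered 0 and 1, and the thread counts 1, 2, 4, 8, 16, 20 are an arbitrary sampling of the 1..20 range. STREAM, DRY_RUN and run_sweep are illustrative names. OMP_PROC_BIND=spread is set unconditionally here, which should be harmless when HyperThreading is off.

```shell
#!/bin/sh
# Sketch: run STREAM bound to each socket with a range of thread counts.
# Assumed (not from the thread): binary path, node numbering, thread list.
STREAM=${STREAM:-./stream}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = actually run them

run_sweep() {
    for node in 0 1; do
        for threads in 1 2 4 8 16 20; do
            if [ "$DRY_RUN" = "1" ]; then
                # Print the command that would be executed.
                echo "OMP_NUM_THREADS=$threads OMP_PROC_BIND=spread numactl --membind=$node --cpunodebind=$node $STREAM"
            else
                echo "== node $node, $threads thread(s) =="
                # Keep only the four bandwidth result lines of the STREAM output.
                OMP_NUM_THREADS=$threads OMP_PROC_BIND=spread \
                    numactl --membind="$node" --cpunodebind="$node" "$STREAM" \
                    | grep -E '^(Copy|Scale|Add|Triad):'
            fi
        done
    done
}

run_sweep
```

Running it with DRY_RUN=0 on both nodes and comparing the output side by side should show whether the slowdown is tied to one socket, to a particular thread count, or is present everywhere.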

Londhe__Ashutosh · ‎12-12-2019

Hello John,

Here are the details you asked for:

compilation:

gcc -fopenmp -O3 -DSTREAM_ARRAY_SIZE=60000000 stream.c -o Stream_60M.exe

gcc version : 6.2.0

OS: Linux

I will try the experiments you suggested and let you know.

Thanks for the feedback.

Londhe__Ashutosh · ‎12-13-2019

Hello John,

Issue resolved. It was due to a faulty PSU, which was limiting the node's performance.

Thanks for your help.

McCalpinJohn · ‎12-13-2019

Glad you found the problem!

This is an area that often causes problems in our supercomputing environment -- in many cases we would rather have a node fail than have it run slowly....