I am running the STREAM benchmark on two identical nodes, but one node reports almost 5X slower performance than the other.
The node configuration is as follows:
2 X Intel(R) Xeon(R) CPU E5-2698 v4
2 X 20 Cores, 2.20GHz, L1d cache: 32 K, L1i cache: 32 K, L2 cache: 256 K, L3 cache: 51200 K
128 GB, 2400 MHz, 4 memory channels (32GB each)
Please help me identify the issue.
I have checked the BIOS settings and the available drivers; they are identical on both nodes.
The specific performance numbers might help narrow down the possible mechanisms....
What compiler and compilation options were used? What is the OS?
I would start by comparing the two systems with a set of smaller tests:
- Single-thread performance bound to each socket on each node
  - export OMP_NUM_THREADS=1; numactl --membind=0 --cpunodebind=0 ./stream
  - export OMP_NUM_THREADS=1; numactl --membind=1 --cpunodebind=1 ./stream
- Multi-thread performance bound to each socket on each node, using 2..20 cores.
  - If HyperThreading is enabled, set OMP_PROC_BIND=spread
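The tests above could be scripted roughly as follows, so the same sweep can be run on both nodes and the output compared side by side. This is only a sketch: STREAM_BIN, DRY_RUN, stream_cmd and run_sweep are names made up for this example, the list of thread counts is an arbitrary sample of the 2..20 range, and OMP_PROC_BIND=spread is set unconditionally here even though it is only needed when HyperThreading is enabled.

```shell
#!/bin/sh
# Sketch: sweep STREAM over each socket and a range of thread counts.
# STREAM_BIN and DRY_RUN are hypothetical variables for this example.
STREAM_BIN="${STREAM_BIN:-./stream}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print the command line for one (socket, thread-count) combination.
stream_cmd() {
    socket="$1"; threads="$2"
    echo "OMP_NUM_THREADS=$threads OMP_PROC_BIND=spread numactl --membind=$socket --cpunodebind=$socket $STREAM_BIN"
}

# Loop over both sockets and a sample of thread counts (assumed values).
run_sweep() {
    for socket in 0 1; do
        for threads in 1 2 4 8 10 16 20; do
            if [ "$DRY_RUN" = "1" ]; then
                stream_cmd "$socket" "$threads"
            else
                echo "### $(stream_cmd "$socket" "$threads")"
                # Keep only the four bandwidth result lines from STREAM.
                env OMP_NUM_THREADS="$threads" OMP_PROC_BIND=spread \
                    numactl --membind="$socket" --cpunodebind="$socket" "$STREAM_BIN" \
                    | grep -E '^(Copy|Scale|Add|Triad):'
            fi
        done
    done
}

run_sweep
```

By default the script only prints the commands it would run; setting DRY_RUN=0 on each node runs them, and saving the output to a file per node makes it easy to see which socket or thread count first shows the slowdown.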
Here are the details you asked for:
gcc -fopenmp -O3 -DSTREAM_ARRAY_SIZE=60000000 stream.c -o Stream_60M.exe
gcc version : 6.2.0
I will try the experiments you suggested and let you know.
Thanks for the feedback.
Glad you found the problem!
This is an area that often causes problems in our supercomputing environment -- in many cases we would rather have a node fail than have it run slowly....