I am running the STREAM benchmark on two identical nodes, but one node reports almost 5x lower performance than the other.
The node configuration is as follows:
Processor
2 X Intel(R) Xeon(R) CPU E5-2698 v4
2 X 20 Cores, 2.20GHz, L1d cache: 32 K, L1i cache: 32 K, L2 cache: 256 K, L3 cache: 51200 K
Memory
128 GB, 2400 MHz, 4 memory channels (32 GB each)
Please help me identify the issue.
I have checked the BIOS settings and the available drivers; they are identical on both nodes.
The specific performance numbers might help narrow down the possible mechanisms....
What compiler and compilation options were used? What is the OS?
I would start by comparing the two systems with a set of smaller tests:
- Single-thread performance bound to each socket on each node
  - export OMP_NUM_THREADS=1; numactl --membind=0 --cpunodebind=0 ./stream
  - export OMP_NUM_THREADS=1; numactl --membind=1 --cpunodebind=1 ./stream
- Multi-thread performance bound to each socket on each node, using 2..20 cores.
  - If HyperThreading is enabled, set OMP_PROC_BIND=spread
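The tests listed above could be scripted roughly as follows. This is only a dry-run sketch: it prints the command lines rather than running them, so the list can be checked before it is run on both nodes. The function name `print_stream_cmds` and the `STREAM_BIN` variable are made up for illustration, and the script assumes two sockets exposed as NUMA nodes 0 and 1 and a binary at ./stream.

```sh
#!/bin/sh
# Dry-run sketch: print the per-socket STREAM commands instead of running them.
# Assumptions: NUMA nodes 0 and 1 correspond to the two sockets, and the
# benchmark binary is at ./stream (adjust STREAM_BIN to your build).

SOCKETS="0 1"
STREAM_BIN="./stream"   # hypothetical path to the compiled benchmark

# Print one command line per (socket, thread count) pair.
# $1 = highest thread count to test on each socket (1..$1)
print_stream_cmds() {
    max_threads="$1"
    for node in $SOCKETS; do
        threads=1
        while [ "$threads" -le "$max_threads" ]; do
            # OMP_PROC_BIND=spread only matters if HyperThreading is enabled.
            echo "OMP_NUM_THREADS=${threads} OMP_PROC_BIND=spread numactl --membind=${node} --cpunodebind=${node} ${STREAM_BIN}"
            threads=$((threads + 1))
        done
    done
}

# 20 cores per socket on this system: 1 thread, then 2..20 threads.
print_stream_cmds 20
```

Piping the output to `sh` on each node and comparing the reported bandwidths socket by socket should show whether the slowdown affects one socket or the whole node.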
Hello John,
Here are the details you asked for:
compilation:
gcc -fopenmp -O3 -DSTREAM_ARRAY_SIZE=60000000 stream.c -o Stream_60M.exe
gcc version : 6.2.0
OS: Linux
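As a quick sanity check on the array size, the comments in stream.c ask for each array to be at least 4x the total cache size. The arithmetic below is a sketch under two assumptions: the default STREAM element type (8-byte double) and the 51200 K L3 figure being per socket. The variable names are just for illustration.

```sh
#!/bin/sh
# Sanity check: is -DSTREAM_ARRAY_SIZE=60000000 large enough for this system?
# Assumptions: 8-byte elements (STREAM's default double) and 51200 KiB of L3
# per socket, two sockets.

array_elems=60000000
bytes_per_elem=8
l3_kib_per_socket=51200
sockets=2

array_kib=$(( array_elems * bytes_per_elem / 1024 ))   # size of one array in KiB
cache_kib=$(( l3_kib_per_socket * sockets ))            # combined L3 in KiB
min_kib=$(( 4 * cache_kib ))                            # 4x-cache guideline

echo "one array: ${array_kib} KiB, required minimum: ${min_kib} KiB"
if [ "$array_kib" -ge "$min_kib" ]; then
    echo "array size is large enough"
else
    echo "array size is too small; results may reflect cache bandwidth"
fi
```

Under those assumptions each array is about 458 MiB against a 400 MiB minimum, so the array size by itself would not explain a 5x gap between the nodes.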
I will try the experiments you suggested and let you know.
Thanks for the feedback.
Hello John,
Issue resolved. It was due to a faulty PSU, which was limiting the node's performance.
Thanks for your help.
Glad you found the problem!
This is an area that often causes problems in our supercomputing environment -- in many cases we would rather have a node fail than have it run slowly....
