Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Slower memory bandwidth on identical nodes reported by STREAM benchmark

Londhe__Ashutosh
Beginner
1,656 Views

I am running the STREAM benchmark on two identical nodes, but one node reports almost 5x lower performance than the other.

The node configuration is as follows:

Processor

2 X Intel(R) Xeon(R) CPU E5-2698 v4

2 x 20 cores, 2.20 GHz; L1d cache: 32 K, L1i cache: 32 K, L2 cache: 256 K, L3 cache: 51200 K

Memory

128 GB, 2400 MHz, 4 memory channels (32 GB each)

 

Please help me to identify the issue.

I have checked the BIOS settings and the installed drivers; they are identical on both nodes.

4 Replies
McCalpinJohn
Honored Contributor III
1,656 Views

The specific performance numbers might help narrow down the possible mechanisms....

What compiler and compilation options were used?  What is the OS?

I would start by comparing the two systems with a set of smaller tests:

  • Single-thread performance bound to each socket on each node
    • export OMP_NUM_THREADS=1; numactl --membind=0 --cpunodebind=0 ./stream
    • export OMP_NUM_THREADS=1; numactl --membind=1 --cpunodebind=1 ./stream
  • Multi-thread performance bound to each socket on each node, using 2..20 cores.
    • If HyperThreading is enabled, set OMP_PROC_BIND=spread
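The comparison runs listed above can be scripted so that both nodes execute exactly the same sequence. The following is a minimal Python sketch, not part of the original reply: the helper names (`stream_command`, `sweep`, `run`) are hypothetical, and it assumes `numactl` is installed and the benchmark binary is at `./stream`.

```python
import os
import subprocess


def stream_command(socket, threads, stream_bin="./stream", hyperthreading=False):
    """Return (env_overrides, argv) for one STREAM run bound to one socket."""
    env = {"OMP_NUM_THREADS": str(threads)}
    if hyperthreading:
        # With HyperThreading enabled, spread threads across the socket
        # instead of packing them onto sibling logical CPUs.
        env["OMP_PROC_BIND"] = "spread"
    # Bind both memory allocation and CPU placement to the same socket.
    argv = ["numactl", f"--membind={socket}", f"--cpunodebind={socket}", stream_bin]
    return env, argv


def sweep(sockets=(0, 1), max_threads=20, **kwargs):
    """Yield the 1-thread run and the 2..max_threads runs for each socket."""
    for socket in sockets:
        for threads in range(1, max_threads + 1):
            yield stream_command(socket, threads, **kwargs)


def run(env, argv):
    """Execute one run and return STREAM's stdout (needs numactl and the binary)."""
    result = subprocess.run(argv, env={**os.environ, **env},
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `run(*cmd)` for each `cmd` in `sweep()` on both nodes and saving the output gives per-socket, per-thread-count results that can be compared line by line.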

 

Londhe__Ashutosh
Beginner
1,655 Views

Hello John,

Here are the details you asked for:

compilation: 

gcc -fopenmp -O3 -DSTREAM_ARRAY_SIZE=60000000 stream.c -o Stream_60M.exe
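As a sanity check on the array size in this compile line, the comments in `stream.c` recommend that each array be at least four times the total last-level cache. A small Python sketch of that arithmetic follows, assuming 8-byte elements (STREAM's default `double` type) and that the L3 caches of both sockets are summed; the function name is hypothetical.

```python
def min_stream_elements(l3_kib_per_socket, sockets, bytes_per_element=8, factor=4):
    """Smallest STREAM_ARRAY_SIZE satisfying the 4x-total-cache guideline."""
    total_cache_bytes = l3_kib_per_socket * 1024 * sockets
    return factor * total_cache_bytes // bytes_per_element


# Values taken from the thread: 51200 K of L3 per socket, two sockets,
# and -DSTREAM_ARRAY_SIZE=60000000 on the gcc command line.
required = min_stream_elements(51200, 2)   # 52,428,800 elements
configured = 60_000_000
array_is_large_enough = configured >= required
```

Under these assumptions the configured 60 million elements (about 480 MB per array) exceed the roughly 52.4 million required, so the array size is unlikely to be the reason one node reads slower than the other.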

gcc version: 6.2.0

OS: Linux

I will try the experiments you suggested and let you know.

Thanks for the feedback.

 

Londhe__Ashutosh
Beginner
1,655 Views

Hello John,

Issue resolved. It was due to a faulty PSU (power supply unit), which was limiting the node's performance.

Thanks for your help.

McCalpinJohn
Honored Contributor III
1,656 Views

Glad you found the problem! 

This is an area that often causes problems in our supercomputing environment -- in many cases we would rather have a node fail than have it run slowly.... 
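One way to act on that preference is a pre-job health check that marks a node as failed when its STREAM result falls well below a known-good baseline. The Python sketch below is an illustration only, not a description of any particular site's tooling: the function names, the 80% threshold, and the choice of the Triad kernel are assumptions, and it relies on the standard STREAM result lines of the form `Triad:   <rate> ...`.

```python
import re

# Matches STREAM result lines such as "Triad:   70000.0   0.0205 ..."
RATE_LINE = re.compile(r"^(Copy|Scale|Add|Triad):\s+([0-9.]+)", re.MULTILINE)


def parse_stream_rates(output):
    """Return {kernel name: best rate in MB/s} parsed from STREAM's stdout."""
    return {name: float(rate) for name, rate in RATE_LINE.findall(output)}


def node_is_healthy(output, baseline_mb_s, min_fraction=0.8, kernel="Triad"):
    """True if the chosen kernel reaches min_fraction of the baseline rate.

    baseline_mb_s should come from a known-good node of the same type;
    min_fraction=0.8 is an arbitrary illustrative threshold.
    """
    rates = parse_stream_rates(output)
    if kernel not in rates:
        # Missing or unparseable output is treated as a failed node.
        return False
    return rates[kernel] >= min_fraction * baseline_mb_s
```

A scheduler prolog could run STREAM, pass its output to `node_is_healthy`, and take the node out of service on a `False` result, so a 5x slowdown like the one in this thread is caught before user jobs land on the node.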
