Software Tuning, Performance Optimization & Platform Monitoring
Discussion around monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform monitoring

Very Low Performance with the MP Linpack Benchmark on an HPC Cluster


Dear all,

I have a problem with my Intel MKL MP_LINPACK results. My system has 24 compute nodes, each with both Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz and Xeon Phi Q7200 processors and 256 GB of RAM. When I run ./runme_intel64 on each node individually, the performance is good: about 700-900 GFlops (Xeon CPUs only).

But when I run HPL on 4, 8, or more nodes, the results are very poor, and sometimes the run does not finish at all and fails with the error "MPI TERMINATED,...". After that, when I rerun the single-node test (runme_intel64) on each node, the performance is very low:

~11.243 GFlops

~10.845 GFlops
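To put these numbers in context, a node's theoretical double-precision peak can be estimated as sockets × cores per socket × clock (GHz) × FLOPs per cycle. The sketch below makes several assumptions. It assumes two E5-2680 v3 sockets per node, which the reported 700-900 GFlops implies, since one socket tops out at 480. It also assumes 12 cores per socket, the nominal 2.5 GHz clock, and 16 DP FLOPs per cycle per core from Haswell's two AVX2 FMA units. Turbo and AVX frequency changes are ignored. `peak_gflops` is a hypothetical helper:

```shell
# Hypothetical helper: theoretical peak in GFlops.
# Arguments: sockets, cores per socket, clock in GHz, DP FLOPs per cycle per core.
peak_gflops() {
  awk -v s="$1" -v c="$2" -v g="$3" -v f="$4" 'BEGIN { printf "%.0f\n", s * c * g * f }'
}

# Assumed node: 2 sockets x 12 cores x 2.5 GHz x 16 FLOPs/cycle
peak_gflops 2 12 2.5 16   # prints 960
```

Against roughly 960 GFlops, the 700-900 GFlops single-node runs are about 73-94% of peak. The ~11 GFlops results are about 1%, which suggests the nodes were left in a badly degraded state rather than just running somewhat slower.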


I don't know why. My guess is the cluster's power supply: it may not be sufficient for the whole system, and the HPE BIOS is set to Balanced Mode for the cluster, which automatically switches to a lower power mode when the system cannot get enough power. However, even when I run on only a few nodes with the power setting at maximum, the problem is not solved.
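One way to test the power-mode hypothesis is to compare each core's current clock with the hardware maximum. Do this while HPL is running, and again afterwards when the single-node results are low. The sketch below assumes a Linux kernel whose cpufreq driver exposes `scaling_cur_freq` and `cpuinfo_max_freq` under /sys/devices/system/cpu. `freq_report` is a hypothetical helper. Its sysfs root is a parameter only so it can be tried against a test directory:

```shell
# Hypothetical helper: lowest and highest current CPU frequency (kHz) across
# all CPUs, plus cpu0's hardware maximum, read from the cpufreq sysfs files.
freq_report() {
  root=${1:-/sys/devices/system/cpu}
  hw=$(cat "$root/cpu0/cpufreq/cpuinfo_max_freq" 2>/dev/null)
  cat "$root"/cpu[0-9]*/cpufreq/scaling_cur_freq 2>/dev/null |
    awk -v hw="${hw:-0}" '
      { v = $1 + 0                      # force numeric comparison
        if (NR == 1 || v < lo) lo = v
        if (NR == 1 || v > hi) hi = v }
      END { printf "min=%d max=%d hw_max=%d\n", lo, hi, hw }'
}
```

Run `freq_report` on each node. If `max` stays far below `hw_max` while HPL is loading every core, the OS or firmware is likely holding the node at a low clock.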

Please help me with this problem. Thank you all!


Hi Minh

In my opinion, if power were the problem, the OS would log a message such as "Power Throttle" in /var/log/messages. Some servers log a message like this when you remove the second power supply.
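Besides searching /var/log/messages (for example with `grep -i throttl /var/log/messages`), the Linux x86 thermal-throttling driver keeps per-CPU event counters in sysfs. The sketch below assumes an Intel CPU and a kernel that provides `thermal_throttle/core_throttle_count`. `throttle_total` is a hypothetical helper. It only covers thermal throttling; power capping by the BIOS or the server's management controller may not show up in these counters:

```shell
# Hypothetical helper: sum the per-CPU thermal throttle event counters.
# The sysfs root is a parameter so it can be pointed at a test tree.
throttle_total() {
  root=${1:-/sys/devices/system/cpu}
  total=0
  for f in "$root"/cpu[0-9]*/thermal_throttle/core_throttle_count; do
    [ -r "$f" ] || continue   # glob matched nothing, or file unreadable
    total=$((total + $(cat "$f")))
  done
  echo "$total"
}
```

Compare the total on each node before and after a multi-node run. An increase means cores were thermally throttled during the run.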

If Linpack works fine on a single node, then (I think) the low performance may come from one of these situations:

- wrong P and Q (process grid) values in HPL.dat

- problems with the interconnect

- problem size (N) in HPL.dat too small, so memory usage is low. The matrix should use no less than 85% of the total memory of all nodes
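The 85% rule and the P × Q grid can be turned into concrete numbers. HPL factors an N × N matrix of 8-byte doubles, so N ≈ sqrt(fraction × total memory in bytes / 8). N is usually rounded down to a multiple of the block size NB. P × Q must equal the number of MPI processes, and a near-square grid with P ≤ Q is the usual starting point. The sketch below assumes one MPI process per node and memory given in GiB per node. `hpl_suggest` and the NB value of 384 in the example are hypothetical, not values from this thread:

```shell
# Hypothetical helper: suggest HPL N, P and Q.
# Arguments: nodes, GiB of RAM per node, memory fraction to use, NB.
# Assumes one MPI process per node, so P * Q = nodes.
hpl_suggest() {
  awk -v n="$1" -v m="$2" -v f="$3" -v nb="$4" 'BEGIN {
    bytes = n * m * 1024 * 1024 * 1024 * f   # memory HPL may use
    N = int(sqrt(bytes / 8))                 # N x N matrix of 8-byte doubles
    N = int(N / nb) * nb                     # round down to a multiple of NB
    # most square grid with P <= Q and P * Q = n
    for (p = int(sqrt(n)); p >= 1; p--) if (n % p == 0) break
    printf "N=%d P=%d Q=%d\n", N, p, n / p
  }'
}

hpl_suggest 4 256 0.85 384   # prints N=341760 P=2 Q=2
```

For all 24 nodes the same sketch gives P=4 and Q=6. The fraction argument is where you leave headroom for the OS and MPI buffers.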

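For maximum performance, select "max performance" in the BIOS and also pin the Linux CPU frequency to its maximum through cpufreq (under /sys/devices/system/cpu, as root):

cd /sys/devices/system/cpu
maxFreq=$(cat cpu0/cpufreq/cpuinfo_max_freq)  # hardware maximum, in kHz
for c in ./cpu[0-9]* ; do
  echo "$maxFreq" > "${c}/cpufreq/scaling_max_freq"
  echo "$maxFreq" > "${c}/cpufreq/scaling_min_freq"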