Dear all,
I have a problem with the results of MKL MP_Linpack. My cluster has 24 compute nodes, each with both Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz and Xeon Phi Q7200, and 256 GB of RAM. When I run ./runme_intel64 on a single node, the performance is good: ~700-900 GFlops (Xeon CPU only).
But when I run HPL on 4 nodes, 8 nodes, or more, the result is very bad, and sometimes the run fails without returning a result, with the error: MPI TERMINATED,... After that, when I run the single-node test (runme_intel64) on each node again, the performance is very low:
~ 11,243 GFlops,
~ 10,845 GFlops,
....
I don't know the cause. My guess is that the cluster's power supply is not sufficient for the whole system, and that the HPE BIOS is configured in Balanced Mode (which automatically switches to a lower power mode when the system cannot get enough power). But even when I run on only a few nodes with the power setting at maximum, the problem is still not solved.
Please help me with this problem. Thank you all!
Hi Minh
In my opinion, if the problem were power, the OS would log a message like "Power Throttle" in /var/log/messages. Some servers log such a message when you remove the second power supply.
If Linpack works fine on a single node, then (I think) the low multi-node performance may come from one of these:
- wrong P and Q in HPL.dat
- problems with the interconnect
- problem size (N) too small in HPL.dat, giving low memory usage. The matrix should occupy no less than 85% of the total memory of all nodes
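The two HPL.dat points above can be checked with a little arithmetic. The following is only a minimal sketch and not part of the original reply: the function names hpl_problem_size and hpl_grid are hypothetical, the block size NB = 192 is an assumed example value, and rounding N down to a multiple of NB and preferring a near-square grid with P <= Q are common rules of thumb rather than requirements. It assumes 8 bytes per matrix element (double precision) and memory given in GiB.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking HPL.dat values.

# hpl_problem_size NODES GIB_PER_NODE PERCENT NB
# Largest N (a multiple of NB) whose N x N double-precision matrix
# fits in PERCENT of the total memory of all nodes.
hpl_problem_size() {
  awk -v nodes="$1" -v gib="$2" -v pct="$3" -v nb="$4" 'BEGIN {
    bytes = nodes * gib * 1024 * 1024 * 1024 * pct / 100
    n = int(sqrt(bytes / 8))      # 8 bytes per double
    n = n - (n % nb)              # round down to a multiple of NB
    printf "%d\n", n
  }'
}

# hpl_grid RANKS
# Print "P Q" with P * Q = RANKS, P <= Q, and P as large as possible.
# The body runs in a subshell so its variables stay local.
hpl_grid() (
  ranks=$1
  best=1
  p=1
  while [ $((p * p)) -le "$ranks" ]; do
    if [ $((ranks % p)) -eq 0 ]; then best=$p; fi
    p=$((p + 1))
  done
  echo "$best $((ranks / best))"
)

# Example: 4 nodes with 256 GiB each, 85% of memory, NB = 192 (assumed),
# and one MPI rank per node (also an assumption).
hpl_problem_size 4 256 85 192
hpl_grid 4
```

P times Q must equal the number of MPI ranks actually launched, so with 8 ranks hpl_grid 8 prints 2 4. If the N in HPL.dat is far below the value printed by hpl_problem_size, the run is likely using too little memory to reach good efficiency.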
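For maximum performance, you need to set "max performance" in the BIOS and pin the CPU frequency through cpufreq. Run this as root, with $maxFreq set to the maximum frequency in kHz:
    cd /sys/devices/system/cpu
    for c in ./cpu[0-9]* ; do
        # pin both limits to the same value so the clock cannot drop
        echo $maxFreq >${c}/cpufreq/scaling_max_freq
        echo $maxFreq >${c}/cpufreq/scaling_min_freq
    done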
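Before changing anything, a read-only check can show whether the cores are actually running well below their maximum clock after a failed multi-node run. This is a minimal sketch, assuming the standard Linux cpufreq sysfs files scaling_cur_freq and cpuinfo_max_freq are present under /sys/devices/system/cpu; the function name report_cpu_freq is hypothetical, and the optional directory argument exists only so the function can be pointed at another tree.

```shell
#!/bin/sh
# report_cpu_freq [BASE_DIR]
# For each cpuN directory with cpufreq data, print:
#   name  current_kHz  max_kHz  percent_of_max
# The body runs in a subshell so its variables stay local.
report_cpu_freq() (
  base=${1:-/sys/devices/system/cpu}
  for c in "$base"/cpu[0-9]*; do
    cur_file=$c/cpufreq/scaling_cur_freq
    max_file=$c/cpufreq/cpuinfo_max_freq
    # Skip entries without cpufreq files (no driver, or no glob match).
    [ -r "$cur_file" ] && [ -r "$max_file" ] || continue
    cur=$(cat "$cur_file")
    max=$(cat "$max_file")
    echo "${c##*/} $cur $max $((cur * 100 / max))"
  done
)

report_cpu_freq
```

Cores that stay at a low percentage of their maximum while the benchmark is running point toward a frequency or power cap rather than an HPL.dat problem.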
