Hi all,
We're running the latest Intel LINPACK benchmark on a server that is being reported as running slowly.
The host has 24 cores (HT is off):
processor : 23
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
stepping : 2
microcode : 54
cpu MHz : 2494.151
cache size : 30720 KB
physical id : 1
siblings : 12
core id : 13
cpu cores : 12
apicid : 58
initial apicid : 58
fpu : yes
fpu_exception : yes
cpuid level : 15
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm cqm_llc cqm_occup_llc
bogomips : 4988.07
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
A LINPACK run on the idle server gives us unexpected results. The GFlops values look reasonable, but the "CPU frequency" reported by the test is very low. That low value tends to support the belief that the server is running slowly, yet the LINPACK results themselves seem OK.
Does anyone know how the CPU frequency is determined, and should we trust it? If I were to reboot the server, it would report ~3.2 GHz.
Thanks, Craig
===========================================================================================
-bash-4.1$ ./runme_xeon64
This is a SAMPLE run script for SMP LINPACK. Change it to reflect
the correct number of CPUs/threads, problem input files, etc..
Wed Mar 15 08:18:39 EDT 2017
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Wed Mar 15 08:18:39 2017
CPU frequency: 0.363 GHz
Number of CPUs: 2
Number of cores: 24
Number of threads: 24
Parameters are set to:
Number of tests: 15
Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1
Maximum memory requested that can be used=16200901024, at the size=45000
=================== Timing linear equation system solver ===================
Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
1000 1000 4 0.056 11.9393 9.298812e-13 3.171134e-02 pass
1000 1000 4 0.031 21.2377 9.298812e-13 3.171134e-02 pass
1000 1000 4 0.033 20.4730 9.298812e-13 3.171134e-02 pass
1000 1000 4 0.032 20.8035 9.298812e-13 3.171134e-02 pass
2000 2000 4 0.111 47.9817 3.974064e-12 3.456949e-02 pass
2000 2000 4 0.100 53.5876 3.974064e-12 3.456949e-02 pass
5000 5008 4 1.020 81.7258 2.334886e-11 3.255810e-02 pass
5000 5008 4 0.996 83.7464 2.334886e-11 3.255810e-02 pass
10000 10000 4 6.159 108.2831 1.070743e-10 3.775550e-02 pass
10000 10000 4 6.150 108.4350 1.070743e-10 3.775550e-02 pass
15000 15000 4 19.746 113.9718 2.557689e-10 4.028403e-02 pass
15000 15000 4 20.027 112.3692 2.557689e-10 4.028403e-02 pass
18000 18008 4 31.669 122.7896 3.390286e-10 3.712780e-02 pass
18000 18008 4 31.635 122.9233 3.390286e-10 3.712780e-02 pass
20000 20016 4 43.153 123.6089 4.102256e-10 3.631395e-02 pass
20000 20016 4 43.117 123.7132 4.102256e-10 3.631395e-02 pass
22000 22008 4 57.380 123.7298 5.344233e-10 3.914440e-02 pass
22000 22008 4 57.904 122.6101 5.344233e-10 3.914440e-02 pass
25000 25000 4 86.036 121.0880 5.691514e-10 3.236560e-02 pass
25000 25000 4 84.096 123.8807 5.691514e-10 3.236560e-02 pass
26000 26000 4 93.626 125.1643 6.047599e-10 3.180009e-02 pass
26000 26000 4 93.638 125.1486 6.047599e-10 3.180009e-02 pass
27000 27000 4 105.993 123.8140 7.690129e-10 3.750098e-02 pass
30000 30000 1 145.585 123.6510 8.409493e-10 3.315031e-02 pass
35000 35000 1 226.800 126.0399 1.008733e-09 2.928200e-02 pass
40000 40000 1 339.987 125.5046 1.429836e-09 3.180005e-02 pass
45000 45000 1 480.772 126.3676 1.794230e-09 3.156758e-02 pass
Performance Summary (GFlops)
Size LDA Align. Average Maximal
1000 1000 4 18.6134 21.2377
2000 2000 4 50.7847 53.5876
5000 5008 4 82.7361 83.7464
10000 10000 4 108.3590 108.4350
15000 15000 4 113.1705 113.9718
18000 18008 4 122.8564 122.9233
20000 20016 4 123.6610 123.7132
22000 22008 4 123.1699 123.7298
25000 25000 4 122.4844 123.8807
26000 26000 4 125.1565 125.1643
27000 27000 4 123.8140 123.8140
30000 30000 1 123.6510 123.6510
35000 35000 1 126.0399 126.0399
40000 40000 1 125.5046 125.5046
45000 45000 1 126.3676 126.3676
Residual checks PASSED
End of tests
Done: Wed Mar 15 09:22:43 EDT 2017
There is no way to directly obtain the CPU frequency from user-mode code -- all of the registers that contain the information are only accessible from kernel mode. Unfortunately, the act of making a kernel call is very often enough to cause the frequency to change.
In Linux there are APIs for requesting information about the current frequency, but most of these are not reliable -- either they report a frequency that was requested (but perhaps not actually provided), or they measure the actual cycles over a short interval and compute the corresponding average frequency. The latter approach is better, but can easily run into problems with dynamic frequency scaling if the interval is too short or if the core being monitored is not running "relevant" code during that interval.
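As an illustration of one of these less reliable sources, the `cpu MHz` field of /proc/cpuinfo (visible in the original post) can be read per logical processor. The sketch below only parses that text; the function name `parse_cpu_mhz` is made up for this example, and the value it returns is a snapshot with all the caveats described above.

```python
def parse_cpu_mhz(cpuinfo_text):
    """Map logical processor number -> reported 'cpu MHz' from /proc/cpuinfo text."""
    freqs = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line between processor entries
        key = key.strip()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "cpu MHz" and current is not None:
            freqs[current] = float(value)
    return freqs


# Excerpt of the /proc/cpuinfo output quoted in the original post.
sample = """processor : 23
vendor_id : GenuineIntel
model name : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
cpu MHz : 2494.151
power management:
"""
print(parse_cpu_mhz(sample))
```

On a live system the same function could be given the contents of /proc/cpuinfo, read a few times while the workload runs, to see how much the reported value moves around.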
You can measure the average frequency from user space if the performance counters are enabled. I prefer to use the fixed-function counters for CPU_CYCLES_UNHALTED and REF_CYCLES_UNHALTED, along with measurements of the Time Stamp Counter. These counters should be read before and after the code that you are interested in measuring, and the code should be pinned to a specific logical processor for the interval. All of these can be measured inline using the RDPMC instruction, which is a pure hardware instruction that does not do anything that may cause the frequency to change.
- Fraction of Unhalted time during interval:
- (REF_CYCLES_UNHALTED(after) - REF_CYCLES_UNHALTED(before)) / (TSC(after)-TSC(before))
- Average Frequency while not halted:
- (CPU_CYCLES_UNHALTED(after) - CPU_CYCLES_UNHALTED(before)) / (REF_CYCLES_UNHALTED(after) - REF_CYCLES_UNHALTED(before)) * TSC_Frequency
- Here the "TSC_Frequency" is the nominal frequency of the processor.
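The two formulas above can be sketched as plain arithmetic on a pair of counter snapshots. This is only the calculation, not the counter access: actually reading the counters (RDPMC/RDTSC on a pinned thread) is not shown, and the `CounterSample` type, the function names, and the example numbers are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    tsc: int          # Time Stamp Counter
    core_cycles: int  # CPU_CYCLES_UNHALTED
    ref_cycles: int   # REF_CYCLES_UNHALTED


def unhalted_fraction(before, after):
    """Fraction of the interval during which the core was not halted."""
    return (after.ref_cycles - before.ref_cycles) / (after.tsc - before.tsc)


def average_unhalted_ghz(before, after, tsc_ghz):
    """Average core frequency (GHz) while not halted; tsc_ghz is the nominal frequency."""
    core = after.core_cycles - before.core_cycles
    ref = after.ref_cycles - before.ref_cycles
    return core / ref * tsc_ghz


# Hypothetical snapshots: one second on a 2.5 GHz TSC, core unhalted half the time.
before = CounterSample(tsc=0, core_cycles=0, ref_cycles=0)
after = CounterSample(tsc=2_500_000_000,
                      core_cycles=1_500_000_000,
                      ref_cycles=1_250_000_000)

print(f"unhalted fraction: {unhalted_fraction(before, after):.2f}")
print(f"average frequency: {average_unhalted_ghz(before, after, 2.5):.2f} GHz")
```

With these invented numbers the core ran 1.2 core cycles per reference cycle, so its average frequency while unhalted comes out above the nominal 2.5 GHz, which is what you would expect to see with turbo active.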
On Linux systems the "perf stat" command will give the average number of cores used and the average frequency for a target workload. This is often useful, but it is problematic for the Intel xHPL benchmark, which spends a large fraction of its time in single-threaded code doing setup and a large fraction of its time after the calculation in single-threaded code doing validation.
As you're not discussing a MIC processor, your post is unlikely to get a suitable answer on this forum. If you post on the MKL forum, you may get a more meaningful response. Instantaneous sampling of the CPU frequency isn't meaningful for showing which state you will run under when turbo modes are enabled.
Thanks for all the responses & the useful info John.
Reading the time stamp counter can only tell you the frequency of the time stamp counter, not the frequency of the core clock. If that is what you want, you can get the information directly using the Brand ID string from the CPUID instruction, as described in (for example) https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/499128#comment-1837630
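For what it's worth, on Linux the `model name` line in /proc/cpuinfo carries the same CPUID brand string, so the nominal frequency can be pulled out of it with a small parser. A minimal sketch, assuming the string ends in an "@ x.xxGHz" suffix as it does for the processor in this thread (not every processor's brand string includes one); the function name is made up for the example.

```python
import re


def nominal_ghz_from_brand(brand):
    """Return the frequency (GHz) embedded in a CPUID brand string, or None if absent."""
    match = re.search(r"@\s*([0-9]+(?:\.[0-9]+)?)\s*([GM])Hz", brand)
    if match is None:
        return None  # brand string has no "@ ...Hz" suffix
    value = float(match.group(1))
    return value if match.group(2) == "G" else value / 1000.0


# Brand string taken from the /proc/cpuinfo output in the original post.
brand = "Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz"
print(nominal_ghz_from_brand(brand))
```

This gives the nominal (TSC) frequency only; it says nothing about the frequency the core is actually running at.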
On most systems the core clock frequency varies between 1.2 GHz and 3.x GHz, depending on lots of factors. It seems sort of silly for the xHPL benchmark code to estimate it incorrectly -- it would be far more interesting to measure it during the actual execution of the computational core of the benchmark....
Sergey wrote:
I wonder if you have any real experience with Enhanced Intel SpeedStep Technology.
Ummm... How about "yes".
The actual, instantaneous CPU frequency of Intel processors varies across the full range of available frequencies, depending on the settings of many hardware configuration registers and on the behavior of the software infrastructure (either BIOS or OS) that makes frequency requests to the hardware.
On the Xeon E5 v3 processors that I am most familiar with, the hardware mostly ignores the OS frequency requests when a processor is idle -- typically dropping the frequency to minimum when the core is put in the "halt" (C1) state. When the core is brought out of the C1 state, the frequency depends on various hardware settings, and the amount of time the processor spends in the C0 state before being brought to full frequency depends on both hardware and software settings. As an example, systems with BIOS-controlled frequency typically ramp more slowly than systems with OS-controlled frequency, even when both are set to "maximum performance".
Over "long" periods (seconds or more), the core frequencies are typically what you would expect. Over "short" periods (milliseconds), performance counter measurements show clearly that the Power Control Unit is manipulating core frequencies in more complex ways, and it is very easy to get low CPU_CYCLES_UNHALTED per second on a core that is in the process of being brought back from a HALT state.