CPU Frequency as reported by the Intel linpack

Craig_M_
Beginner
1,048 Views

hi all,

We're running the latest intel linpack on a server that is being reported as running slowly.

The host has 24 cores (HT is off)

processor       : 23
vendor_id       : GenuineIntel
cpu family      : 6
model           : 63
model name      : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
stepping        : 2
microcode       : 54
cpu MHz         : 2494.151
cache size      : 30720 KB
physical id     : 1
siblings        : 12
core id         : 13
cpu cores       : 12
apicid          : 58
initial apicid  : 58
fpu             : yes
fpu_exception   : yes
cpuid level     : 15
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm cqm_llc cqm_occup_llc
bogomips        : 4988.07
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

A Linpack run on the idle server gives us unexpected results. The MFLOP values look reasonable, but the "CPU frequency" reported by the test is very low, which tends to support the belief that the server is running slowly.

Does anyone know how the CPU frequency is determined, and should we trust it? If I were to reboot the server it would report ~3.2 GHz.

Thanks Craig

===========================================================================================

-bash-4.1$ ./runme_xeon64
This is a SAMPLE run script for SMP LINPACK. Change it to reflect
the correct number of CPUs/threads, problem input files, etc..
Wed Mar 15 08:18:39 EDT 2017
Intel(R) Optimized LINPACK Benchmark data

Current date/time: Wed Mar 15 08:18:39 2017

CPU frequency:    0.363 GHz
Number of CPUs: 2
Number of cores: 24
Number of threads: 24

Parameters are set to:

Number of tests: 15
Number of equations to solve (problem size) : 1000  2000  5000  10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array                  : 1000  2000  5008  10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run                     : 4     2     2     2     2     2     2     2     2     2     1     1     1     1     1
Data alignment value (in Kbytes)            : 4     4     4     4     4     4     4     4     4     4     4     1     1     1     1

Maximum memory requested that can be used=16200901024, at the size=45000

=================== Timing linear equation system solver ===================

Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
1000   1000   4      0.056      11.9393  9.298812e-13 3.171134e-02   pass
1000   1000   4      0.031      21.2377  9.298812e-13 3.171134e-02   pass
1000   1000   4      0.033      20.4730  9.298812e-13 3.171134e-02   pass
1000   1000   4      0.032      20.8035  9.298812e-13 3.171134e-02   pass
2000   2000   4      0.111      47.9817  3.974064e-12 3.456949e-02   pass
2000   2000   4      0.100      53.5876  3.974064e-12 3.456949e-02   pass
5000   5008   4      1.020      81.7258  2.334886e-11 3.255810e-02   pass
5000   5008   4      0.996      83.7464  2.334886e-11 3.255810e-02   pass
10000  10000  4      6.159      108.2831 1.070743e-10 3.775550e-02   pass
10000  10000  4      6.150      108.4350 1.070743e-10 3.775550e-02   pass
15000  15000  4      19.746     113.9718 2.557689e-10 4.028403e-02   pass
15000  15000  4      20.027     112.3692 2.557689e-10 4.028403e-02   pass
18000  18008  4      31.669     122.7896 3.390286e-10 3.712780e-02   pass
18000  18008  4      31.635     122.9233 3.390286e-10 3.712780e-02   pass
20000  20016  4      43.153     123.6089 4.102256e-10 3.631395e-02   pass
20000  20016  4      43.117     123.7132 4.102256e-10 3.631395e-02   pass
22000  22008  4      57.380     123.7298 5.344233e-10 3.914440e-02   pass
22000  22008  4      57.904     122.6101 5.344233e-10 3.914440e-02   pass
25000  25000  4      86.036     121.0880 5.691514e-10 3.236560e-02   pass
25000  25000  4      84.096     123.8807 5.691514e-10 3.236560e-02   pass
26000  26000  4      93.626     125.1643 6.047599e-10 3.180009e-02   pass
26000  26000  4      93.638     125.1486 6.047599e-10 3.180009e-02   pass
27000  27000  4      105.993    123.8140 7.690129e-10 3.750098e-02   pass
30000  30000  1      145.585    123.6510 8.409493e-10 3.315031e-02   pass
35000  35000  1      226.800    126.0399 1.008733e-09 2.928200e-02   pass
40000  40000  1      339.987    125.5046 1.429836e-09 3.180005e-02   pass
45000  45000  1      480.772    126.3676 1.794230e-09 3.156758e-02   pass

Performance Summary (GFlops)

Size   LDA    Align.  Average  Maximal
1000   1000   4       18.6134  21.2377
2000   2000   4       50.7847  53.5876
5000   5008   4       82.7361  83.7464
10000  10000  4       108.3590 108.4350
15000  15000  4       113.1705 113.9718
18000  18008  4       122.8564 122.9233
20000  20016  4       123.6610 123.7132
22000  22008  4       123.1699 123.7298
25000  25000  4       122.4844 123.8807
26000  26000  4       125.1565 125.1643
27000  27000  4       123.8140 123.8140
30000  30000  1       123.6510 123.6510
35000  35000  1       126.0399 126.0399
40000  40000  1       125.5046 125.5046
45000  45000  1       126.3676 126.3676

Residual checks PASSED

End of tests

Done: Wed Mar 15 09:22:43 EDT 2017
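
For readers who want to cross-check the GFlops column against the Time(s) column, here is a minimal Python sketch. It assumes the conventional LINPACK operation count of 2/3*n^3 + 2*n^2 floating-point operations for an n x n solve; the benchmark output does not state which count it uses, so treat that as an assumption. The function name `linpack_gflops` is made up for illustration.

```python
def linpack_gflops(n, seconds):
    """GFlops for solving an n x n dense system in `seconds`.

    Assumption: the conventional LINPACK operation count
    2/3*n^3 + 2*n^2 (not confirmed by the benchmark output itself).
    """
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / seconds / 1e9


# (problem size, Time(s), reported GFlops) copied from the run above
rows = [(40000, 339.987, 125.5046), (45000, 480.772, 126.3676)]
for n, seconds, reported in rows:
    print(n, round(linpack_gflops(n, seconds), 3), reported)
```

The small problem sizes are a poor cross-check because their times are printed with only three decimal places, so the larger sizes are used here.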

0 Kudos
1 Solution
McCalpinJohn
Honored Contributor III
1,048 Views

There is no way to directly obtain the CPU frequency from user-mode code -- all of the registers that contain the information are only accessible from kernel mode.  Unfortunately, the act of making a kernel call is very often enough to cause the frequency to change.

In Linux there are APIs for requesting information about the current frequency, but most of these are not reliable -- either they report a frequency that was requested (but perhaps not actually provided), or they measure the actual cycles over a short interval and compute the corresponding average frequency.  The latter approach is better, but can easily run into problems with dynamic frequency scaling if the interval is too short or if the core being monitored is not running "relevant" code during that interval.
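
As one illustration of such a reported value, the "cpu MHz" field of /proc/cpuinfo (shown in the original question) can be pulled out per logical processor. This is only a sketch: the helper name `cpu_mhz_by_processor` is hypothetical, and what that field actually represents depends on the kernel and frequency driver, so it carries all the caveats described above.

```python
def cpu_mhz_by_processor(cpuinfo_text):
    """Map logical processor number -> 'cpu MHz' value from /proc/cpuinfo text."""
    result = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processors
        key = key.strip()
        value = value.strip()
        if key == "processor":
            current = int(value)
        elif key == "cpu MHz" and current is not None:
            result[current] = float(value)
    return result


# Three lines taken from the /proc/cpuinfo excerpt in the question
SAMPLE = """processor       : 23
model name      : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
cpu MHz         : 2494.151
"""
print(cpu_mhz_by_processor(SAMPLE))
```

On a live Linux system the same function could be fed the contents of /proc/cpuinfo read with `open("/proc/cpuinfo").read()`.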

You can measure the average frequency from user space if the performance counters are enabled.  I prefer to use the fixed-function counters for CPU_CYCLES_UNHALTED and REF_CYCLES_UNHALTED, along with measurements of the Time Stamp Counter.  These counters should be read before and after the code that you are interested in measuring, and the code should be pinned to a specific logical processor for the interval.   All of these can be measured inline using the RDPMC instruction, which is a pure hardware instruction that does not do anything that may cause the frequency to change.

  • Fraction of Unhalted time during interval:
    • (REF_CYCLES_UNHALTED(after) - REF_CYCLES_UNHALTED(before)) / (TSC(after)-TSC(before))
  • Average Frequency while not halted:
    • (CPU_CYCLES_UNHALTED(after) - CPU_CYCLES_UNHALTED(before)) / (REF_CYCLES_UNHALTED(after) - REF_CYCLES_UNHALTED(before)) * TSC_Frequency
    • Here the "TSC_Frequency" is the nominal frequency of the processor.
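
The two ratios above can be worked through with a small Python sketch. The counter values below are made-up numbers chosen for easy arithmetic, not measurements, and the function names are hypothetical; reading the real counters requires RDPMC/RDTSC from native code and is not shown here.

```python
def unhalted_fraction(ref_before, ref_after, tsc_before, tsc_after):
    """Fraction of the interval during which the core was not halted."""
    return (ref_after - ref_before) / (tsc_after - tsc_before)


def average_frequency_hz(core_before, core_after, ref_before, ref_after, tsc_hz):
    """Average core frequency while not halted, scaled by the nominal TSC frequency."""
    return (core_after - core_before) / (ref_after - ref_before) * tsc_hz


# Hypothetical one-second interval on a 2.5 GHz (nominal) part:
TSC_HZ = 2.5e9
tsc = (0, 2_500_000_000)    # TSC ticks at the nominal rate
ref = (0, 1_250_000_000)    # REF_CYCLES_UNHALTED: core unhalted half the time
core = (0, 1_600_000_000)   # CPU_CYCLES_UNHALTED: actual core cycles

print(unhalted_fraction(ref[0], ref[1], tsc[0], tsc[1]))
print(average_frequency_hz(core[0], core[1], ref[0], ref[1], TSC_HZ) / 1e9, "GHz")
```

With these invented inputs the core is unhalted for half of the interval and averages 3.2 GHz while running, which shows how a core can be above nominal frequency when active and still accumulate few cycles per second of wall-clock time.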

On Linux systems the "perf stat" command will give the average number of cores used and the average frequency for a target workload.  This is often useful, but it is problematic for the Intel xHPL benchmark, which spends a large fraction of its time in single-threaded code doing setup and a large fraction of its time after the calculation in single-threaded code doing validation.   

0 Kudos
9 Replies
TimP
Honored Contributor III
1,048 Views

As you're not discussing a MIC processor, your post is unlikely to get a suitable answer on this forum.  If you posted on the MKL forum, you might get a more meaningful response.  Instantaneous sampling of CPU frequency isn't meaningful for showing which state you will run under when turbo modes are enabled.

0 Kudos
SergeyKostrov
Valued Contributor II
1,048 Views
I'm addressing this message to Intel software engineers because there appears to be a problem with how the CPU frequency value is obtained. The attached win_xeon64.txt file shows a reported CPU frequency of:

...
CPU frequency: 1.105 GHz
...

but it should be 2.8 GHz ( about 2.5x lower ). The test was run with a Linpack release for Windows downloaded from Intel's website.
0 Kudos
Craig_M_
Beginner
1,048 Views

Thanks for all the responses & the useful info John.

0 Kudos
SergeyKostrov
Valued Contributor II
1,048 Views
>>There is no way to directly obtain the CPU frequency from user-mode code -- all of the registers that contain the information are only accessible from kernel mode...

You don't need kernel mode to get an estimate of the CPU frequency for the CPU on which the current thread is running. Only three lines of C are needed to get that number. Here it is:

...
// Sub-Test001.05 - HrtRdtsc ( timing in clock cycles since the last Platform Reset )
{
    RTuint64 uiClock1 = HrtRdtsc();    // time stamp counter before the sleep
    CrtSleep( 1000 );                  // sleep for nominally 1000 ms
    RTuint64 uiClock2 = HrtRdtsc();    // time stamp counter after the sleep
    CrtPrintf( RTU("HrtRdtsc - [ uiClock2 - uiClock1 ] Elapsed: %.0f clock cycles\n"), ( RTdouble )( uiClock2 - uiClock1 ) );
}
...

This is real code that I've been using for years, and it gives very accurate estimates. If the thread's priority is boosted to real-time before the HrtRdtsc-Sleep-HrtRdtsc block is executed, the accuracy of the measurements improves.

Why an estimated value? Because OS schedulers will never allow a Sleep( 1000ms ) CRT function to complete in exactly 1000 ms ( 1 s )! This is the non-deterministic nature of the non-real-time OSs we use when running a Linpack test.

What we've been talking about is that Linpack did not calculate the CPU frequency properly at all, and there is no need to write a PhD thesis about it. I've checked it, and that is why in Post #3 I asked Intel engineers to look at how Linpack calculates that value and why the reported value is about 2.5 - 3 times lower. The problem is reproduced on two OSs, Linux and Windows, as you can see.
0 Kudos
McCalpinJohn
Honored Contributor III
1,048 Views

Reading the time stamp counter can only tell you the frequency of the time stamp counter, not the frequency of the core clock.  If that is what you want, you can get the information directly using the Brand ID string from the CPUID instruction, as described in (for example) https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/499128#comment-1837630

On most systems the core clock frequency varies between 1.2 GHz and 3.x GHz, depending on lots of factors.   It seems sort of silly for the xHPL benchmark code to estimate it incorrectly -- it would be far more interesting to measure it during the actual execution of the computational core of the benchmark....

0 Kudos
SergeyKostrov
Valued Contributor II
1,048 Views
>>...On most systems the core clock frequency varies between 1.2 GHz and 3.x GHz, depending on lots of factors...

I wonder if you have any real experience with Enhanced Intel SpeedStep Technology. Have you seen how it works? It lowers the CPU frequency in only one case: when no HPC-related calculations are in progress. I saw how it works on the Intel Atom architecture ( Intel Atom CPU N270 @1.60GHz ).

>>...It seems sort of silly for the xHPL benchmark code to estimate it incorrectly...

We both downloaded recent versions of Linpack from Intel's website ( versions for Linux and Windows ), and it looks like a recently introduced bug in the code, or something else, breaks the CPU frequency calculation. OK. I'll do another set of tests and monitor from another application how the CPU frequency changes. But while I was running my tests, the fan on my Dell Precision Mobile Workstation was running almost all the time. That means all four cores were working at full capacity and had to be cooled down. Windows Task Manager was also used to monitor CPU load, and it was 100% for all four cores. Why don't you verify what we found? It would take just a couple of minutes...
0 Kudos
McCalpinJohn
Honored Contributor III
1,048 Views

Sergey wrote:

I wonder if you have any real experience with Enhanced Intel SpeedStep Technology.

Ummm...  How about "yes".

The actual, instantaneous CPU frequency of Intel processors varies across the full range of available frequencies, depending on the settings of many hardware configuration registers and on the behavior of the software infrastructure (either BIOS or OS) that makes frequency requests to the hardware. 

On the Xeon E5 v3 processors that I am most familiar with, the hardware mostly ignores the OS frequency requests when a processor is idle -- typically dropping the frequency to minimum when the core is put in the "halt" (C1) state.  When the core is brought out of C1 state, the frequency depends on various hardware settings, and the amount of time the processor spends in the C0 state before being brought to full frequency depends on both hardware and software settings.  As an example, systems with BIOS-controlled frequency typically ramp more slowly than systems with OS-controlled frequency, even when both are set to "maximum performance".

Over "long" periods (seconds or more), the core frequencies are typically what you would expect.  Over "short" periods (milliseconds), performance counter measurements show clearly that the Power Control Unit is manipulating core frequencies in more complex ways, and it is very easy to get low CPU_CYCLES_UNHALTED per second on a core that is in the process of being brought back from a HALT state.

0 Kudos
SergeyKostrov
Valued Contributor II
1,048 Views
>>... I'll do another set of tests and I will monitor from another application how CPU frequency changes...

Craig, I've completed a couple more tests and I think something is wrong with how Linpack calculates the CPU frequency. Sometimes it is 10%-25% lower, sometimes it is 25% higher than the nominal value. I didn't have a chance to complete a set of tests for Linux, but I've done it for Windows. A test application is attached and it could easily be modified for Linux.

Next, it looks like Linpack uses the QueryPerformanceFrequency Win32 API function on Windows, and get_cpu_freq on Linux. However, I'd like to point out that QueryPerformanceFrequency on Windows doesn't return a value in clock cycles or nanoseconds. It returns a value in units called Performance Counter Values ( look at MSDN for more information ). Here is an example of what it looks like:

...
QueryPerformanceCounter ( Pcv )   Rdtsc ( Cc )
2761143000                        2827408633
2762779000                        2829084584
2762540000                        2828840952
2763412000                        2829732815
2763158000                        2829473142
2763246000                        2829562153
2763232000                        2829547653
2763188000                        2829502861
2763086000                        2829397968
2763049000                        2829360676
2763324000                        2829641543
2763163000                        2829476624
2763164000                        2829477763
2763152000                        2829465561
2763257000                        2829572653
2763051000                        2829362718
2763089000                        2829377752
2763330000                        2829648424
2763146000                        2829457941
2763350000                        2829668422
...

All of these measurements were taken while Linpack was running. The official frequency of my CPU is 2829200000 Hz.
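
A quick way to compare the two columns is to look at the ratio of each Rdtsc value to its paired QueryPerformanceCounter value. The Python sketch below uses the first five pairs from the listing above; the variable names are made up for illustration, and it says nothing about how Linpack itself computes its figure.

```python
# (Pcv, Cc) pairs copied from the first five rows of the listing above
samples = [
    (2761143000, 2827408633),
    (2762779000, 2829084584),
    (2762540000, 2828840952),
    (2763412000, 2829732815),
    (2763158000, 2829473142),
]

NOMINAL_HZ = 2829200000  # official CPU frequency quoted in the post

ratios = [cc / pcv for pcv, cc in samples]
for (pcv, cc), ratio in zip(samples, ratios):
    deviation = (cc - NOMINAL_HZ) / NOMINAL_HZ
    print(f"Cc/Pcv = {ratio:.6f}   Rdtsc vs nominal = {deviation:+.4%}")
```

For these rows the ratio stays very close to 1.024, so the two counters track each other but are expressed in different units, which is consistent with the point that the performance-counter values should not be read as clock cycles.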
0 Kudos