
Questions about surprising results with intel_cpufreq governor

SimonHF
Beginner

Hello!

I have a Sapphire Rapids system with dual "Intel(R) Xeon(R) Platinum 8480+" processors running Debian Linux. When I first discovered these issues, I posted them in the Debian forum [1], but I have received no replies. Since the frequency governor used on Debian is "intel_cpufreq", and AI suggests that the governor is written by Intel, I am also opening this ticket here.

Please look at the original posting [1], which also has graphs. In summary: I ran the same set of load tests twice, once with SMT enabled in the BIOS and once with it disabled, so one set of tests ran on 224 threads (experiment 1) and the other ran on 112 threads (experiment 2). The load test is a home-grown script which loads the system to an arbitrary amount of CPU per thread. I tested at 5% to 100% in steps of 5%. The script also calculated hashes during its load, which I called "work done", so I could graph the work done per second by all threads. While the load tests were running, I also ran Intel pcm [2] to monitor the frequencies (presumably determined by the intel_cpufreq governor?) and produced graphs showing the average frequency (instead of work done) per second.


There were a bunch of surprising results (to me) and I'm hoping experts here can chime in, explain what happened, and offer advice:

Surprising result 1: The total amount of "work done" per second at 100% CPU on 224 threads (experiment 1) is surprisingly similar to the total amount of "work done" per second at 100% CPU on 112 threads (experiment 2). Does this suggest that sibling hyper-threads on the same core end up slowing each other down by ~50%? If so, is there any advantage to running the system with SMT enabled? Doesn't it just make the latency of code less predictable, because you never really know when sibling hyper-threads will be slowing each other down? Unless the host is always working at 100%, in which case they will always be slowing each other down? And there appears to be no "work done" advantage at 100% CPU, and maybe a few percent at 50% CPU?

Surprising result 2: For experiment 1, if Intel ARK says the frequency range is 2.0 to 3.8 GHz, why is the average frequency reported for each second in the range 2.77 to 2.99 GHz regardless of whether the CPU load is 5% or 100%? I was using Intel pcm [2] to report the individual core frequencies for each second, and the graph shows the average of all those frequencies. But the individual core frequencies never moved out of the range 2.77 to 2.99 GHz either. Presumably the core frequencies get adjusted many times per second, so could it be that cores were hitting 3.8 GHz for a fraction of a second while the overall frequency for the second still never went above 2.99 GHz? Also, when trying out variations of the experiment, I could never get the frequency to go above 3.0 GHz according to Intel pcm.
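The averaging hypothesis in surprising result 2 can be illustrated with a small time-weighted average. This is only a sketch of the arithmetic: the burst length and the two frequencies below are hypothetical numbers chosen for illustration, not measurements, and the sketch makes no claim about how Intel pcm actually derives its per-second frequency.

```python
def time_weighted_average_ghz(segments):
    """Average frequency over an interval.

    segments: list of (duration_seconds, frequency_ghz) tuples that
    together cover the interval being averaged.
    """
    total_time = sum(duration for duration, _ in segments)
    if total_time <= 0:
        raise ValueError("segments must cover a positive duration")
    weighted = sum(duration * freq for duration, freq in segments)
    return weighted / total_time


# Hypothetical one-second interval: a 50 ms burst at 3.8 GHz,
# with the remaining 950 ms spent at 2.95 GHz.
avg = time_weighted_average_ghz([(0.05, 3.8), (0.95, 2.95)])
print(f"average over the second: {avg:.4f} GHz")
```

Under these made-up numbers, a short burst at the maximum frequency moves the per-second average only slightly, which is why a per-second report alone cannot confirm or rule out brief excursions.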

Surprising result 3: With experiment 1, as expected, the average core frequency gets lower as the CPU load gets higher: 5% CPU load results in up to 2.99 GHz, and 100% CPU load results in as low as 2.77 GHz. However, with experiment 2, the relationship is reversed! As the CPU load gets higher, the average core frequency also gets higher: 5% CPU load results in < 2.75 GHz, and 100% CPU load results in up to 2.99 GHz! How can this be?

Surprising result 4: Naively, I would expect to see higher frequencies for experiment 2. Why? Because presumably using half the hyper-threads means using less power, which means less heat, which means a higher frequency is possible? However, below is the last second of the Intel pcm per-second report at 100% CPU for experiments 1 and 2. We can see 100% CPU in the UTIL column. The CFREQ column shows the frequencies noted in surprising result 3: lower for experiment 1 and higher for experiment 2. We can also see the average temperatures for the sockets, and they are higher for experiment 2, which is the opposite of what we might naively expect.

 Core (SKT) | UTIL | IPC  | CFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3MPI | L2MPI |   L3OCC |   LMB  |   RMB  | TEMP

 SKT    0     1.00   2.29    2.75     601 K   3138 K    0.78    0.93  0.0000  0.0000    97440      153       85     21 <-- 224 / 224 HTs at 100% CPU load
 SKT    1     1.00   2.29    2.80    1334 K   4346 K    0.67    0.88  0.0000  0.0000   104720      277      181     18

 SKT    0     1.00   4.35    2.98      71 K    742 K    0.89    0.72  0.0000  0.0000    97104        8        0     27 <-- 112 / 112 HTs at 100% CPU load
 SKT    1     1.00   4.34    2.99      85 K   1378 K    0.93    0.68  0.0000  0.0000   107968        5        3     22
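As a rough cross-check of surprising result 1, the IPC and CFREQ columns above can be combined into an estimated instruction rate per socket. This is a sketch under stated assumptions: it assumes the IPC column is instructions per cycle per hardware thread, that CFREQ is in GHz, that UTIL = 1.00 means every thread was busy for the whole second, and that the threads split evenly across the two sockets (112 per socket with SMT on, 56 with SMT off). None of these assumptions is confirmed in this thread.

```python
# (experiment, hardware threads on the socket, IPC, CFREQ in GHz)
# IPC and CFREQ are copied from the pcm rows above; the per-socket
# thread counts are an assumption (224 / 2 and 112 / 2).
ROWS = [
    ("SMT on", 112, 2.29, 2.75),
    ("SMT on", 112, 2.29, 2.80),
    ("SMT off", 56, 4.35, 2.98),
    ("SMT off", 56, 4.34, 2.99),
]


def estimated_ginstr_per_s(rows, experiment):
    """Sum threads * IPC * GHz over the sockets of one experiment.

    The result is in billions of instructions per second and is only
    as good as the assumptions listed above.
    """
    return sum(
        threads * ipc * ghz
        for label, threads, ipc, ghz in rows
        if label == experiment
    )


smt_on = estimated_ginstr_per_s(ROWS, "SMT on")
smt_off = estimated_ginstr_per_s(ROWS, "SMT off")
ratio = smt_on / smt_off
print(f"SMT on:  {smt_on:.1f} G instr/s")
print(f"SMT off: {smt_off:.1f} G instr/s")
print(f"ratio (on / off): {ratio:.3f}")
```

If the assumptions hold, the two totals come out within a few percent of each other, which would be consistent with the similar "work done" observed in both experiments.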


Thanks in advance!

 

[1] https://forums.debian.net/viewtopic.php?t=162072

[2] https://github.com/intel/pcm

 

Sazirah
Employee

Hi SimonHF,


Thank you for posting in Intel Community Forum.


First and foremost, we would like to apologize regarding the previous issue posted. Please know that our main objective is to provide you with the best and correct resolution for the issue reported. Please allow us some time while we check on this at our end. We will get back to you with an update soon.


Appreciate your cooperation on this.


Regards,

Sazzy_Intel

Intel Customer Support Technician


Subhashish
Employee

Hello SimonHF,


Greetings!



This is regarding the issue you reported to us. To assist you further, please confirm the details below:


  1. Are you using latest BIOS and microcode updates for Sapphire Rapids?
  2. Have you tested with a different CPU governor to compare results?
  3. Is your workload memory-bound or compute-bound?


Looking forward to hearing from you.




Regards,

Subhashish_Intel.


SimonHF
Beginner

Are you using latest BIOS and microcode updates for Sapphire Rapids?

I'm pretty sure, yes, because these hosts are quite new. But I will discover the exact versions and get back to you. The hosts themselves are from Dell, so the updates come from Dell.

 


Have you tested with a different CPU governor to compare results?

I haven't tested with the available governors: "conservative", "ondemand", "powersave", and "schedutil". Do you think I should?

I have tested with the available governor: "userspace".

I was interested in whether there are any circumstances where I could get Intel pcm to report a core frequency above 3.0 GHz...

I set the governor and frequency like this:

$ sudo cpupower frequency-set -g userspace

$ sudo cpupower frequency-info
analyzing CPU 0:
  driver: intel_cpufreq
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 20.0 us
  hardware limits: 800 MHz - 3.80 GHz
  available cpufreq governors: conservative ondemand userspace powersave performance schedutil
  current policy: frequency should be within 800 MHz and 3.80 GHz.
                  The governor "userspace" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.96 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes

$ sudo cpupower frequency-set -u 3800Mhz
$ sudo cpupower frequency-set -d 3800Mhz
$ sudo cpupower frequency-set -f 3800Mhz

What I discovered is that if I set the frequency below 3.0 GHz then the frequency would be respected, e.g. 2.5 GHz. But if I set the frequency above 3.0 GHz, e.g. 3.1 GHz, then during load testing Intel pcm would still report a max frequency of 2.99 GHz.
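One possible way to cross-check the pcm readings is to read the values the Linux cpufreq subsystem exposes under sysfs. The sketch below assumes the standard layout /sys/devices/system/cpu/cpuN/cpufreq/ with the files cpuinfo_max_freq, scaling_max_freq, and scaling_cur_freq (all in kHz). The helper names are hypothetical, and the values the kernel reports here are not necessarily the same quantity that pcm measures.

```python
from pathlib import Path

SYSFS_CPU_ROOT = "/sys/devices/system/cpu"
FILES = ("cpuinfo_max_freq", "scaling_max_freq", "scaling_cur_freq")


def read_cpufreq_khz(cpu, name, root=SYSFS_CPU_ROOT):
    """Read one cpufreq sysfs file for a CPU and return its value in kHz."""
    path = Path(root) / f"cpu{cpu}" / "cpufreq" / name
    return int(path.read_text().strip())


def summarize_cpu(cpu=0, root=SYSFS_CPU_ROOT):
    """Return the hardware limit, policy limit, and current frequency in GHz."""
    return {name: read_cpufreq_khz(cpu, name, root) / 1e6 for name in FILES}


if __name__ == "__main__":
    try:
        for name, ghz in summarize_cpu(0).items():
            print(f"{name}: {ghz:.2f} GHz")
    except (OSError, ValueError):
        # No cpufreq sysfs on this machine (e.g. not Linux, or inside a VM).
        print("cpufreq sysfs entries are not available on this system")
```

Sampling scaling_cur_freq for every CPU during a load test would give a second view of the frequency to compare with the pcm CFREQ column.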


Is your workload memory-bound or compute-bound?

The workload is compute-bound [1] and is a simple script which uses the following algorithm:

  • For each wall-clock second, for the first x% of the second, calculate SHA256 hashes, and sleep the rest of the second.
  • At the end of the second report how many hashes were calculated.
  • Repeat until running for y seconds.

In this way I can launch e.g. 224 processes, all concurrently generating x% CPU load, for y seconds, on the host which has 224 hardware threads.
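The algorithm above could be sketched for a single process roughly as follows. This is not the original script, which is not shown in this thread: the function name, the `period_s` parameter (added so the loop can be tried with short periods), and the choice to hash the previous digest repeatedly are all assumptions. The sketch also uses a monotonic clock measured from the start of the run instead of aligning to wall-clock second boundaries.

```python
import hashlib
import time


def run_load(load_pct, periods, period_s=1.0):
    """Busy-hash for load_pct% of each period, sleep for the rest.

    Returns a list with the number of SHA-256 hashes computed in each
    period, i.e. the "work done" per period.
    """
    if not 0 < load_pct <= 100:
        raise ValueError("load_pct must be in (0, 100]")
    busy_s = period_s * load_pct / 100.0
    counts = []
    data = b"\x00" * 64
    period_start = time.monotonic()
    for _ in range(periods):
        busy_until = period_start + busy_s
        hashes = 0
        # Busy phase: chain SHA-256 digests until the busy slice is used up.
        while time.monotonic() < busy_until:
            data = hashlib.sha256(data).digest()
            hashes += 1
        counts.append(hashes)
        # Idle phase: sleep until the next period begins.
        period_start += period_s
        remaining = period_start - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
    return counts


if __name__ == "__main__":
    # Short demo: 50% load for three 0.1-second periods.
    print(run_load(load_pct=50, periods=3, period_s=0.1))
```

To load a whole host, one such process per hardware thread could be started, for example with the standard `multiprocessing` module, and the per-period counts summed across processes.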

At the end of the test I can also see the total "work done" in terms of hashes, and e.g. the "work done" in any particular second across all processes or for a particular process.

By running Intel pcm at the same time, I can track the core frequencies during a test, and ask questions e.g. how does the frequency change when loading the host to 5% CPU usage vs 100% CPU usage, etc.

[1] Only local memory is used, so all memory accesses should be cached, and no remote memory should be used.

Subhashish
Employee

Hello SimonHF,


Thank you for sharing this information. We are reviewing and working on your request, and we will update you again soon.


Kindly await our next response.



Regards,

Subhashish_Intel.


Ragulan_Intel
Employee

Hello SimonHF,


Greetings!


First and foremost, please be informed that since the issue involves many components, we will make our best effort to provide the support you need. We understand that resolving complex issues can be challenging, and we are committed to assisting you throughout the process.


To proceed further with this case, please share your complete system details, such as the model, Linux version, and kernel version. Having this information will help us better understand your setup and identify potential solutions more effectively.


If you have any additional information or specific concerns related to the issue, please feel free to include those as well. The more details you provide, the better we can assist you.


Thank You & Best Regards,


Ragulan_Intel


SimonHF
Beginner

@Ragulan_Intel wrote:

To proceed further with this case, please share your complete system details, such as the model, Linux version, and kernel version. Having this information will help us better understand your setup and identify potential solutions more effectively.

The model is a Dell r760, and the CPU is mentioned above: 'Sapphire Rapids system with dual "Intel(R) Xeon(R) Platinum 8480+" processors'.

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 12 (bookworm)
Release:	12
Codename:	bookworm

$ uname -a
Linux <host name> 6.8.12-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-4 (2024-11-06T15:04Z) x86_64 GNU/Linux

 

Subhashish
Employee

Hello SimonHF,


Thank you for sharing the details. We are reviewing this and we will get back to you again. Kindly await our next communication.



Regards,

Subhashish_Intel.


Subhashish
Employee

Hello SimonHF,



Thank you for your patience thus far. Upon further review, we see that this is not an issue directly with the processor. As we are the processor support team, we can assist with processor failures. For this concern, the correct point of contact would be our Intel community. I have shared the link below; please post your query there and the relevant team will get back to you:


https://community.intel.com/t5/Software-Tuning-Performance/bd-p/software-tuning-perf-optimization


If you find any hardware failure with the CPU itself, or any other error or issue on top of this, then we can check it and, if a hardware failure is found, help you with an RMA.



Regards,

Subhashish_Intel.


SimonHF
Beginner

Hello Subhashish_Intel,

Thanks for the update and redirect.

So the "Software Tuning, Performance Optimization & Platform Monitoring" community forum is responsible for the "intel_cpufreq" governor and related questions?

--

Simon

SimonHF
Beginner

For continuity for readers who get here, I posted the question again in the suggested community forum here [1].

[1] https://community.intel.com/t5/Software-Tuning-Performance/Questions-about-surprising-results-with-intel-cpufreq-governor/m-p/1678118#M8524

Sazirah
Employee

Hi SimonHF,


Thank you for your reply.


Yes, that team has expertise in the Intel tool issue that you reported. It seems you have already posted in that forum, so kindly give them some time to get back to you. Also, since there is not much more we can assist you with on this case, we will proceed with closing it at our end. If you have any questions related to Xeon in the future, kindly contact us again and we will be happy to assist you.


Thank you for using Intel products and services.


Regards,

Sazzy_Intel

Intel Customer Support Technician


SimonHF
Beginner

Please ensure that this thread does not go away / get deleted so that the other team may reference it. Thanks. Simon
