Mobile and Desktop Processors
Intel® Core™ processors, Intel Atom® processors, tools, and utilities

265K mystery throttling resolved by AVX offset, doesn't appear in MSR_CORE_PERF_LIMIT_REASONS

Yump
Beginner

I'm going to lay out the facts as I know them. Please help me figure out what's going on here.

  1. CPU is a Core Ultra 7 265K installed in an Asrock Z890 Pro-A Wifi, BIOS 2.18, with stock settings (PL1=PL2=250W; LLC Level 6; ACLL=DCLL=0.76; current limit 347 A). Running Fedora Linux, kernel 6.12.5.
  2. In most code, all-core boost frequency is 5200 MHz on P-cores and 4600 MHz on E-cores.
  3. As measured by turbostat under Linux, `xmrig --bench=1M` does not sustain the 5200 MHz P-core frequency. Instead, P-cores average 5130 MHz, ± a little. This holds all the way down to the shortest sampling period turbostat supports, 1 ms.
  4. Checking the instantaneous core multiplier with `rdmsr -a 0x198 --bitfield 15:8 --decimal` suggests that the cores are running at 52 with occasional dips to 46 or 47.
  5. According to turbostat, pkg power is comfortably below PL1/PL2.
  6. At bootup, `MSR_CORE_PERF_LIMIT_REASONS`, 0x64f, is `10c20000h`, which, if I am decoding correctly, means that thermal throttling, VRM temperature throttling, VRM thermal design current throttling, and turbo-limit throttling have happened in the past. IDK why it comes up in that state. BIOS bug?
  7. Resetting `MSR_CORE_PERF_LIMIT_REASONS` to zero, and reading during/after the xmrig run, shows only turbo limit throttling in the log bits.
  8. With AVX2 offset = 4 in BIOS, all P-cores sustain 5200 MHz.
  9. With AVX2 offset = 3, 6 of the P-cores sustain 5200 MHz, and the 2 prime cores with 5500 MHz turbo limit throttle to 5130 MHz.
  10. AVX offsets of 2 and 1 result in all P-cores throttling to 5130.
  11. The default AVX2 offset is just shown as "Auto", but Intel XTU under Windows says it's 0.
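
For anyone who wants to repeat the decoding in items 4 and 6, here is a minimal Python sketch. The bit names follow the MSR_CORE_PERF_LIMIT_REASONS layout documented for earlier client cores (status bits in the low half, sticky log copies 16 bits higher); whether this CPU keeps exactly that layout is an assumption, and the helper names and the raw value `0x3400` are made up for illustration.

```python
# Assumed layout: status bit n has its sticky "log" copy at bit n + 16.
# Bit names are taken from earlier client parts and are an assumption here.
STATUS_BITS = {
    0: "PROCHOT",
    1: "Thermal",
    4: "Residency state regulation",
    5: "Running average thermal limit",
    6: "VR thermal alert",
    7: "VR thermal design current",
    8: "Other",
    10: "PL1",
    11: "PL2",
    12: "Max turbo limit",
    13: "Turbo transition attenuation",
}

def bitfield(value, hi, lo):
    """Extract bits hi:lo (inclusive), like `rdmsr --bitfield hi:lo`."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_limit_reasons(value):
    """Return (currently active reasons, logged reasons) for a raw MSR value."""
    status = [name for bit, name in STATUS_BITS.items() if (value >> bit) & 1]
    log = [name for bit, name in STATUS_BITS.items() if (value >> (bit + 16)) & 1]
    return status, log

status, log = decode_limit_reasons(0x10C20000)  # boot-time value from item 6
print("active:", status)
print("logged:", log)
# Hypothetical raw 0x198 value whose bits 15:8 hold 0x34.
print("multiplier:", bitfield(0x3400, 15, 8))
```

Under the assumed layout, the boot-time value has no active status bits and four log bits set, which agrees with the decoding given in item 6.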

What stands out to me is that the cores stop throttling exactly when the AVX offset would put them 1 bin below 5200 MHz. But what does it mean? Also,

  • Setting an all-core turbo limit of 5200 MHz doesn't prevent throttling (with AVX offset = 0).
  • Setting the "specific per-core" ratio limits to 52x in BIOS doesn't prevent throttling (with AVX offset = 0).

Is there a way to avoid this throttling mechanism other than a large AVX offset?

Aside: when I say "most code" above, this includes prime95 1024k in-place FFT, as long as AC Load Line is raised or CEP is disabled. But MSR_CORE_PERF_LIMIT_REASONS doesn't seem to log CEP throttling either. Is there something I can look at to detect it?

RobbieR_Intel
Moderator

Hello Yump,

 

Thank you for reaching out. It seems there might be an overheating concern. To assist you further, please update your BIOS to the latest version, 2.22.AS05 [Beta]. Once updated, let me know whether the issue persists.

 

Additionally, kindly answer the following questions:

  • What is the make and model of your cooling system?
  • Was your system working fine before?
  • Were there recent hardware or software changes that might be related to this issue?
  • What is the temperature when the system is idle and under load?
  • Have you checked your cooling solution for proper installation? Have you reapplied new thermal paste?
  • Is your BIOS configured to default settings?


I look forward to your reply!

 

Best Regards,

 

Robbie R.

Intel Customer Support Technician


Yump
Beginner

Sorry for the delay. I have updated the BIOS. I have also found an easier reproducer, which only requires stress-ng, which is packaged in every major Linux distribution.

 

taskset -c 0-3 stress-ng --cpu 4 --cpu-method stats

 

On my machine, this test runs at ~5130 MHz (60W, 57°C) on the occupied P-cores, while many other cpu-method choices sustain the expected 5200 MHz all-core turbo limit, even some higher-power ones like "hanoi" (77W, 66°C).
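
To watch the multiplier while the reproducer runs, something like the following Python sketch could be used. It reads IA32_PERF_STATUS (0x198, the register used with rdmsr earlier in the thread) through the Linux msr driver, which needs root and a loaded `msr` module. The function names are hypothetical, and `dip_fraction` assumes the core only ever sits at two frequencies, which is a simplification.

```python
import os
import struct
from collections import Counter

IA32_PERF_STATUS = 0x198  # current ratio is in bits 15:8

def read_msr(cpu, reg):
    """Read one 64-bit MSR via /dev/cpu/N/msr (requires root and `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)

def current_ratio(raw):
    """Extract the core multiplier (bits 15:8) from a raw 0x198 value."""
    return (raw >> 8) & 0xFF

def ratio_histogram(raw_samples):
    """Count how often each multiplier appears in a series of raw samples."""
    return Counter(current_ratio(raw) for raw in raw_samples)

def dip_fraction(avg_mhz, high_mhz, low_mhz):
    """Fraction of time at low_mhz, assuming only two frequencies are ever used."""
    return (high_mhz - avg_mhz) / (high_mhz - low_mhz)

# 5130 MHz average with 52x/46x states, as reported earlier in the thread.
print(f"{dip_fraction(5130, 5200, 4600):.1%}")
```

On a live system, `ratio_histogram(read_msr(0, IA32_PERF_STATUS) for _ in range(10000))` would give a rough distribution for CPU 0. Under the two-state assumption, a 5130 MHz average corresponds to roughly 12% of the time spent at 46x.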


@RobbieR_Intel wrote:

It seems there might be an overheating concern.


Recall that there are two issues here: 1) the throttling happens, and 2) MSR_CORE_PERF_LIMIT_REASONS does not report the cause. If the cause is overheating, issue 2 remains.

 

Additionally, your questions are answered below:

  • What is the make and model of your cooling system?

Thermalright Phantom Spirit 120

 

  • Was your system working fine before?
  • Were there recent hardware or software changes that might be related to this issue?

This is a newly-assembled machine, running freshly-installed Fedora Linux. To the best of my knowledge, this issue has been present as long as I've had the tooling to detect it.

 

  • What is the temperature when the system is idle and under load?

Running the stress-ng reproducer on 4 P-cores, after ~5 minutes the temperature as reported by turbostat is 57°C.

 

  • Have you checked your cooling solution for proper installation? Have you reapplied new thermal paste?

I believe it is properly installed, based on the evidence that the "fft" stress-ng cpu method sustains ~245 W at ~93°C. I have not re-mounted it, and would prefer not to because there isn't enough of the factory-supplied paste left in the tube for another attempt.

 

 

  • Is your BIOS configured to default settings?

Yes, I have reproduced the problem using BIOS 2.22.AS05 with the default settings.

RobbieR_Intel
Moderator

Hello Yump,


I wanted to check if you had the chance to review the questions I posted. Please let me know at your earliest convenience so that we can determine the best course of action to resolve this matter. 


Best regards,


Robbie R.

Intel Customer Support Technician


RobbieR_Intel
Moderator

Hello Yump,


Thank you for the additional information. With that being said, kindly give me more time as I further investigate the issue. I will get back to you once I have come up with a resolution.


I sincerely appreciate your patience and understanding.


Best Regards.


Robbie R.

Intel Customer Support Technician


RobbieR_Intel
Moderator

Hello Yump,

 

Thank you for patiently waiting. After further investigation: throttling is a warning and protection mechanism for the system. It is simply an indication and does not signify an error or problem with your system.

 

Another way to reduce throttling is to increase IccMax in XTU or the BIOS, but this counts as overclocking the CPU. Please be advised that overclocking may damage the CPU and may void its warranty.

 

If you have further questions, kindly let me know.

 

Best Regards,

 

Robbie R.

Intel Customer Support Technician


Yump
Beginner

I know what throttling is and what purpose it is designed to serve. As this term is apparently loaded, I will instead refer to "Clock Frequency Reduction". The problem is that the cause of this Clock Frequency Reduction is not logged, and the CFR happens when the known operating parameters (temperature, power) are well inside the regime where there should be no protective CFR.

I have mostly excluded IccMax as the cause, because:

  1. Setting IccMax way up to 512 A (matches the "ASRock Extreme" power delivery preset) does not prevent the CFR.
  2. By setting IccMax way down to 180 A, I can intentionally cause current limit protective CFR and see it logged in bit 24 "Other" of MSR_CORE_PERF_LIMIT_REASONS.  With the default value of IccMax (347 A), bit 24 is never set.

Again, there is no log of the cause of the clock frequency reduction, either in 0x64f MSR_CORE_PERF_LIMIT_REASONS or in 0x19c IA32_THERM_STATUS.

I too spent a week on further investigation, and have discovered the following:

  1. The clock frequency reduction happens even with turbo disabled by `echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo`. In this condition, the P-cores average 3834 MHz in problematic workloads, and 3900 MHz (the specified base frequency) in non-problematic workloads.
  2. If the specific-per-core frequency limits (not the 2c turbo limit, the individual core limits) are all set to 5200 MHz, bypassing active-core-dependent turbo and ITMT3.0, an AVX offset of only 1 suffices to prevent the throttling. That is, when the AVX frequency is 1 bin below 5200 MHz, the cores can run at 5200 MHz in problematic workloads.
  3. In my original workload (but not the stress-ng reproducer), there is no CFR if either bit 3 or bit 4 of MSR_PREFETCH_CONTROL (0x1a4) is set. Bit 4 (DCU next page prefetch disable) gives a very tiny performance increase; bit 3 (DCU IP prefetch disable) does not. Presumably the IP prefetcher has a perf benefit that is equal to its clock frequency cost.
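
The AVX offset observations reported so far in the thread can be checked against one simple rule: CFR appears whenever a core's ratio limit minus the AVX offset is not below the 52x all-core ratio. The Python sketch below tests that rule against the reported cases. It is only a consistency check on the thread's data, not a description of how the power management firmware works. The 54x limit for the six non-prime P-cores is an assumption that does not appear in the thread, and the function name is hypothetical.

```python
ALL_CORE_RATIO = 52  # 5200 MHz all-core P-core turbo, from the thread

def cfr_predicted(core_limit, avx_offset, all_core=ALL_CORE_RATIO):
    """Conjectured rule: CFR occurs when (limit - offset) is not below all-core."""
    return core_limit - avx_offset >= all_core

# (per-core ratio limit, AVX offset, CFR observed?)
# 55 = prime cores (from the thread); 54 = other P-cores (assumed).
observations = [
    (55, 4, False), (54, 4, False),  # offset 4: all P-cores sustain 5200 MHz
    (55, 3, True),  (54, 3, False),  # offset 3: only the prime cores drop
    (55, 2, True),  (54, 2, True),   # offset 2: all P-cores drop
    (55, 1, True),  (54, 1, True),   # offset 1: all P-cores drop
    (52, 0, True),                   # per-core limits forced to 52x, offset 0
    (52, 1, False),                  # per-core limits forced to 52x, offset 1
]

for limit, offset, seen in observations:
    match = cfr_predicted(limit, offset) == seen
    print(f"limit={limit} offset={offset} observed={seen} rule_matches={match}")
```

Every row matches under these assumptions, which is at least compatible with the idea that the frequency reduction is tied to the gap between the AVX ratio and the all-core ratio.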

Hypothesis: the clock frequency reduction happens when the core "expects" an AVX2 workload in the near future, and downclocks while a voltage transition is in progress. Is this theory plausible to someone with actual knowledge of the PMU implementation? Is there any way to confirm or disconfirm it?

Please understand that a reply that does not contain the string "MSR_CORE_PERF_LIMIT_REASONS" will be considered non-responsive.

RobbieR_Intel
Moderator

Hello Yump,

 

Thank you for your response and for the additional information from your own investigation of the issue. I need more time to investigate further. Once a resolution is available, I will get back to you.

 

I sincerely appreciate your patience and understanding.

 

Best Regards,

Robbie R.

Intel Customer Support Technician


RobbieR_Intel
Moderator

Hello Yump,

 

Thank you for sharing your detailed feedback on the AVX implementation and clock frequency reduction in your system. I truly appreciate the time and effort you've put into investigating this.

 

If you have any further concerns or questions, please let me know. I look forward to your response!

 

Best Regards,

 

Robbie R.

Intel Customer Support Technician


Yump
Beginner

Well, I have no further information on this P-core frequency anomaly, but in case what I'm writing here is making it to an engineer's eyeballs...

I also found an E-core IPC anomaly.

In the same workload (xmrig), something randomly causes some E-cores to lose ~30% of their IPC. It seems to be some kind of transient metastable state that gets disrupted by a context switch, because I can restore (most of?) the missing performance by having a background thread force-migrate itself round-robin between all the E-cores.

Oh, and the recent 2.22.AS05 BIOS version also seems to forcibly set and lock MSR_PREFETCH_CONTROL bit 7, and that isn't mentioned in the field update overview. Was it an intended change?

RobbieR_Intel
Moderator

Hello Yump,

 

Thank you for providing additional information. I will review it further and get back to you once I have accurate information about what you've reported.

 

Best Regards,

 

Robbie R.

Intel Customer Support Technician


RobbieR_Intel
Moderator

Hello Yump,

 

Thank you for the additional feedback that you have shared. Due to a system limitation, we can only record one issue per case. Since the AVX offset has already been explained, this thread will no longer be monitored. Regarding your feedback on the E-core behavior of your system, I will now send you an email so that I can contact you privately.

 

I sincerely appreciate what you've shared. With that being said, kindly monitor your inbox.

 

Best Regards,

 

Robbie R.

Intel Customer Support Technician


Yump
Beginner

As of writing this, I haven't received any additional emails other than the automated notifications about posts in this thread. I did check the spam folder. Just in case, I have added a Gmail whitelist filter for the @intel.com domain. Please send again.

RobbieR_Intel
Moderator

Hello Yump,


Thank you for your reply. Kindly re-check your inbox as I have sent a new email. Thank you!


Best Regards,


Robbie R.

Intel Customer Support Technician

