
Intel Core Ultra 9 285k - CPU-heavy workload runs for 2 minutes then starves application permanently

prent_rodgers
Beginner
4,694 Views

I have the 285K on an ASRock Intel Core Ultra Z890 LGA1851 RL-ILM Mini ITX motherboard. I run Fedora CoreOS on this node of a Kubernetes cluster. I see the same behavior when running directly on the node, without Kubernetes.

Fedora release 41 (Forty One)
NAME="Fedora Linux"
VERSION="41.20250105.3.0 (CoreOS)"
RELEASE_TYPE=stable
ID=fedora
VERSION_ID=41
VERSION_CODENAME=""
PLATFORM_ID="platform:f41"
PRETTY_NAME="Fedora CoreOS 41.20250105.3.0"

ROM version Z890I-Nova-WiFi_2.23.AS03 dated 12/26/24

I have an intensive Python-based workload that spawns multiple threads, up to the number of cores available. It runs Simulated Annealing to find the optimal tuning of microtonal musical chords to low-number ratios. I have to run the algorithm multiple times for every chord in a Bach chorale, searching for the optimum tuning. It takes about 45 seconds per chorale to come up with the optimum tuning for every chord. As soon as it finishes one chorale, it moves on to the next.

Unfortunately, once it has run at 5.7 GHz for two minutes, my Python program is suddenly deprived of all CPU resources. I run btop on Linux, and it shows the temperature rising to around 80 °C and power to 250 W for the first 120 seconds; then they drop to 40 W and 46 °C, and 0% CPU is allocated to my Python program. I can leave the program running for half an hour and it makes no progress. I have to delete the Kubernetes pod and restart it to regain any performance. It then repeats the same pattern: high CPU at 5.7 GHz for 120 seconds, then down to zero CPU.

If I switch turbo off, it runs fine at 3.7 GHz.

      echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

But I'd like to get the extra speed. The "Turbo P-states Support" section of the Linux kernel's intel_pstate documentation says: "More precisely, there is no guarantee that any CPUs will be able to stay in any of those states indefinitely, because the power distribution within the processor package may change over time or the thermal envelope it was designed for might be exceeded if a turbo P-state was used for too long."

So is this working as designed, and am I limited to 3.7 GHz for long-running, processor-intensive workloads?

I posted two images. One shows all 24 cores running at 100% for about 60 seconds; then, two minutes into the run, usage drops to 0%. It never gives any resources to my Python application after that. I could live with dropping to 3.7 GHz, but zero is hard to live with.
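To put numbers next to the btop charts, the turbo switch and the per-CPU clock can be logged from sysfs while the job runs. This is only a rough sketch: it assumes the intel_pstate driver is active and that the kernel exposes `cpufreq/scaling_cur_freq` (reported in kHz) for each CPU. The function names are made up for illustration.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def read_no_turbo(root=CPU_ROOT):
    """Return True if turbo is disabled, False if enabled, None if the file is absent."""
    try:
        return (root / "intel_pstate" / "no_turbo").read_text().strip() == "1"
    except OSError:
        return None


def read_cur_freqs_mhz(root=CPU_ROOT):
    """Map CPU number -> current frequency in MHz (sysfs reports kHz)."""
    freqs = {}
    for f in root.glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        cpu = int(f.parent.parent.name[3:])  # "cpu12" -> 12
        try:
            freqs[cpu] = int(f.read_text()) / 1000.0
        except (OSError, ValueError):
            continue  # CPU offline or file unreadable
    return freqs


def summarize(freqs):
    """Return (min, mean, max) in MHz, or None if nothing was read."""
    if not freqs:
        return None
    values = list(freqs.values())
    return min(values), sum(values) / len(values), max(values)
```

Printing `read_no_turbo()` and `summarize(read_cur_freqs_mhz())` once a second from a second terminal should show whether the clocks collapse at the same moment the program stops making progress, or whether they only fall afterwards because nothing is left to run.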

0 Kudos
18 Replies
pressed_for_time
Valued Contributor II
4,640 Views

Does disabling thermald.service make any difference?

0 Kudos
prent_rodgers
Beginner
4,601 Views

There is no thermald service running. 

0 Kudos
prent_rodgers
Beginner
4,568 Views

I should also mention that I can run synthetic stress workloads for a long time with no slowdown. For example, the following stress-ng run is never throttled: it holds 5.7 GHz and 270 W for ten minutes, reaching 100 °C. When I cap it at non-turbo, it only uses 134 W and the temperatures stay in the 50s.
taskset -c 0-23 stress-ng --cpu 24 --cpu-method stats

I can also run a Kubernetes-based stress app for a similar amount of time, using vish/stress as the container image with:
args:
- -cpus
- "24"

0 Kudos
prent_rodgers
Beginner
4,568 Views
Perhaps it has something to do with a combination of CPU and memory accesses drawing too much power at the motherboard level.
0 Kudos
DeancR_Intel
Moderator
4,504 Views

Hi prent_rodgers,


Thank you for posting in the Community!


I suggest checking the following:


Cooling Solutions: Ensure that your cooling solution is adequate for the high-power consumption and heat generation of your CPU. You might need to upgrade your cooling system to maintain higher performance levels.


Power Limits: Check if there are any power limits set in your BIOS or operating system that might be causing the CPU to throttle. Adjusting these settings might help maintain higher performance.


BIOS Updates: Ensure your motherboard BIOS is up to date, as updates can include improvements for thermal and power management. Because the names and locations of these settings vary between motherboards, I highly recommend contacting your motherboard vendor's support to find out where they are in the BIOS: ASRock > Support

 

Best regards,

 

Dean R.

Intel Customer Support Technician










0 Kudos
prent_rodgers
Beginner
3,978 Views

It looks like my power supply has gone casters up. After about three weeks, it could not power the CPU to POST. The fans ran, but nothing else. Now it won't power up at all. I've asked for a replacement from the manufacturer. This could be the root cause of the problem but it will take a few weeks to determine while I wait for the replacement.

0 Kudos
DeancR_Intel
Moderator
4,361 Views

Hi prent_rodgers,


I noticed that you marked my last reply as the solution. Was the issue fixed on your end? Let me know if you have other questions.

 

Best regards,

 

Dean R.

Intel Customer Support Technician


0 Kudos
prent_rodgers
Beginner
4,319 Views

I took the advice to improve my cooling, but made the fatal mistake of removing the CPU to clean off the excess thermal paste, and ended up getting paste all over the pins. In trying to clean them, I bent several. Now it won't boot. I've ordered a replacement board and hope to try again in a week. For now, I'm assuming that cooling was at fault.

0 Kudos
DeancR_Intel
Moderator
4,271 Views

Hi prent_rodgers,


Apologies for any inconvenience this may have caused you. I appreciate your efforts to fix the issue you encountered, and I look forward to your update.

 

Best regards,

 

Dean R.

Intel Customer Support Technician


0 Kudos
DeancR_Intel
Moderator
4,022 Views

Hi prent_rodgers,


I'm following up to find out whether you are still experiencing any issues. I have not received a response from you with the needed information.

 

Best regards,

 

Dean R.

Intel Customer Support Technician


0 Kudos
DeancR_Intel
Moderator
3,799 Views

Hi prent_rodgers,


I will wait for your update as well; your response is highly appreciated.

 

Best regards,

 

Dean R.

Intel Customer Support Technician


0 Kudos
DeancR_Intel
Moderator
3,441 Views

Hi prent_rodgers,


I'm following up to find out whether you are still experiencing any issues. I have not received a response from you with the needed information.

 

Best regards,

 

Dean R.

Intel Customer Support Technician


0 Kudos
prent_rodgers
Beginner
3,080 Views

I was finally able to get the PC working again after the power supply failure and a few other problems. The 285K is exceptionally fast, but only for about 2-3 minutes. Then it slows down to 800 MHz, and my Python program is completely starved of CPU resources. This doesn't happen when I run stress-ng.

For example, this runs fine for five minutes, consuming around 330 W, with the temperature peaking at 105 °C, running at 5.4 GHz across all 24 processors:

taskset -c 0-23 stress-ng --cpu 20 --iomix 4 --vm 12 --vm-bytes 1024M --fork 4 --timeout 300s

All good. But there is something about my Python workload that just flat out stops after three minutes of hard work. I'm at a loss to understand what is happening.

I replaced the motherboard, power supply, and cooling system, and that improved the situation. Previously it throttled down to 400 MHz after only 60 seconds; now I get about 2:30 before it slows to 800 MHz. Attached are two charts that show everything humming along at 5.4 GHz until, during the ninth run of the complex Python Simulated Annealing, it slows to 800 MHz. It was supposed to run through 11 different chorales, and only 8 completed. Bach wrote several hundred, so I have a ways to go yet.
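One check that might help separate "the CPU is throttled" from "the program has stopped asking for CPU" is to sample the state of every thread of the Python process from /proc once progress stops. This is a sketch under the assumption that the process is still alive at that point. The helper names are hypothetical; only the /proc/<pid>/task/<tid>/stat layout comes from the kernel.

```python
from collections import Counter
from pathlib import Path


def parse_stat(text):
    """Parse one /proc/<pid>/task/<tid>/stat line into (state, cpu_ticks)."""
    # The command name sits in parentheses and may contain spaces,
    # so split after the last ')' instead of splitting on whitespace.
    rest = text[text.rfind(")") + 2:].split()
    state = rest[0]                       # field 3: R, S, D, ...
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return state, utime + stime


def thread_states(pid, proc=Path("/proc")):
    """Count threads per scheduler state and sum their CPU time in clock ticks."""
    states = Counter()
    total_ticks = 0
    for stat in (proc / str(pid) / "task").glob("*/stat"):
        try:
            state, ticks = parse_stat(stat.read_text())
        except (OSError, ValueError, IndexError):
            continue  # thread exited between listing and reading
        states[state] += 1
        total_ticks += ticks
    return states, total_ticks
```

Calling `thread_states(pid)` twice a few seconds apart gives two snapshots. If the tick total stops growing and every thread reports `S` (sleeping) or `D` (uninterruptible wait), the threads are waiting on something and the low clock may simply be the idle frequency. If threads stay in `R` (runnable) while the clocks sit at 800 MHz, that points back at throttling.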

Here is the current state of the machine:

System-manufacturer :  ASRock
System-product-name :  Z890I Nova WiFi
Bios-release-date :  12/25/2024
Bios-version :  2.23.AS03

Operating system:
     cat /etc/*release*
Fedora release 41 (Forty One)
NAME="Fedora Linux"
VERSION="41.20250117.3.0 (CoreOS)"
RELEASE_TYPE=stable
 

Kubernetes version 1.30.9

Here is the running container:

k exec -it tonicnet-d-547d6497d4-44rk9 -- bash

  bash-5.2# cat /etc/*release*

Fedora release 40 (Forty)
NAME="Fedora Linux"
VERSION="40 (Forty)"
ID=fedora
VERSION_ID=40

I have made no changes using the ASRock bios menus. I'm willing to try modifying anything that won't damage the CPU.

 

0 Kudos
PC1997
New Contributor I
2,999 Views

I have my own ideas... but I don't want to waste more of your time. So I fed your problem into the LLMs... The only thing I will say, is ignore increasing the power limits. You are already pushing WAY too much power when stress testing your CPU at 330 W. And I agree with n_scott_pearson.

I basically never use ChatBots, as I feel they never solve any real world problems, at least not for me. But here it goes...

"It seems like your Intel 285k processor is throttling its performance after a few minutes of intense workload. The symptoms you're describing (temperature spike, power drop, and CPU resources being deprioritized) strongly suggest that the CPU is likely hitting thermal or power limits, and when these limits are exceeded, it’s entering a thermal throttle or power-saving state. This behavior typically occurs when the CPU temperature reaches a certain threshold, and the system begins to scale back performance to prevent overheating or damage.

Here are a few things to consider and potential solutions:

1. Thermal Throttling

CPUs have built-in thermal protection mechanisms. Once the temperature reaches a specific threshold (usually 80°C to 100°C), they throttle the clock speed to reduce heat generation.

The drastic drop in power (from 250 watts to 40 watts) and the CPU going idle suggests that thermal throttling is happening.


Solution:

Improve Cooling: Ensure your CPU cooling solution (e.g., heatsink, fan, liquid cooling) is adequate for the workload. Consider upgrading your thermal solution if necessary.

Thermal Paste: Check if the thermal paste is applied properly or has dried out over time. Reapplying it could help improve heat dissipation.

Case Ventilation: Ensure your case has good airflow to maintain a low ambient temperature around the processor.


2. Power Delivery (PL1/PL2) Limits

Intel CPUs have power limits (PL1 for sustained power, PL2 for short bursts) to prevent excessive power draw and thermal issues. If the workload is consistently demanding, the CPU may be hitting these power limits.


Solution:

Check BIOS/UEFI Settings: Some BIOS/UEFI setups allow you to adjust the power limits (PL1 and PL2). You could try increasing the power limits to allow the CPU to run at higher power for longer periods, but be careful, as this can lead to increased temperatures.

Increase Power Limits: Check your BIOS settings for configurable power limits (e.g., Turbo Mode settings or PL1/PL2).
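On Linux, the limits currently in force can usually be read without entering the BIOS. The following is a minimal sketch, assuming the intel_rapl powercap driver is loaded and exposes the package zone at /sys/class/powercap/intel-rapl:0. Zone numbering and the set of constraints vary by platform, and some files may need root. The `long_term` and `short_term` constraint names correspond to PL1 and PL2.

```python
from pathlib import Path

RAPL_PKG = Path("/sys/class/powercap/intel-rapl:0")  # assumed package zone


def _read_int(path):
    """Read an integer from a sysfs file, or return None if unavailable."""
    try:
        return int(path.read_text())
    except (OSError, ValueError):
        return None


def read_constraints(zone=RAPL_PKG):
    """Return {constraint name: (limit in W, time window in s)} for one RAPL zone."""
    out = {}
    for name_file in sorted(zone.glob("constraint_*_name")):
        prefix = name_file.name[: -len("name")]  # e.g. "constraint_0_"
        try:
            name = name_file.read_text().strip()
        except OSError:
            continue
        limit_uw = _read_int(zone / f"{prefix}power_limit_uw")
        window_us = _read_int(zone / f"{prefix}time_window_us")
        out[name] = (
            None if limit_uw is None else limit_uw / 1e6,    # microwatts -> watts
            None if window_us is None else window_us / 1e6,  # microseconds -> seconds
        )
    return out
```

Comparing the reported time windows with the roughly two-minute point at which the workload stalls is a quick way to see whether the power limits are even a plausible explanation.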


3. Frequency Scaling and CPU C-States

The CPU might be entering deeper idle states (C-states) when the workload is reduced or after a thermal throttle, leading to the drop in CPU utilization and performance.


Solution:

Disable C-States: You can try disabling deep C-states in the BIOS, which may prevent the CPU from entering low-power states.

Check CPU Governor: Fedora may be using a power-saving CPU governor. You can try changing the CPU governor to "performance" mode to prevent the CPU from scaling down too aggressively.

cpupower frequency-set --governor performance


4. Kubernetes or Resource Limits

Since the issue persists even when not running in Kubernetes, it’s less likely to be a Kubernetes-specific issue. However, Kubernetes resource limits (if set) might cause the container to be throttled or paused when high resource consumption is detected.


Solution:

Resource Limits: Ensure that your Kubernetes pod has adequate CPU and memory resource requests/limits. Set these values high enough to allow your workload to run without being restricted by the scheduler.
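Whether a CPU limit is actually throttling the container can be checked from inside the pod. This is a sketch that assumes the node uses cgroup v2 and that the container sees its own cgroup at /sys/fs/cgroup. The parsing helpers are illustrative names; the `cpu.stat` and `cpu.max` file formats are the kernel's.

```python
from pathlib import Path

CGROUP = Path("/sys/fs/cgroup")  # assumed cgroup v2 mount inside the container


def parse_cpu_stat(text):
    """Turn the 'key value' lines of cpu.stat into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def parse_cpu_max(text):
    """Return the CPU limit in cores from cpu.max, or None if unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def throttled_fraction(stats):
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    return stats.get("nr_throttled", 0) / periods if periods else 0.0


def read_cgroup_cpu(root=CGROUP):
    """Read the limit and throttling statistics for the current cgroup."""
    stats = parse_cpu_stat((root / "cpu.stat").read_text())
    limit = parse_cpu_max((root / "cpu.max").read_text())
    return limit, stats
```

A limit of `None` together with `nr_throttled` staying at zero would rule out the Kubernetes CPU limit as the cause, which would match the report that the same stall happens outside Kubernetes.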


5. Operating System and Power Management

Fedora CoreOS could be handling power management in a way that is impacting your CPU performance, especially when the system is under load.


Solution:

Disable Intel P-State: You can try disabling Intel's P-State driver, which controls dynamic frequency scaling on Intel processors, and instead use the acpi-cpufreq driver. Add this to your kernel boot parameters:

intel_pstate=disable

Then reboot and check if the CPU scaling behavior improves.


6. System Monitoring and Logging

To get more detailed insights, you can monitor system logs for any messages related to thermal events, power management, or CPU throttling.

journalctl -k | grep -i throttle

If there's any indication of CPU throttling due to power or thermal limits, it will show up in these logs.
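The kernel also keeps per-CPU thermal throttle counters in sysfs, which can be compared before and after a run. This is a minimal sketch, assuming an x86 kernel that exposes `thermal_throttle/core_throttle_count` and `thermal_throttle/package_throttle_count` under each CPU directory. The function names are illustrative.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")
COUNTERS = ("core_throttle_count", "package_throttle_count")


def read_throttle_counts(root=CPU_ROOT):
    """Return {cpu: {counter name: value}} for every CPU that exposes counters."""
    counts = {}
    for d in root.glob("cpu[0-9]*/thermal_throttle"):
        cpu = int(d.parent.name[3:])  # "cpu7" -> 7
        row = {}
        for name in COUNTERS:
            try:
                row[name] = int((d / name).read_text())
            except (OSError, ValueError):
                pass  # counter missing on this kernel or CPU
        if row:
            counts[cpu] = row
    return counts


def diff_counts(before, after):
    """Return only the counters that increased between two snapshots."""
    grown = {}
    for cpu, row in after.items():
        for name, value in row.items():
            delta = value - before.get(cpu, {}).get(name, 0)
            if delta > 0:
                grown.setdefault(cpu, {})[name] = delta
    return grown
```

Taking one snapshot before starting the workload and another after it stalls, then passing both to `diff_counts`, shows whether any thermal throttle events were recorded during the run. An empty result would suggest looking beyond thermal throttling.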


7. Python Threading and Parallelism

You mentioned the workload uses multiple threads up to the maximum core count. While Python threading might utilize multiple cores, the Global Interpreter Lock (GIL) could be limiting performance for CPU-bound tasks. Consider using multiprocessing instead of threading to fully utilize all CPU cores.


Solution:

Multiprocessing: Instead of relying on threads, try using the multiprocessing module, which allows each worker process to run independently on separate cores.

Joblib or Dask: These tools can also help distribute your workload across multiple processes effectively.
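A process-based version might be shaped roughly like the sketch below. It is not the poster's algorithm: the objective (distance from a 3/2 ratio), the `anneal` function and its parameters are all stand-ins chosen only to show independent annealing runs being handed to `concurrent.futures.ProcessPoolExecutor` from the standard library.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor

TARGET = 3 / 2  # stand-in objective: a just perfect fifth


def cost(x):
    """Squared distance from the target ratio (hypothetical objective)."""
    return (x - TARGET) ** 2


def anneal(seed, steps=20_000, start_temp=1.0):
    """Run one simulated-annealing search and return the best value found."""
    rng = random.Random(seed)  # per-run generator keeps runs reproducible
    x = rng.uniform(1.0, 2.0)
    best = x
    for i in range(steps):
        temp = start_temp * (1 - i / steps) + 1e-9  # linear cooling schedule
        cand = x + rng.gauss(0, 0.05)
        delta = cost(cand) - cost(x)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            x = cand
            if cost(x) < cost(best):
                best = x
    return best


def anneal_all(seeds, workers=None):
    """Run one annealing search per seed, each in its own worker process."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(anneal, seeds))
```

A script would call something like `anneal_all(range(24))` from under an `if __name__ == "__main__":` guard, which the process pool needs on platforms that start workers with spawn. Because each worker is a separate interpreter, the runs do not contend for a single GIL.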


8. Check Kernel and Firmware Updates

Ensure that your system’s firmware (BIOS/UEFI) and Linux kernel are up to date. Newer kernel versions might have improvements in CPU power management and thermal handling.



---

Conclusion:

Given your workload, it seems like a combination of thermal throttling and power limits is causing the CPU to stop allocating resources to your program. Start by improving the cooling solution, adjusting power limits in the BIOS, and ensuring the system is not entering low-power states too aggressively. If that doesn't resolve the issue, you can explore changes to your Python threading model and OS-level power management settings."

0 Kudos
n_scott_pearson
Super User
3,019 Views

This sounds like you are being throttled, either because of temperatures, power or some condition on the motherboard. You can use an app like ThrottleStop to see what is causing the throttling.

Hope this helps,

...S

0 Kudos
DeancR_Intel
Moderator
2,813 Views

Hi prent_rodgers,


I'm glad that the previous issue was solved. Just to note that the motherboard does have an impact on the ability to overclock. Some are better quality and more capable than others. Furthermore, the way the motherboard is set up also impacts the overclocking ability of a particular system (such as liquid cooler vs fan).


  1. Make sure that the thermal solution being used is compatible and correct for the specific CPU.
    1. Refer to How to Choose Thermal Solutions for Intel® Core™ Boxed Desktop Processor.
    2. Refer to Information about Thermal Solutions
  2. Verify proper installation of the processor thermal solution. Refer to the Cooler Installation section in Intel® Desktop Boxed Processors Support Videos and Instructions
  3. Make sure to apply the right amount of thermal interface material (TIM).  
  4. Check system fan operation.
  5. Check air ventilation.


See this link for reference:

What Is Throttling and How Can It Be Resolved?

 

Best Regards,

 

Dean R.

Intel Customer Support Technician



0 Kudos
DeancR_Intel
Moderator
2,105 Views

Hi prent_rodgers,

 

I just wanted to follow up on my previous message about your inquiry. Have you had a chance to review it? If you have any questions or need more information, please feel free to let me know.

 

Best Regards,

 

Dean R.

Intel Customer Support Technician

 

0 Kudos
DeancR_Intel
Moderator
1,787 Views

Hi prent_rodgers,

 

I understand that you might be busy, so since I haven't heard back from you, I'll be closing this thread, and it will no longer be monitored. If you still need help with the troubleshooting, please feel free to reach out to Intel Customer Support or start a new thread.

 

Best regards, 

 

Dean R. 

Intel Customer Support Technician 

 

 


0 Kudos