
Intel Core Ultra 9 285k - CPU-heavy workload runs for 2 minutes then starves application permanently

prent_rodgers
Beginner
870 Views

I have the 285K in an ASRock Intel Core Ultra Z890 LGA1851 RL-ILM Mini ITX motherboard. I run Fedora CoreOS on this node of a Kubernetes cluster. I see the same behavior when I run the workload directly on the node, without Kubernetes.

Fedora release 41 (Forty One)
NAME="Fedora Linux"
VERSION="41.20250105.3.0 (CoreOS)"
RELEASE_TYPE=stable
ID=fedora
VERSION_ID=41
VERSION_CODENAME=""
PLATFORM_ID="platform:f41"
PRETTY_NAME="Fedora CoreOS 41.20250105.3.0"

ROM version Z890I-Nova-WiFi_2.23.AS03 dated 12/26/24

I have an intensive Python-based workload that spawns multiple threads, up to the maximum number of cores available. It runs Simulated Annealing to find the optimal tuning of microtonal musical chords to low-number ratios. I have to run the algorithm multiple times for every chord in a Bach chorale, searching for the optimum tuning. It takes about 45 seconds per chorale to come up with the optimum tuning for every chord. As soon as it finishes one chorale, it moves on to the next.

Unfortunately, once it has run at 5.7 GHz for two minutes, the system suddenly deprives my Python program of any CPU resources at all. I run btop on Linux, and it shows the temperature rising to around 80 °C and power to 250 W for the first 120 seconds. Then power drops to 40 W and temperature to 46 °C, and 0% CPU is allocated to my Python program. I can leave the program running for half an hour, and it makes no progress. I have to delete the Kubernetes pod and restart it in order to regain any performance. It then repeats the same pattern: high CPU at 5.7 GHz for 120 seconds, then down to zero CPU.
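One way to narrow down a symptom like this is to check whether the process is being throttled or has simply stopped being scheduled (for example, all threads sleeping on a lock). The sketch below is only an illustration and was not run on this system. It samples `/proc/<pid>/stat` twice and reports the process state and the CPU seconds consumed in between. The function names `parse_proc_stat` and `sample` are hypothetical helpers, not part of any library.

```python
import os
import time


def parse_proc_stat(stat_text):
    """Return (state, cpu_ticks) from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before splitting the remaining fields.
    rest = stat_text.rsplit(")", 1)[1].split()
    state = rest[0]                        # field 3: R, S, D, ...
    ticks = int(rest[11]) + int(rest[12])  # fields 14 and 15: utime + stime
    return state, ticks


def sample(pid, interval=5.0):
    """Print the process state and the CPU seconds used over one interval."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    with open(f"/proc/{pid}/stat") as f:
        _, before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(f"/proc/{pid}/stat") as f:
        state, after = parse_proc_stat(f.read())
    print(f"state={state} cpu_seconds={(after - before) / hz:.2f}")
```

Calling `sample(pid)` with the PID of the Python program during the stalled phase would show whether it is in state `S` (sleeping) with no CPU time accumulating, which would point toward the program waiting on something rather than the CPU running slowly.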

If I switch turbo off, it runs fine at 3.7 GHz.

      echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
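To confirm what state the node is actually in while the workload runs, the same sysfs tree can be read back. The following is a minimal sketch, assuming the standard `intel_pstate` and `cpufreq` sysfs layout; `turbo_status` is a hypothetical helper name, and the `cpu_root` parameter exists only so the function can be pointed at a different directory.

```python
from pathlib import Path


def turbo_status(cpu_root="/sys/devices/system/cpu"):
    """Report whether turbo is allowed and the current per-CPU frequency range."""
    root = Path(cpu_root)
    # no_turbo contains "1" when turbo P-states are disabled, "0" otherwise.
    no_turbo = (root / "intel_pstate" / "no_turbo").read_text().strip()
    # scaling_cur_freq is reported in kHz, one file per logical CPU.
    freqs_khz = [
        int(p.read_text())
        for p in root.glob("cpu[0-9]*/cpufreq/scaling_cur_freq")
    ]
    return {
        "turbo_enabled": no_turbo == "0",
        "max_mhz": max(freqs_khz) / 1000 if freqs_khz else None,
        "min_mhz": min(freqs_khz) / 1000 if freqs_khz else None,
    }
```

Printing `turbo_status()` every few seconds during a run would give a simple log of the frequency range before and after the two-minute mark.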

But I'd like to get the extra speed. I read the "Turbo P-states Support" section of the Intel P-state kernel documentation, and it says: "More precisely, there is no guarantee that any CPUs will be able to stay in any of those states indefinitely, because the power distribution within the processor package may change over time or the thermal envelope it was designed for might be exceeded if a turbo P-state was used for too long."

So is this working as designed, and am I limited to 3.7 GHz for long-running, processor-intensive workloads?

I posted two images. One shows all 24 cores running at 100% for about 60 seconds; then, two minutes into the run, usage drops to 0%. The system never gives any resources to my Python application after that. I could live with being reduced to 3.7 GHz, but zero is hard to live with.

0 Kudos
1 Solution
DeancR_Intel
Moderator
679 Views

Hi prent_rodgers,


Thank you for posting in the Community!


I suggest checking the following:


Cooling Solutions: Ensure that your cooling solution is adequate for the high-power consumption and heat generation of your CPU. You might need to upgrade your cooling system to maintain higher performance levels.


Power Limits: Check if there are any power limits set in your BIOS or operating system that might be causing the CPU to throttle. Adjusting these settings might help maintain higher performance.


BIOS Updates: Ensure your motherboard BIOS is up to date, as updates can include improvements for thermal and power management. Since these settings are named differently by each motherboard vendor, I highly recommend that you contact your motherboard's support to find out where those BIOS settings are located. ASRock > Support
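On Linux, the package power limits currently in effect can usually be inspected through the powercap (RAPL) sysfs interface. The sketch below is a hedged illustration that assumes the `intel_rapl` driver is loaded and the package zone is exposed at `/sys/class/powercap/intel-rapl:0`; the function name `read_rapl_limits` is hypothetical, and on some systems reading these files may require root.

```python
from pathlib import Path


def read_rapl_limits(zone="/sys/class/powercap/intel-rapl:0"):
    """Return a list of (constraint_name, watts, window_seconds_or_None)."""
    zone = Path(zone)
    limits = []
    for name_file in sorted(zone.glob("constraint_*_name")):
        prefix = name_file.name[: -len("name")]  # e.g. "constraint_0_"
        # Power limits are stored in microwatts.
        watts = int((zone / f"{prefix}power_limit_uw").read_text()) / 1e6
        # The time window is stored in microseconds and may be absent.
        window_file = zone / f"{prefix}time_window_us"
        window_s = (
            int(window_file.read_text()) / 1e6 if window_file.exists() else None
        )
        limits.append((name_file.read_text().strip(), watts, window_s))
    return limits
```

The constraint names reported by the kernel are typically `long_term` and `short_term`, which can then be compared with the power limit values configured in the BIOS.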

 

Best regards,

 

Dean R.

Intel Customer Support Technician











0 Kudos
10 Replies
pressed_for_time
Valued Contributor II
816 Views

Does disabling thermald.service make any difference?

0 Kudos
prent_rodgers
Beginner
777 Views

There is no thermald service running. 

0 Kudos
prent_rodgers
Beginner
745 Views

I should also mention that I can run synthetic stress workloads for a long time with no slowdown. For example, the command below is never throttled: it runs at 5.7 GHz and 270 W for ten minutes, reaching 100 °C. When I cap the CPU at non-turbo speeds, it draws only 134 W and the temperatures stay in the 50s.
taskset -c 0-23 stress-ng --cpu 24 --cpu-method stats

I can also run a Kubernetes-based stress app for a similar amount of time, using vish/stress as the container image with:
args:
- -cpus
- "24"
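Since the synthetic loads sustain turbo while the Python workload does not, comparing the kernel's thermal throttle counters before and after each kind of run might show whether the two behave differently. This is a rough sketch, assuming an x86 Linux kernel that exposes `thermal_throttle` directories under each CPU in sysfs; `read_throttle_counts` and `throttle_increase` are hypothetical helper names.

```python
from pathlib import Path


def read_throttle_counts(cpu_root="/sys/devices/system/cpu"):
    """Snapshot the *_throttle_count files for every CPU that has them."""
    counts = {}
    for d in sorted(Path(cpu_root).glob("cpu[0-9]*/thermal_throttle")):
        counts[d.parent.name] = {
            f.name: int(f.read_text()) for f in d.glob("*_throttle_count")
        }
    return counts


def throttle_increase(before, after):
    """Return only the counters that grew between two snapshots."""
    grown = {}
    for cpu, counters in after.items():
        for name, value in counters.items():
            delta = value - before.get(cpu, {}).get(name, 0)
            if delta > 0:
                grown[(cpu, name)] = delta
    return grown
```

Taking one snapshot before a run and one after, then passing both to `throttle_increase`, gives the number of throttle events recorded during that run. An empty result for the Python workload would suggest the stall is not caused by thermal throttling.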

0 Kudos
prent_rodgers
Beginner
745 Views
Perhaps it has something to do with a combination of CPU and memory accesses driving too much power at the motherboard level.
0 Kudos
prent_rodgers
Beginner
155 Views

It looks like my power supply has gone casters up. After about three weeks, it could not power the CPU to POST. The fans ran, but nothing else. Now it won't power up at all. I've asked for a replacement from the manufacturer. This could be the root cause of the problem but it will take a few weeks to determine while I wait for the replacement.

0 Kudos
DeancR_Intel
Moderator
538 Views

Hi prent_rodgers,


I've noticed that you marked my last reply as your solution. Has the issue been fixed on your end? Let me know if you have other questions.

 

Best regards,

 

Dean R.

Intel Customer Support Technician


0 Kudos
prent_rodgers
Beginner
496 Views

I took the advice to improve my cooling, but made the fatal mistake of removing the CPU to try to clean off the excess thermal paste, and ended up getting paste all over the pins. In trying to clean them, I bent several. Now it won't boot. I've ordered a replacement board and hope to try again in a week. For now, I'm assuming that cooling was at fault.

0 Kudos
DeancR_Intel
Moderator
448 Views

Hi prent_rodgers,


Apologies for the inconvenience this may have caused you. I appreciate your efforts to fix the issue you encountered, and I look forward to your update.

 

Best regards,

 

Dean R.

Intel Customer Support Technician


0 Kudos
DeancR_Intel
Moderator
200 Views

Hi prent_rodgers,


I'm following up to find out whether you are still experiencing any issues. I have not received a response from you regarding the requested information.

 

Best regards,

 

Dean R.

Intel Customer Support Technician


0 Kudos