taskset -c 0-15 emerge -e @world
This recompiles the whole system using just the P-cores; it takes half a day to a day to recompile the ~1400 packages.
I have rasdaemon running to log hardware errors.
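To inspect what rasdaemon has recorded, the ras-mc-ctl tool that ships with it can be used (output details vary by version and distro):
ras-mc-ctl --summary   # count of logged errors by type
ras-mc-ctl --errors    # dump the logged errors; MCE events include the reporting CPU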
With the Performance profile (IccMax=307A, PL1=253W, PL2=253W) the CPU is unstable with anything less than VRM load-line level 6 (ASUS BIOS).
Interestingly, it is also stable at LL6 in the Extreme profile (IccMax=400A, PL1=320W, PL2=320W).
When using a lower load line (tested from the motherboard default of 3 up to 5), rasdaemon shows the errors are consistently on CPU 0x8, and they are either instruction fetch failures from the level-0 instruction cache or TLB errors.
I previously had a 13900KS which ran fine with unlocked power limits (IccMax=511.75A, PL1=4095W, PL2=4095W).
I have a pretty good water-cooling setup (6x120mm shared between CPU and GPU, but the GPU is idle in all these tests). Water temperature is 31-32°C once warmed up, for the duration of the test; room temperature is about 25°C.
- Am I right in assuming that CPU=0x8 on all these errors means that P-core 8 might be "bad"?
- Is needing load-line level 6 to get the CPU stable usual, and/or something to worry about?
Thanks for any help you can offer.
I also tried over-volting. This was interesting because I did get it stable with 6.2GHz enabled at a +250mV offset. However, it was overall the same speed or slower, as the single P-core being used was hitting 100°C and thermally throttling, but not crashing.
So to get the chip stable during its brief spike to 6.2GHz, we end up thermally throttling after a few seconds down to 5.8GHz, resulting in overall lower performance than just limiting the clock speed to 5.9GHz.
So here is where I am at:
- Limiting frequency to 5.9GHz (@ ASUS LLC5) gives the best all-core performance.
- Disabling hyper-threading (@ ASUS LLC3) gives the best single-threaded performance.
Over-volting for a higher clock speed, or under-volting to reduce temperature, does not result in better performance than the above.
The only remaining possibility I can think of to improve on the above would be to improve the cooling. As my water loop is only at 32°C, it's not limited by the fans or the radiators, which means de-lidding the CPU is the only way to improve things.
Undervolting would do the opposite of stabilizing the system. I'm surprised you were recommended that, since it seems clear that you want to make the system stable. That is a substantial over-volt, and I would expect it to thermally throttle quite quickly on those cores.
You're not touching AC_LL or DC_LL in the internal power management, right? Just letting SVID Behavior set it for you? Which setting do you have that on? Depending on the setting, ASUS motherboards undervolt to some extent.
I don't think you're going to be able to squeeze much more performance out of those cores, unfortunately. Have you tried disabling HT for just those two cores and leaving it on for all of the others? That should allow you to keep most of the performance while also somewhat taming temps and power consumption.
I have asked Intel how they recommend undervolting, as they suggested it.
My motherboard (W680) does not appear to undervolt. Auto settings result in an AC/DC LL of 1.1/1.1 on LLC3.
What I tried was reducing the AC and DC LL (as Intel say they should be set to the same value) to, say, 0.2/0.2 with a standard LLC like 3 or 4.
I was not aware HT could be disabled on individual cores; it's not something I can do with this BIOS.
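For what it's worth, on Linux you can approximate per-core HT control from the OS by offlining one sibling of a hyper-thread pair via sysfs; a minimal sketch, assuming vCPUs 8 and 9 share a P-core:
cat /sys/devices/system/cpu/cpu8/topology/thread_siblings_list   # confirm the pairing, e.g. "8-9"
echo 0 | sudo tee /sys/devices/system/cpu/cpu9/online            # take the sibling offline: HT off for that core
echo 1 | sudo tee /sys/devices/system/cpu/cpu9/online            # bring it back later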
I think there is an easy solution for Intel: limit P-cores with both hyper-threads busy to 5.8GHz, and allow cores with only one hyper-thread active to boost up to 5.9/6.2GHz. They would then have a chip that matched the advertised multi-core and single-threaded performance, and would be stable without any specific power limits.
I still think the real reason for this problem is that hyper-threading creates a hot-spot somewhere in the address-arithmetic part of the core, and this was missed in the design of the chip. Had a thermal sensor been placed there, the chip could throttle back the core ratio automatically to remain stable. Or perhaps the transistors needed to be bigger for higher current, though I'm not sure that would solve the heat problem. Ultimately an extra pipeline stage might be needed, and this would be a problem because it would also slow things down when only one hyper-thread is in use. I wonder if this has something to do with why Intel is getting rid of hyper-threading in 15th gen?
DC_LL should not be changed, as that should be tied to the LLC value that you're using. Best to leave it on Auto, as the motherboard will synchronize the value with your chosen LLC. You would reduce AC_LL in that case to properly undervolt. Having AC_LL == DC_LL is going to result in your VRM delivering quite a bit of voltage to your CPU.
I'm not 100% sure whether per-core HT is innate to the CPU or the motherboard; it would make more sense for it to be a characteristic of the CPU, IMO, so perhaps you should check your BIOS version and/or get in touch with the BIOS developers for your board.
I'm not sure how technically challenging it would be to implement the solutions you've outlined, though they sound like reasonable ways to address the problem.
I thought the same about DC_LL, but Intel's latest guidance on 13th/14th gen stability (May 8th) is that DC_LL == AC_LL. AFAIK only AC_LL affects the voltage delivered; DC_LL affects the power calculation.
Perhaps there is some benefit in under-volting, but it would have to be combined with reducing the max core frequency. As the CPU thermally limits on all-core loads, this would improve all-core performance, as measured by work done, rather than improving the max frequency. It would sacrifice more single-threaded performance for all-core performance, though.
An interesting thought: if under-volting works, why isn't this included in the VID table? For example, if we set the frequency limit at 5.9GHz and apply the voltage necessary for this to be stable (LLC5), the core throttles at 5.6GHz on sustained all-P-core loads. If we limit at 5.8GHz with the voltage required for this to be stable (LLC3), the core throttles at 5.7GHz. When the core hits the thermal limit, why doesn't the VID table reduce the frequency and voltage? Then under-volting would not provide any performance advantage.
I'll have to turn on Virtual Machine Platform and do your CPU-affinity test with the GCC emerge later. I started testing with HT on again, but with just the two "preferred" cores limited to 6.1GHz, which has been promising so far. I replied in your other thread about the undervolting recommendation and results.
I did some tests with reducing the voltage. The result is that limiting the CPU to 5.8GHz is optimal for all-core performance. The increased voltage necessary for stability at 5.9GHz results in more thermal throttling, so the overall performance is worse. The decreased frequency at 5.7GHz is already slow enough that the decrease in voltage does not result in any better performance; it is slower than 5.8GHz.
I then tried to optimise 5.8GHz, for which I was using LLC3. Unfortunately LLC2 was not stable, so I am already using the lowest stable load-line calibration. I then bisected the AC/DC load-line value; whilst it is stable at lower values like 0.5/0.5, it runs at half the speed, so that's no good. The lowest stable (full-speed) value for AC/DC was 0.83/0.83. This was slightly faster than 1.0/1.0, but both were slower than leaving AC/DC on the motherboard's Auto setting, so I think this is all just noise and there is no benefit to this degree of slight undervolting. It seems sensible to leave it on Auto/Auto.
So for my chip, the best multi-core performance turns out to be limiting P-cores to 5.8GHz @ LLC3.
Really glad I found your series of posts here; this is very similar to what I'm seeing with my i9-14900K. I don't have any interest in gaming or overclocking, so this is a stock build without any attempt to push the processor beyond what the BIOS is doing by default. I'm running Windows 11 and use the PC exclusively for C++ software development. For the first few months the processor was stable, but I'm now getting multiple random clang compiler crashes that go away after retrying.
I'm now considering buying a new system. My work cannot withstand the downtime of taking out the CPU and doing an RMA exchange.
In my case, compiling a codebase like Chromium from scratch with a pristine known-good git checkout has a 100% chance of a clang ICE (internal compiler error). The stack traces from these crashes never make any sense either, which has made it hard to narrow down. The clang crash report might show invalid syntax encountered while parsing some C++ AST, yet the file compiles when retried. It's stochastic in nature: never the same error twice, or with the same file. I also see crashes in Python scripts that run as part of the build. I tried compiling the same project in Ubuntu and saw the same results.
I've upgraded the BIOS (I'm using an ASUS Z790 ProArt), enabled the "Intel Baseline Profile", and tried setting PL1 to 125W and PL2 to 253W (the stock limits), and I'm still seeing it.
Your theory about a potential bug in hyper-threading is the most plausible explanation I've encountered for this stability issue. Normal stress tests like Prime95, MemTest86, and a collection of tests on my NVMe drive all come back clean. I suspect the all-out workload of many competing processes scheduled by a parallel build system such as Ninja flushes out issues that the standard stress tests do not.
I've just configured it to limit the frequency to 5.6GHz; I just need something that works at this point. On the first few compiles things are looking good. I will update with any other findings.
I was trying to follow your series: are you still running with Hyper-Threading disabled, or just the frequency limit?
I found that power limits were mostly irrelevant; once I had the individual cores stable, I could increase the power limits without problems, as the CPU will just thermally throttle anyway.
My final stable configs were: x58 with LLC3 and the Extreme power profile, which maximises multi-threaded performance. (I could get x59 stable with LLC5, but the extra heat resulted in slower compile times.)
The other was to just disable hyper-threading, leave the frequency unlimited, Extreme power profile, LLC3. This maximised single-threaded performance.
As I mostly use it for software development, I am using the first config for optimal multi-threaded performance.
With these settings it seems completely stable. I would recommend testing each P-core individually, though, by running a multi-threaded compile and setting the thread affinity to each P-core. Because there are two vCPUs in each P-core, that means pairs like 0 & 1, 2 & 3, 4 & 5, etc.
I have found that if you are right on the edge of stability, it's also worth running pairs of P-cores and sets of 4, as well as all 8.
I have written a testing script that bootstraps GCC using the following vCPUs (a sketch of the loop follows the list):
- single P-cores: 0-1, 2-3, 4-5, 6-7, 8-9, 10-11, 12-13, 14-15
- 2 P-cores: 0-3, 4-7, 8-11, 12-15
- 4 P-cores: 0-7, 8-15
- all P-cores: 0-15
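The script is essentially a loop over those affinity sets; a minimal sketch (the job counts and error handling are illustrative, not my exact script):
#!/bin/bash
# Bootstrap GCC pinned to each vCPU group in turn; a failed build flags that group as unstable.
for cpus in 0-1 2-3 4-5 6-7 8-9 10-11 12-13 14-15 0-3 4-7 8-11 12-15 0-7 8-15; do
    jobs=$(( $(taskset -c "$cpus" nproc) + 1 ))   # e.g. -j3 for a two-vCPU pair
    echo "=== testing vCPUs $cpus (-j$jobs) ==="
    MAKEOPTS="-j$jobs" taskset -c "$cpus" emerge -1 gcc || echo "FAILED on vCPUs $cpus" >&2
done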
As the problem is specific to hyper-threading, you don't need to test e-cores.
That set of tests takes about 8 hours to run, but if a config passes, it seems to be completely stable.
Have you updated the BIOS at all? I wonder if motherboard vendors have changed the settings (under-volting) to gain performance.
I am not sure I believe that there is silicon degradation going on. I guess it's possible, but I have had two 14900KSs that both behaved like this from brand new, and I now suspect my 13900KS had the same problem, although it happened much less often and I just blamed the software. I haven't seen any thorough testing that passed on a new CPU and then failed on the same CPU after months of use.
Huge kudos for this find.
Prior to finding your post, I was trying extremely conservative power limits, capping everything at the Intel specifications, but I couldn't get 20% into any build that fully saturates all cores.
One of the things I find curious: to try to get stability at any cost, I had been running PL1=125W, PL2=253W and IccMax=307A (Intel's specs for the 14900K) for a few weeks. I don't think I ever saw the frequency on any core get anywhere near 6GHz (I generally see ~5.0GHz under sustained load), yet I was still experiencing a lot of instability. Perhaps when a new compile workload launches there's a momentary frequency spike that falls outside my ~1s sensor sampling in XTU.
Do you have any ideas?
I've now settled on x57 and I haven't experienced any instability since. I've done 5-6 clean Chromium builds (3-4 hours each) on this machine since then, and 100% of them ran to completion without a hitch.
I did upgrade the BIOS within the last few weeks in an attempt to resolve this. ASUS released one that allows you to enable the "Intel Baseline Profile", but that didn't help at all.
So if it's useful for others, my setup is basically:
- Reset BIOS to defaults, configure XMP
- P-core ratio capped at x57, all-core.
- E-core ratio capped at x44, all-core.
That's it! Everything runs great now.
Unlike synthetic benchmarks, which run multiple threads for the same length of time (all cores start together and finish together), compiling runs multiple single-threaded tasks of different lengths at the same time. This means that when compiling a large program, cores are starting and stopping all the time; in those transient conditions there is a finite probability that fewer than 5 cores are active with at least one of them running two hyper-threads, which allows it to boost up to max frequency and crash.
If you were to use 'set affinity' to restrict the compile to each P-core, you could see the crash much more quickly and identify which P-core(s) are causing the problems. I found some of my P-cores were stable at 5.9GHz (14900KS), but the BIOS doesn't provide a way to set frequency limits on individual P-cores, only by core usage.
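On Linux, taskset does the same job as 'set affinity'; for example, to pin a Chromium build to both hyper-threads of one P-core (the build directory and target here are placeholders for your own):
taskset -c 8,9 ninja -j3 -C out/Default chrome   # repeat per vCPU pair to find the bad core(s)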
Hello, I also have a 14900K and am suffering from very similar issues (segmentation faults, CPU errors on core 8/9, etc.). As your post was from a few months ago, have things been stable after enabling ONLY these settings, with no more crashes?
> So if it's useful for others, my setup is basically:
> - Reset BIOS to defaults, configure XMP
> - P-core ratio capped at x57, all-core.
> - E-core ratio capped at x44, all-core.
> That's it! Everything runs great now.
Hello Keean,
I assure you that I am right here and I want to help you with your Intel® Core™ i9 processor 14900KS (36M Cache, up to 6.20 GHz). I have sent you an email regarding our conversation; please check your inbox for more details. I would like to apologize for the delay in my response, and I hope you allow me the opportunity to make it up to you.
As for Norman_Trashboat, I understand that you are frustrated that the warranty claim for the 14900K has been denied. Seeing that you have delidded the CPU, I want to set your expectations: physical damage can indeed void the warranty of the CPU. You may also visit this article for more information: Warranty Guide for Intel® Processors
As for ch94, I can see that you are actively engaging with Keean in this thread. Feel free to do so, as it is the best way to empower our community. However, you may also create a new thread so we can tend to your concern separately.
Ramyer M.
Intel Customer Support Technician
As I have Intel-approved current limits set, my concern is that any electromigration could be caused by current density/heat in individual cores when hyper-threading. It's possible the only safe option is to disable hyper-threading, as no matter what power limits you set (above about 60W), it's possible for all that power to go to a single P-core.
If a single P-core with both hyper-threads fully loaded has a high enough current density at 100°C to cause electromigration, then there is no way to limit the current per core. So there are three options to mitigate this: reduce the max core frequency, disable hyper-threading, or reduce the max temperature. I tend to think thermal throttling is slow(er) to respond, so not a good option.
It is interesting that Intel's 15th gen appears to have both a lower max clock speed (5.5GHz is rumoured) and no hyper-threading.
For now I have reduced the frequency to 5.7GHz and it is stable again; however, at this point it's barely faster than the 13900KS it replaced.
Thought I would update now that I have received a replacement 14900KS and updated to microcode 0x129.
The processor passes all the tests on various combinations of P-cores without any errors at all. Whereas before I had to over-volt to get it stable at 5.9GHz, it can now boost to 6.2GHz with a 100mV under-volt, and I haven't had a single crash.
As it has been completely stable and performing well, I was confident enough to delid it and fit a Thermal Grizzly Intel Mycro Direct-Die Pro V1. With the combination of this and the 100mV under-volt, it idles at 31°C and can run all P-cores at full load in Prime95, consuming 320W without thermal throttling, with all 16 P-core threads between 5.6GHz and 5.7GHz and the hottest core peaking at about 96°C. I left it like this for several hours to heat the water loop up before taking these measurements.
Games seem to run between 60°C and 70°C, and the CPU seems stable with the under-volt, giving zero errors after hours of Prime95, 8-hour compiles, and running my CPU test that stresses each P-core in turn for an hour, then each pair of P-cores, then groups of 4, and finally all 8 (with E-cores enabled and running all other non-test tasks so the P-cores don't get a break).
The replacement 14900KS I received has a lower undervolt tolerance than the unit it replaced, and does not run correctly with a 100mV reduction.
However, with the Intel Default Settings it throttles from boot, and under load several cores hit 100°C in an instant. The unit I received is even worse than the degraded one whose Vmin had risen.
It was only able to run without thermal throttling because I de-lidded it and water-cooled it direct-die.
To under-volt, you should use SVID adaptive mode rather than a VRM offset if possible. If the motherboard does not support SVID adaptive mode, then you need to disable CEP to be able to under-volt. If the chip is stable, under-volting should further reduce the probability of electromigration, so disabling CEP seems safe. If -100mV (-0.1V) isn't stable, try -75mV (-0.075V) or -50mV (-0.05V).
Thank you for your reply.
Before the replacement it ran at -120mV with a 360mm AIO, but I operated it at -100mV to leave some margin.
The room temperature was kept constant at all times for my pet's sake, but after about two months of use it became unstable, and I had to gradually raise the offset to keep it stable; in the end it would not run stably unless I raised it to -25mV, so I contacted Intel and it was handled as an RMA.
The replacement unit would not run at -100mV from the start. I am currently running it at -15mV, but it throttles as soon as any load is applied.
At the Default settings the Cinebench R23 score is only 37000, hardly worthy of the name "SPECIAL EDITION", so I am very disappointed. Before the exchange it scored around 40500 even at Default, with temperatures peaking at about 95°C.
> Before the replacement it ran at -120mV with a 360mm AIO, but I operated it at -100mV to leave some margin.
This may not have actually been stable. The most voltage is needed by the 'preferred' cores, the ones that can boost up to 6.2GHz. These cores can only boost above 5.9GHz when the other P-cores are idle.
To test the limits of these cores, you need to run both threads in the P-core at high load. For example, my 14900KS has preferred cores 8 & 9 and 10 & 11. I normally compile GCC to test the cores, because the compiler bootstrap compares the generated code from the stage 2 and stage 3 compilers, which means it catches errors that are too small to cause a crash but still result in incorrect output. On Gentoo Linux, I use this command to compile GCC using just "virtual CPUs" 8 and 9:
time MAKEOPTS="-j3" taskset -c 8-9 emerge -1 gcc
It turns out this can only run reliably with a 50mV undervolt, so the 100mV undervolt above is not stable for two hyper-threads on a single core. This was also the case that failed every time at stock voltage before microcode 0x125.
I also did some further full-load multi-core testing with all 8 P-cores (16 threads) using Prime95 (mprime), and found that with all P-cores loaded it was stable with an undervolt of 140mV.
So it needs -50mV for a single P-core and -140mV for 8 P-cores. That gave me the idea of using both an offset undervolt and an AC load-line undervolt at the same time. If we assume the 8 P-core load is 400A and the single P-core load is 50A (this assumption turns out to be pretty good), we can write the following two simultaneous equations using V = IR (R in mOhm, D the offset in mV):
140 = 400 * R + D
50 = 50 * R + D
Solving these gives R ≈ 0.26 mOhm and D ≈ 37 mV.
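A quick check of that algebra, e.g. with awk:
awk 'BEGIN { R = (140 - 50) / (400 - 50); D = 50 - 50 * R; printf "R = %.3f mOhm, D = %.1f mV\n", R, D }'
# prints: R = 0.257 mOhm, D = 37.1 mV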
However, the motherboard BIOS does not support a -0.037V offset; the closest are -0.035V and -0.040V, so we can choose between the following two solutions:
R = 0.26 mOhm, D = 35mV
R = 0.24 mOhm, D = 40mV
I decided to go with the larger voltage offset to help guard against voltage spikes, and so I could use a flatter load line. I chose ASUS LLC6 (on my W680 motherboard), which appears to be 0.49 mOhm. I confirmed this by setting the Extreme profile and running Prime95 to push the CPU to the power limit, with AC_LL and DC_LL both set to 0.49, and checking that the package power at the limit read 320W in the operating system.
Running with LLC6 = 0.49 mOhm, I set DC_LL = 0.49 mOhm (for correct power readings) and AC_LL to 0.49 - 0.24 = 0.25 mOhm, with a voltage offset of -40mV.
I ended up with the following:
LLC = 6
AC_LL = 0.25
DC_LL = 0.49
Offset = -0.04 V
This will result in the correct under-volt because the CPU will request the required core voltage plus expected current times the AC_LL (0.25), but the VRM will deliver the SVID request minus actual current times the LLC calibration (0.49), so we get:
Vcore = Vreq + 0.25 * Aexp - 0.49 * Aact
If the CPU current prediction is correct (Aexp = Aact = A), that simplifies to Vreq + (0.25 - 0.49) * A, which gives us R = 0.24 mOhm for the slope of the undervolt.
The worst case is that the CPU predicts 400A but is not really drawing any current, which would only result in an SVID request 100mV above the required core voltage. This is a lot better than the standard load lines, which could request 440mV more than required. (Thanks to Buildzoid's "Absolutely Everything About Load Lines on LGA1700" video for clearly explaining how this actually works.)
This seems fairly optimal: reducing the AC_LL any further results in threads not starting correctly in Prime95 (no crashes, but you can see some cores are not busy after starting the burn test), and increasing the offset to -45mV results in the GCC compile failing. Of course there could be some weird edge case at part load, with say 4 P-cores drawing 200A, but starting Prime95 with varying numbers of cores seems stable so far, as does running the GCC compile with varying numbers of cores.
After some manual experimentation, I have found that AC_LL = 0.24 (so R = 0.25 mOhm, D = 40mV) seems stable. So either the CPU can handle a slightly larger single P-core undervolt than measured (we can only test in 5mV steps, and R = 0.25 corresponds to a single P-core under-volt of 52.5mV, which couldn't be tested with a simple offset), or my estimate of 50A for the single P-core current is a bit high. In any case, this method gets you very close to the optimal under-volt, which would be tricky to find by hand with two variables to optimise.
The actual settings I am running with at the moment are:
LLC = 6 (0.49 mOhm)
AC_LL = 0.24 mOhm
DC_LL = 0.49 mOhm
Offset = -0.04 V
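Plugging the final values back in: R = 0.49 - 0.24 = 0.25 mOhm and D = 40mV give an effective undervolt of 50 * 0.25 + 40 = 52.5mV for a single P-core (~50A) and 400 * 0.25 + 40 = 140mV for all 8 P-cores (~400A), matching the targets found above. A quick check:
awk 'BEGIN { R = 0.49 - 0.24; D = 40; printf "1 P-core: %.1f mV, 8 P-cores: %.1f mV\n", 50*R + D, 400*R + D }'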
