14900KS unstable

Keean
Novice
27,981 Views
I have a new 14900KS installed on an ASUS Pro WS W680-ACE motherboard with 64 GB of DDR5-5600 ECC memory (Kingston), and I am testing on Gentoo Linux using:

taskset -c 0-15 emerge -e @world

This recompiles the whole system using just the P-cores. It takes half a day to a day to complete the recompilation of ~1400 packages.
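
For scripted test runs, the same pinning can be done from Python with the standard `os.sched_setaffinity` call (Linux only). This is a minimal sketch, assuming logical CPUs 0-15 are the P-core threads, as the `taskset -c 0-15` mask implies; the helper names `p_core_cpus` and `run_pinned` are hypothetical.

```python
import os
import subprocess

def p_core_cpus(first=0, last=15):
    """Logical CPU IDs of the P-core threads (ASSUMED to be 0-15, as in the taskset mask)."""
    return set(range(first, last + 1))

def run_pinned(cmd, cpus):
    """Run cmd restricted to the given logical CPUs, similar to `taskset -c`."""
    def pin():
        # Executed in the child process before exec; pid 0 means "this process".
        os.sched_setaffinity(0, cpus)
    return subprocess.run(cmd, preexec_fn=pin, check=False).returncode

# Example (not executed here): run_pinned(["emerge", "-e", "@world"], p_core_cpus())
```

The return code of the pinned command is passed back, so a wrapper script can stop at the first failed run.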

I have rasdaemon running to log hardware errors.

With the performance profile (IccMax = 307 A, PL1 = 253 W, PL2 = 253 W), the CPU is unstable with anything less than VRM load-line level 6 (ASUS BIOS).

Interestingly, it is also stable at LL6 in the extreme profile (IccMax = 400 A, PL1 = 320 W, PL2 = 320 W).

When using a lower load-line level (tested from the motherboard default of 3 up to 5), RAS shows that the errors are consistently on CPU 0x8 and are either instruction fetch failures from the level 0 instruction cache or TLB errors.

I previously had a 13900KS, which ran fine with unlocked power limits (IccMax = 511.75 A, PL1 = 4095 W, PL2 = 4095 W).

I have a pretty good water cooling setup (6x120 shared between CPU and GPU, but GPU is idle in all these tests). Water temperature is 31-32°C once warmed up for the duration of the test, room temp about 25°C.

- Am I right in assuming that CPU=0x8 on all these errors means that P-core 8 might be "bad"?

- Is needing load-line level 6 to get the CPU stable usual, and/or something to worry about?
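
One way to check which physical core a logical CPU number refers to is to read the Linux sysfs topology files. The sketch below assumes that the CPU value in the error log is a logical CPU number (0x8 = 8), which is worth confirming against the rasdaemon output; the helper names are hypothetical.

```python
from pathlib import Path

def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '8-9' or '0,2,4-5' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def core_of(cpu, sysfs=Path("/sys/devices/system/cpu")):
    """Return (core_id, sibling logical CPUs) for one logical CPU."""
    topo = sysfs / f"cpu{cpu}" / "topology"
    core_id = int((topo / "core_id").read_text())
    siblings = parse_cpu_list((topo / "thread_siblings_list").read_text())
    return core_id, siblings

# Example (Linux only): core_of(0x8) reports the core ID and hyperthread siblings of logical CPU 8.
```

If the logical CPUs turn out to be numbered in sibling pairs (0-1, 2-3, and so on), which is an assumption to verify with the output, then logical CPU 8 would be the first thread of the fifth P-core rather than an eighth core.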

Thanks for any help you can offer.
45 Replies
YenForYang
Beginner
1,150 Views

You mentioned that your board is an Asus W680 (and I have the same board). As far as I can tell, there is no way to undervolt using an adaptive offset voltage (missing Adaptive option under Global Core SVID Voltage). What settings are you using to achieve an offset voltage?

Keean
Novice
1,122 Views

Yes, I found this as well. The only way to get a decent undervolt is to disable IA CEP. Obviously this carries some risk, but I think it's probably okay because I am using it to lower the voltage, and the degradation was caused by voltages that were too high. So it may even be safer when the undervolt is combined with a lower VR Voltage Limit (I am using 1500 mV, which is 50 mV less than the new limit Intel set in the microcode).

I think you also want to use a flatter load-line calibration (I am using ASUS level 6), because of the way the voltage is calculated for the SVID. The lower the load-line setting, the higher the requested SVID voltage when the processor is lightly loaded (it should make little or no difference at high load). So if you lower the VR Voltage Limit, you will lose performance at light load (that is, single-core performance) unless you also increase the LLC.

As we don't actually want to change the voltage received by the CPU, it's important to set DC_LL and AC_LL correctly for the selected LLC. The BIOS does not do this automatically; it leaves them at 1.1/1.1, so just selecting an LLC level will over-volt the processor at low load, because the CPU will add voltage as if LLC3 were selected (1.1 milliohms * max_current), but the VRM will only lower it by (0.49 milliohms * max_current).
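
The size of that mismatch can be estimated with a few lines of Python. This is only a sketch of the arithmetic described here, not a model of the real VRM: it assumes LLC6 really is 0.49 milliohms (a guess in this thread), uses the 400 A current limit mentioned elsewhere in the thread as max_current, and the function name is hypothetical.

```python
def overvolt_mv(current_a, ac_ll_mohm=1.1, vrm_ll_mohm=0.49):
    """Excess voltage in mV: what the CPU adds (AC_LL * I) minus what the VRM drops (LLC * I)."""
    # milliohms * amps = millivolts
    return (ac_ll_mohm - vrm_ll_mohm) * current_a

# With AC_LL left at 1.1 and an assumed 0.49 milliohm LLC6, at 400 A:
print(round(overvolt_mv(400), 1))  # 244.0
```

Setting AC_LL equal to the VRM load line makes the result zero, which is the "voltage received by the CPU is unchanged" case.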

 

I think LLC6 is 0.49 milliohms, but unlike for the ROG boards, ASUS wouldn't tell me the impedances, as they considered them some kind of trade secret that gave them a competitive advantage, even though they publish them in the BIOS itself for the ROG boards. That's just weird.

You can't actually have that big a voltage offset at low load, because the CPU needs a lot of voltage to boost a single core to 6.2 GHz with both hyperthreads running. I think a lot of people use a larger offset and think their CPU is stable, but they only test with a single-threaded load, when each P-core has two hyperthreads. To properly test for low-load stability with an undervolt, you need to load both hyperthreads of a single 6.2 GHz P-core at the same time (it will draw about 60-70 W).
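
A rough way to load both hyperthreads of one P-core on Linux is to start two processes and pin each to one sibling logical CPU. The sketch below is an illustration only: it assumes siblings are numbered in adjacent pairs (0-1, 2-3, ...), which should be checked on the actual system, the helper names are hypothetical, and the busy loop is a placeholder for a real workload such as a compile job.

```python
import os
import subprocess
import sys

def sibling_pair(p_core_index):
    """Logical CPUs of one P-core, ASSUMING siblings are numbered in adjacent pairs."""
    first = 2 * p_core_index
    return (first, first + 1)

def load_both_threads(cpus, seconds):
    """Start one busy-loop process per logical CPU and wait for all of them to finish."""
    code = (
        "import time\n"
        f"end = time.monotonic() + {float(seconds)}\n"
        "while time.monotonic() < end:\n"
        "    pass\n"
    )
    procs = []
    for cpu in cpus:
        proc = subprocess.Popen([sys.executable, "-c", code])
        # Linux only: pin the child to a single logical CPU right after it starts.
        os.sched_setaffinity(proc.pid, {cpu})
        procs.append(proc)
    return [proc.wait() for proc in procs]

# Example (not executed here): load_both_threads(sibling_pair(0), 600)
```

A pure-Python busy loop is a much lighter load than a compiler, so this mainly shows the pinning; the command passed to `Popen` can be swapped for the real test.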

 

I also used AC_LL to undervolt at high load: with AC_LL set below 0.49, the CPU adds less voltage to its request, whilst the VRM still reduces the voltage by the same amount. I think I ended up with an AC_LL of around 0.23 and a static offset of around -35 mV.

So the result is that the CPU is undervolted by 35 mV at zero load (CPU current draw is zero), and the undervolt increases up to max load (400 A), where it is (0.49 - 0.24) * 400 + 35 = 135 mV.
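
The same arithmetic can be tabulated across the load range. This sketch takes the figures in this post at face value (LLC6 assumed to be 0.49 milliohms, the AC_LL of 0.24 used in the calculation, a 35 mV static offset) and assumes the undervolt scales linearly with current; the function name is hypothetical.

```python
def undervolt_mv(current_a, vrm_ll_mohm=0.49, ac_ll_mohm=0.24, static_offset_mv=35.0):
    """Undervolt in mV at a given current: load-line gap times current, plus the static offset."""
    # milliohms * amps = millivolts
    return (vrm_ll_mohm - ac_ll_mohm) * current_a + static_offset_mv

for amps in (0, 100, 200, 300, 400):
    print(f"{amps:>3} A -> {undervolt_mv(amps):.0f} mV undervolt")
```

This gives 35 mV at 0 A and 135 mV at 400 A, matching the two end points quoted in the post.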

 

So we have an undervolt that varies between 35 mV and 135 mV depending on CPU load. It seems you can undervolt more at higher load because the cores reduce their max clock speed as the heat of the CPU increases. I think this is a case where the better your cooling, the smaller the undervolt you can achieve.

Another tip: this motherboard appears to set a "hidden" limit in the CPU. It sets PL4 = 380 W, which limits the max clock speed under high load. As I am direct-die water cooling and can dissipate a lot of heat (idles at 31°C, max around 95°C with all P-cores at 5.9 GHz and all E-cores at 4.5 GHz), I set PL1 = PL2 = 4095 (unlimited) and disabled PL4 in the operating system (by setting it to zero). You can't set PL4 in the BIOS, so this has to be done in the OS.

This setup actually resulted in the VRMs on the ASUS Pro WS W680-ACE motherboard being the performance limit, which I think was partly due to the direct-die cooling, so I had to add a 120 mm fan pointing down at the VRM heat sinks (speed controlled by VRM temperature) to fix that.

This setup can run Prime95 max power / torture mode on all cores without thermal throttling, and on all P-cores with E-cores enabled but idle without thermal throttling.

YenForYang
Beginner
1,101 Views

Is it actually necessary to disable CEP if you tweak AC LL and DC LL (to get VID and Vcore to match)? I thought that if you keep VID and Vcore close enough CEP wouldn't get triggered.

I’ve been using LLC6 as well (I mainly settled on this after watching a Buildzoid video on the topic), but based on the DC LL adjustment I’ve made, I don’t think the impedance is 0.49 like on the other boards. My observation is that it’s higher, but I’m not confident in the slightest. I was going to contact ASUS and ask them directly, but it looks like you already tried that, so I won’t bother. I wonder if it’s worth trying LLC7, actually…

And yeah, I’ve noticed that the load lines aren’t being calibrated automatically, even after enabling "Sync ACDC Loadline with VRM Loadline", which didn’t seem to have any effect. I wonder if it’s just another issue specific to W680 (and not Z690/Z790).

Do you know the name of the setting you used to set the static voltage offset? I wonder if it’s actually worth setting such an offset. What sort of scores do you get in Cinebench R23? I'm not water cooling, but I've got the 14900KS and DDR5-5600 ECC, with a contact frame and a D15 G2 setup.

Keean
Novice
942 Views

I am using Kingston 5600 ECC DDR5, which probably limits benchmark results to some degree.

I used ThrottleStop to disable PL4 under Windows (set to 0).

I get the following Cinebench R23 scores:

single: 2329 avg (max 2331) / max core temp 65°C
multi: 40960 avg (max 41018) / max core temp 85°C / max power 354 W


Watching the limits using ThrottleStop I only see Core/TVB and Ring/EDP for single core, and Core/EDP and Ring/EDP for multi core. I still have the Intel recommended 400A current limit - which I think is the important one.

I don't think there is much that can be improved from here. Single core is limited by TVB, so better cooling would help, but it's already direct-die water cooled, and the undervolt offset is as large as I could make it and still pass the compile test with two hyperthreads on one P-core. Multi-core is EDP (current) limited, and the AC load-line undervolt is as large as it can be without Prime95 having threads fail to launch.

The only possibility I haven't really explored is undervolting the Ring to prevent it from hitting the EDP limit.


I think LLC6 is 0.49 mOhm based on power readings, but again I am not confident. However, as long as the LLC is more than the AC_LL value of 0.23 mOhm, it will be an undervolt and not an over-volt. So if LLC6 is greater than 0.49, it just means the undervolt is larger than I think it is.


Buildzoid did not seem to recommend LLC levels higher than 6 because of instability in the regulation.
