
i9-13900K : very frequent crashes (Windows 11) with apps, games and benches

LoloWiwi
New Contributor I
92,916 Views

Hi,

 

I built a setup in April 2023 with :

- Intel Core i9-13900K

- Asus ROG Maximus Z790 Hero

- 2x32 GB Corsair Vengeance 6600 MHz

- Asus ROG 4090

- Asus ROG Thor 850 Watts

 

From the very beginning, I have had occasional BSODs, and several apps/games crash very "reliably".

Even though the PC is:

- Not overclocked (no XMP, so RAM is running at 4800 MHz)

- No Tweaks of any sort in the Bios / default values from Asus Bios.

- Windows 11 Pro 10.0.22621

- Windows / Drivers / Bios are up to date with latest versions as of today.

 

The tests :

- Prime95 : with smallest and small FFT (to only test CPU and CPU cache) -> gives FATAL ERROR (prime numbers errors) on some CPU cores after a few minutes. 

- Cinebench R23 in single core : no problem, no crash during the 10min run

- Cinebench R23 in multi-core : crashes after 2 to 30 seconds systematically.

- GPU tests are fine, they complete with no crash (Furmark)

- Memtest86 : did several runs on the mem at 4800 and 6600 -> no errors, all tests PASS.

- a few games such as Cyberpunk 2077, Horizon Zero Dawn : almost systematically crash when launched.

 

A couple of days ago, I realized that it's probably the CPU

- I use a 3D printing app called a "slicer", which prepares the file for 3D printing. It would consistently crash on my Windows 11 setup during slicing (after 5 to 10 seconds max), but not in a virtual machine installed with VirtualBox (Windows 10)

- Somebody advised me to try to set the Affinity for the CPU Cores/Threads in Windows 11.

- Also, I found a lot of reports in forums/reddit about problems with i9-13900K...

 

Since then, when I set the Affinity of the apps for only a few cores:

- Bambu Studio slices fine, with no crash whatsoever, if affinity is set to the first 8 (CPU0 to CPU7)

- Cyberpunk 2077, Horizon Zero Dawn : they both run fine when affinity is set to the first 8 (CPU0 to CPU7), but crash as soon as I change affinity back to all cores/threads.

- Cinebench multi-core : systematically crashes no matter what subset of cores/threads I set.
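
For readers who want to script the same restriction instead of clicking through Task Manager: an affinity setting is just a bitmask with one bit per logical CPU. The sketch below is a minimal illustration, not something used in this thread. The function name `affinity_mask` and the executable name `app.exe` are hypothetical; the hexadecimal form is the one accepted by the Windows `start /affinity` command.

```python
def affinity_mask(cpus):
    """Return a bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


# CPU0 to CPU7, the subset reported as stable above.
first_eight = affinity_mask(range(8))
print(hex(first_eight))  # 0xff

# Hypothetical launch line; "app.exe" is a placeholder, not a real program.
print(f"start /affinity {first_eight:X} app.exe")
```

An affinity set this way applies only to the process it launches, so it has to be reapplied on every start.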

 

Weirdly enough, I tried the Intel Processor Diagnostic Tool: it always passes, but I don't trust its results, since I've so many other apps/games crashing, and reports by people on the web...

 

So, I need help please with that, I can't trust my CPU even though I need my PC for work every day...

 

Intel SSU report attached.

 

130 Replies
LoloWiwi
New Contributor I
11,708 Views

Here is the screen cap of Intel Processor Diagnostic Tool that shows that Prime Numbers test runs only for 45 seconds :

IPDT runs Prime Numbers Test for only 45 seconds.png

 

 

0 Kudos
LoloWiwi
New Contributor I
11,662 Views

Hi again Deivid,

 

As I mentioned in my previous reply, I found it peculiar that IPDT would run a test in only 2 minutes.
I found the documentation for the IPDT command line tool that runs the Prime Numbers calculation.

 

So :

- In the BIOS, I set "SVID Behavior = Auto" and "MultiCore Enhancement = Disabled - Enforce All Limits".

- I launched the IPDT command line tool Math_PrimeNum.exe and Prime95

"/c/Program Files/Intel Corporation/Intel Processor Diagnostic Tool 64bit/Math_PrimeNum.exe" -avx 2 -c -errstop -s 36000 -resultName /c/Users/laure/OneDrive/Bureau/PrimeNum_test_10min.txt

 

It seems that both crashed after almost 2 hours, see screen cap below.

What does it mean about the CPU if there is an ERROR/FATAL ERROR with those stress tests after almost 2 hours? 

 

IPDT PrimeNum + Prime95 crash after 2h.png

0 Kudos
Alberto_R_Intel
Employee
11,860 Views

Hello LoloWiwi, Thank you very much for sharing those details and the pictures.


We will continue with our research on this subject in order to provide the most accurate response to your inquiries about this scenario. As soon as I get any updates, I will post all the information on this thread.


Regards,

Albert R.


Intel Customer Support Technician


0 Kudos
LoloWiwi
New Contributor I
13,037 Views

Hi Albert,

 

Thank you.

Also, when you get a chance, could you answer a couple of questions I asked in a previous message in this thread, please?

 

Here they are again:

- What does SVID Behavior really do? It's hard to find in-depth info about the BIOS parameters (just very shallow explanations here and there)...

- Is it a problem with my i9-13900K if it has to be slightly over-volted to be stable? I've seen many influencers (YouTube) undervolting it for much better heat dissipation and lower power consumption, but none saying that they had to over-volt it just to make it stable...

 

Thank you for your help,

Laurent

0 Kudos
pronasit
Beginner
13,613 Views

The i9-13900K processor experiences very frequent crashes on Windows 11, affecting various applications, games, and benchmarking tools.

0 Kudos
LoloWiwi
New Contributor I
13,731 Views

Hi Pronasit,

 

Interesting. Do you have any extra info about that? Are there any root causes identified for why there would be such problems on Win 11 and not Win 10?

0 Kudos
sibidharan
New Contributor II
13,777 Views

I bought a 13900K and gave it 128 GB of RAM. I am running Ubuntu Server 23.04 with tens of VMs under KVM. I started experiencing random "general protection fault" kernel panics, all referring to some type of cross-cache permission violation, which I was able to fix by adding slub_debug=F to the kernel parameters as suggested in https://access.redhat.com/solutions/2149041

I tried to boot into any live USB, and it just crashed. It was weird. The kernel is not tainted, but it crashes with the same type of permission violation in kmem_cache_alloc, and any live USB I booted, even without a hard disk, had the same issue. With luck I was able to turn on my server, and since I had slub_debug=F added to the kernel parameters, it didn't crash during operation and ran for weeks on end.

It worked for some time, until one day a power failure happened. When I restarted the server, it showed the error I attached here. The errors before slub_debug=F showed different addresses in the panic, but they were all page fault errors, and I was convinced it was due to freelist pointer corruption, as described by Red Hat support. I suspected faulty RAM, so I ran memtest and it passed. This time, the error is the same across different kernels. I even tried to boot Windows from a new SSD; it couldn't boot, and I attached the BSOD here, which also points to a "general protection fault" raised by the processor.

But now it panics in the initrd phase, while the kernel is doing some udev work. I was never able to find what is causing this, because the panic happens in the initrd phase and there is nowhere to write the logs. Interestingly, the same error in the same location now happens even if I boot different kernels via live USB. I thought I had lost the server. I ran memtest, and it passed again. I removed and tested each connected peripheral; nothing helped. Then I read somewhere to use maxcpus=1 to limit the number of CPUs, and it worked: boom, my computer is working. It booted up and ran, but with only one CPU. I didn't know what was wrong until I did the same in the BIOS: I limited the number of cores to 1 by enabling only one performance core and disabling all efficiency cores. I got 2 logical CPUs due to Hyper-Threading, and it works.

I read in a lot of places that the CPU cores are going faulty, https://access.redhat.com/solutions/3915511

Similar situation here: https://www.linuxquestions.org/questions/linux-desktop-74/not-present-page-kernel-panic-4175722803/

As described in the link above, I also tried to enable the remaining cores after successfully booting with only one core. But then the CPUs get into hard lockups or soft lockups. I even tried adding softlockup_panic=0 to the kernel parameters; it no longer panics, but just hangs forever. It's a lockup: the CPU is not responding. In syslog and kern.log, I see something like this, a permission violation.

<code>
[2.911260] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[2.911260] BUG: unable to handle page fault for address: fffffe00000453a8
[2.911261] #PF: supervisor instruction fetch in kernel mode
[2.911261] #PF: error_code(0x0011) - permissions violation
[2.911262] PGD 87efc6067 P4D 87efc6067 PUD 87efc4067 PMD 87efc3067 PTE 000000085fc4d163
[2.911264] Thread overran stack, or stack corrupted
[2.911264] Oops: 0011:0xfffffc000000453a8
</code>
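
As a reading aid for the log above: the `error_code(0x0011)` value is the x86 page-fault error code, and its low bits have a documented meaning. The sketch below decodes those five low bits only. It is an illustration of the bit layout, not a diagnosis of this particular crash, and the names `PF_BITS` and `decode_pf_error_code` are made up for this example.

```python
# Low bits of the x86 page-fault error code: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation (page was present)", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor (kernel) mode"),
    (0x08, "reserved bit set in a page-table entry", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return a list of human-readable descriptions for the low five bits."""
    descriptions = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            descriptions.append(if_set)
        elif if_clear is not None:
            descriptions.append(if_clear)
    return descriptions


for line in decode_pf_error_code(0x0011):
    print(line)
```

For 0x0011 this yields a protection violation on an instruction fetch in supervisor mode, which agrees with the kernel's own wording in the log ("supervisor instruction fetch in kernel mode", "permissions violation").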

How did I come to the conclusion that individual cores are faulty?

I rolled up my sleeves, went further and enabled all efficiency cores and only one performance core, and boom, the computer works normally. The kernel panic only happens if I enable the remaining performance cores, and it's the same error. I am now running fine with 17 cores and 18 logical CPUs. It's running amazingly well; I am able to boot a live USB and even run Windows.

What is wrong here? Has an individual performance core gone faulty? I haven't tried experimenting with the other performance cores yet, since my server is back on and I want it running. I will do that experiment eventually.
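
The "17 cores and 18 logical CPUs" figure follows from the i9-13900K layout: performance cores expose two threads each with Hyper-Threading, efficiency cores expose one. A small sketch of that arithmetic (the function name `logical_cpus` is only for illustration):

```python
def logical_cpus(p_cores, e_cores, hyperthreading=True):
    """Count logical CPUs: P-cores run 2 threads with Hyper-Threading, E-cores run 1."""
    threads_per_p_core = 2 if hyperthreading else 1
    return p_cores * threads_per_p_core + e_cores


# Configuration described above: 1 P-core + 16 E-cores enabled.
print(1 + 16, logical_cpus(1, 16))   # 17 18

# Full i9-13900K: 8 P-cores + 16 E-cores.
print(8 + 16, logical_cpus(8, 16))   # 24 32
```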

Windows BSOD:
https://ibb.co/VMqTPLp
https://ibb.co/d5yKv4M

Panic in my server:
https://ibb.co/Y3Dnb84

Panics from different kernels via LiveUSB

https://ibb.co/BwxW3bw
https://ibb.co/y8mPVLB
https://ibb.co/HVwfBBP
https://ibb.co/4NJTbzS
https://ibb.co/KzFfHQh
https://ibb.co/x8WdzhJ
https://ibb.co/370Rfqb
https://ibb.co/svFbPF2

LoloWiwi
New Contributor I
13,883 Views

Can you try with another CPU ?

I read messages from people on other forums saying they switched from i9-13900K to i7-13700K and all of a sudden, problems gone.

KCLam
Novice
14,003 Views

Hi, I have been facing very similar issues over the past months; my computer was set up in late Feb 2023. To name some of the most frequent errors:

1. status access violation/status breakpoint/crash without notice in Chrome, most often when watching videos on various platforms. (It crashed once while I was typing this reply; thank god Intel auto-saves replies.)

2. status access violation in other software (seen when it crashes, but the error site is mostly not 0x000005 as I have seen in other cases).

3. Software crashes without notice. It may not happen at all, but when it does, it happens in bursts (if it crashes and I reopen it quickly, it crashes again within seconds), e.g. League of Legends (both in the client and in-match) and Cyberpunk 2077.

 

The methods I tried:

1. MemTest (https://hcidesign.com/memtest/download.html): all passed. (Not sure if it is the best way, but I don't have a spare USB drive to install Memtest86.)

2. The Processor Diagnostic Tool mentioned in this thread: all passed.

3. Some other common suggestions found when searching for "status access violation" on Google. I can't recall them all; this has been going on for a long time.

 

My specs are as follows:

OS: Windows 10 Pro

CPU: i9-13900k

GPU: MSI 4090 Suprim X

Motherboard: MSI MPG Z790 Edge Wifi

RAM: 32GBx2 G.Skill Trident Z5 RGB Black

Storage: 2TBx2 Samsung 980 Pro

Power: MSI MPG A1000G 

0 Kudos
sibidharan
New Contributor II
13,960 Views

So it is not the motherboard. I suspected the motherboard, because I am using an ASUS Prime Z790-P WIFI. Thanks for clarifying; we both encounter the same issues. It may be due to a faulty core. Try going into your BIOS and disabling all performance cores except one. With only one performance core and all efficiency cores enabled, you won't experience any crashes. If that's the case, replace your CPU.

0 Kudos
KCLam
Novice
13,960 Views

To those who are facing the same issue as this thread,

I have found a simple test to find the malfunctioning core(s), if there really are any.

The idea is to run the frequently failing application/scenario on every single core, one by one.

0. By any means (most commonly Task Manager), find the name of the process you want to test (right-click the process -> Properties; the name is shown there). For example, I will test Chrome, so it is "chrome.exe" (msedge.exe for Edge, etc.).

1. Download and open Process Lasso (https://bitsum.com/)

2. Open Options->CPU->CPU Affinities

2.1 tick "More strictly enforce default affinities" (not sure if that has an effect but I did)

KCLam_2-1697566633071.png

3. Limit the process to run on one core at a time by entering: 1. the process name, 2. the CPU affinity (the index of the core you wish to use, one by one from 0-31; testing 0-15 would probably be enough to find the bad core), 3. Add rule, 4. click OK to apply the settings.

KCLam_1-1697566404081.png

4. Open the process you entered after the settings are applied. To check that the process is really running on one specific core, look at the top right corner of Process Lasso: a nearly full green bar indicates that it is indeed using one core, like core 8 in my case. (There is some more green here because I am running other applications too.)

KCLam_3-1697566786544.png

5. Do whatever is most likely to trigger the errors you usually get, for example open YouTube and watch a video. In my case, playing a video for about ten seconds is enough to tell whether the core is fine.

6. Close the application after testing. Go back to CPU Affinities (step 2) and test the next core (double-click the existing rule to modify it, and remember to save the rule after modifying it).

7. Repeat steps 2-6 with every single core. Eventually you may see the error occur. In my case it is core #4 (the fifth core, counting from #0): simply opening YouTube in Chrome on it crashes with status_access_violation/status_breakpoint. At this point I can finally confirm that this is a problem related to, if not directly caused by, the CPU (i9-13900K).

The example screencap I am posting repeats the test on core #4 with Edge, since I am using Chrome to type this reply. Same as with Chrome, a status_access_violation error was raised within seconds of opening YouTube; after a refresh, another status_access_violation error was raised instantly, and so on.

KCLam_4-1697567240370.png
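
The one-core-at-a-time idea above can also be scripted. The following is only a rough sketch under stated assumptions, not the procedure used in this thread: it relies on `os.sched_setaffinity`, which exists on Linux but not on Windows (there the loop falls back to a single unpinned run), and the function names `workload` and `check_each_cpu` are invented for this example. It pins the current process to each allowed logical CPU in turn and checks that a deterministic computation always gives the same result. A really faulty core may crash the script rather than produce a mismatch, and the reference value itself is computed on whichever core the scheduler picked.

```python
import hashlib
import os


def workload(rounds=200):
    """Deterministic CPU-bound work: chained SHA-256 starting from a fixed seed."""
    digest = b"fixed seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest * 64).digest()
    return digest.hex()


def check_each_cpu(repeats=3, rounds=200):
    """Run the workload pinned to each allowed logical CPU and report consistency."""
    can_pin = hasattr(os, "sched_setaffinity")  # Linux only
    original = os.sched_getaffinity(0) if can_pin else None
    cpus = sorted(original) if can_pin else [None]
    reference = workload(rounds)  # computed before any pinning
    results = {}
    try:
        for cpu in cpus:
            if can_pin:
                os.sched_setaffinity(0, {cpu})
            results[cpu] = all(workload(rounds) == reference for _ in range(repeats))
    finally:
        if can_pin:
            os.sched_setaffinity(0, original)  # restore the original affinity
    return results


if __name__ == "__main__":
    for cpu, ok in check_each_cpu().items():
        print(f"CPU {cpu}: {'consistent' if ok else 'MISMATCH'}")
```

A short hash loop like this is a far lighter load than a browser playing video or a slicer, so a clean run here says little; the real applications that crash remain the better test.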

 

Hope you find this useful, and that it reassures you that we have been the victims of our CPUs all along.

 

For reference, my CPU batch number is X252M104T; the most frequent errors are triggered when running applications on core #4.

 

Citation: the idea comes from https://community.intel.com/t5/Processors/browser-occasionally-displays-a-status-access-violation-error/td-p/1457802 where the author tested the core performances by other means and used Process Lasso to limit the bad core.

LoloWiwi
New Contributor I
13,856 Views

Hi KCLam,

 

Thanks for the walk through, very useful indeed.


On my side, I used a simpler approach:

- Open Task Manager (Win 11)

- Go to Details

- Find the process you want to test cores on

- Right-click the process and choose "Set affinity"

- Then, tick the checkboxes for the threads to use (on an i9-13900K, from UC0 to UC31) --> this lets you narrow down which core is faulty.

 

The Intel admins in this thread have still been investigating since my last message. In it, I mentioned that IPDT (Intel Processor Diagnostic Tool) cannot be a serious test of CPU stability, since it runs many tests in only 2 minutes before finishing with "PASS". So I found the command line tool used to test the CPU cores on prime numbers (PrimeNum.exe), ran it from the command line for much longer, and it crashed.

Still waiting for an answer...

 

My i9-13900K batch num is X307K561

0 Kudos
KCLam
Novice
13,820 Views

Dear LoloWiwi,

 

Thanks for providing an even more convenient way of testing.

Have you determined which of your CPU's cores is faulty? You mentioned that your apps run perfectly on cores #0-#7 but become problematic when using all cores. Perhaps there's just one core that isn't functioning well, causing all the issues in core range #8-#31. Identifying and disabling that core might solve the problem.

In my case, when I tested using only CPU#4, both Chrome and Edge consistently malfunctioned. Videos wouldn't play for more than a second, and most of the time, they couldn't even open YouTube. I've been working with Jupyter Notebook on Chrome since my test 5 hours ago, and no errors have occurred. My hypothesis is that the faulty core is causing all the problems, so disabling that core might be a solution.

I'll keep this in mind in the coming days and test whether disabling the faulty core can resolve my issue. It might be worth trying for you too.

P.S. Love your choice of 3D printer! (since I see your slicing app)

0 Kudos
LoloWiwi
New Contributor I
13,772 Views

Hey KCLam,

I haven't determined precisely which core it is (I am not sure how affinity entries UC0-UC31 in the Windows 11 Task Manager map to real CPU threads/cores). I found that crashes occur when UC8-UC12 are used, and that's good enough for me.

I've reproducibly confirmed that if all threads except UC8-UC12 are used, there are no crashes whatsoever (Cinebench, apps, games).
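
On the mapping question: the sketch below shows one plausible mapping, under an assumption that is NOT confirmed anywhere in this thread, namely that the OS numbers the 16 P-core threads first (two consecutive logical CPUs per P-core) and the 16 E-cores after them. The real enumeration depends on the OS and firmware and should be checked with a topology tool (for example `lscpu` on Linux) before drawing conclusions. The name `describe_logical_cpu` is made up for this example.

```python
P_CORES = 8   # i9-13900K performance cores (2 threads each)
E_CORES = 16  # i9-13900K efficiency cores (1 thread each)


def describe_logical_cpu(index):
    """Map a logical CPU index to (core type, core number, thread number).

    ASSUMPTION: P-core threads are enumerated first, in sibling pairs,
    followed by the E-cores. Verify this on the actual system.
    """
    if not 0 <= index < P_CORES * 2 + E_CORES:
        raise ValueError("logical CPU index out of range for this layout")
    if index < P_CORES * 2:
        return ("P", index // 2, index % 2)
    return ("E", index - P_CORES * 2, 0)


# Logical CPUs reported above as triggering crashes.
for cpu in range(8, 13):
    print(cpu, describe_logical_cpu(cpu))
```

Under that assumption, UC8-UC12 would cover both threads of P-cores 4 and 5 plus the first thread of P-core 6, which would be consistent with the suspicion of a faulty P-core but does not single one out.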

 

PS: right, I'm a 3D printing addict. You too, I guess

I noticed that slicing software (PrusaSlicer, Cura, Bambu Studio) is amazing at testing the CPU (heavily multi-threaded) -> I could consistently make it crash in less than 10 secs when slicing some models -> just perfect for narrowing down the issue in a few seconds instead of running CPU stress tests for a long time.

 

 

 

0 Kudos
sibidharan
New Contributor II
13,802 Views

  .

0 Kudos
sibidharan
New Contributor II
13,706 Views

I just changed to 14th gen i9-14900K and all issues are magically gone. The server is booting up butter smooth and no panics anywhere, no lockups anywhere!! 

 

It's the bloody i9-13900K; everyone (or a subset) who bought it is silently suffering.

 

Please change the CPU. That's the only solution.

0 Kudos
LoloWiwi
New Contributor I
13,694 Views

Thanks for the feedback Sibidharan, good to know!

 

@Alberto_R_Intel : what else is needed to be offered an RMA?

I've been suffering for 7 months now...

 

The Intel CPU warranty is 3 years

LoloWiwi_0-1697751193624.png

 

 

 

 

LoloWiwi
New Contributor I
13,326 Views

Hi, thanks for your advice. All of that has already been checked and documented in this thread.

The issue has been narrowed down to one of the CPU cores (P-cores) being faulty.

 

Just waiting for feedback from Intel's experts, who have been investigating for almost 2 weeks now, with no RMA proposal in sight...

 

 

0 Kudos
LoloWiwi
New Contributor I
13,204 Views

Hi @Alberto_R_Intel , @DeividA_Intel ,

 

Any news, please ?

 

It's now driving me nuts...

 

In BIOS, I've set : 

- SVID Behavior = Intel Fail Safe

- MultiCore Enhancement = disabled

- No XMP (RAM DDR5 is now running at 4800 MHz)

 

And now, for the last 4 days, my Chrome is crashing randomly, I've had a few Windows BSODs.

Again : if I set affinity to UC0 to UC7 for Chrome processes, it is stable --> I want to kill my i9-13900K.

 

What else do I need to provide to be granted the privilege of an RMA?

Intel CPUs have a 3-year warranty and I bought it in April 2023 (7 months ago)...

 

0 Kudos
KCLam
Novice
13,198 Views

I've been constantly checking posts here for updates. In a recent post regarding a 13900K faulty core, Intel staff told the OP to get in contact for an RMA after a few exchanges, and in another post told another OP to contact them by email. I'd say they should be aware of the issue by now. However, this post may have been marked as closed by them, so it gets no more attention from Intel staff. Maybe you should try a direct RMA request to get their attention.

For now, I recommend using the free version of Process Lasso and setting affinity rules to find the specific faulty core and ban it for good. (The process match in the affinity settings uses regular expressions, so a process match of * or *.* may apply the affinity to every single process and prevent any use of that core.)

KCLam_0-1698332128565.png

Edit: I previously only set affinities for heavily used applications rather than for every task. I tried the latter after posting this, and it keeps reporting errors when setting the affinity of system tasks like svchost.exe (the error log keeps growing). You may wish to apply it only to the applications you commonly use; in that case Task Manager could be enough (I don't know whether Task Manager sets it permanently, but Process Lasso keeps it until the rule is removed). That has made my computer run without crashes ever since.

 

0 Kudos
sibidharan
New Contributor II
13,187 Views
Is Intel keeping its mouth shut because its share price will go down if this comes out? They are well aware of the issue, but I believe they are just putting you and me off without any real updates to frustrate us! They have no intention of helping; they are just trying to keep us quiet or keep us waiting indefinitely.

A simple Google search shows that reports of such issues started right after the processor was launched. Intel never said or accepted that some of the processors are faulty. They just bought time until the new 14900K launched, and yes, they were successful in that.

How did you purchase your processor, by the way? Online or through a vendor?
0 Kudos