I bought an i9-13900K and paired it with 128 GB of RAM. I am running Ubuntu with tens of VMs under KVM. I started experiencing random "general protection fault" kernel panics, all pointing to some kind of cross-cache permission violation, which I was able to stop by adding slub_debug=F to the kernel parameters, as suggested in the Red Hat Customer Portal article "Kernel panic due to "kmem_cache_alloc+117 from mempool_alloc_slab" on RHEL 7".
Booting any live USB also crashed, which was strange. The kernel was not tainted, but it crashed with the same type of permission violation in kmem_cache_alloc, and every live USB I booted, even with no hard disk attached, had the same issue. With luck, I was still able to bring the server up, and since I had slub_debug=F on the kernel command line, it didn't crash during operation and ran for weeks.
It worked for some time, until one day there was a power failure; when I restarted the server, it showed the error I attached here. Before slub_debug=F, the panics showed a different address each time, but they were all page faults, and I was convinced they were caused by freelist pointer corruption, as described by Red Hat support. I suspected faulty RAM, so I ran memtest, and it passed. This time, the error is the same across different kernels. I even tried to boot Windows from a new SSD; it couldn't boot, and I attached the BSOD here. Everything points to a "general protection fault" raised by the processor.
But now, it's panicking in the initrd phase, while the kernel is doing some udev work. I can't find what is causing this, because no logs are recorded: the panic happens in the initrd phase, so there is nowhere to write them. Interestingly, the same error at the same location now happens even when I boot different kernels from a live USB. I thought I had lost the server. I ran memtest again, and it passed again. I removed each connected peripheral and tested; nothing helped. Then I read somewhere to use maxcpus=1 to limit the number of CPUs, and it worked: boom, the computer booted and is running, but now with only one CPU. I didn't know what was wrong until I did the same in the BIOS: I limited the number of cores to 1 by enabling only one performance core and disabling all efficiency cores. I got 2 logical CPUs due to Hyper-Threading, and it is working.
I read in many places that CPU cores can go faulty, for example: https://access.redhat.com/solutions/3915511
Similar situation here: https://www.linuxquestions.org/questions/linux-desktop-74/not-present-page-kernel-panic-4175722803/
As described in the link above, after successfully booting with only one core, I also tried to enable the remaining cores. But the CPUs then go into hard lockups or soft lockups. I even added softlockup_panic=0 to the kernel parameters; then it doesn't panic, but it just hangs forever. It's a lockup: the CPU is not responding. In syslog and kern.log, I see something like the following permission violation:
[2.911260] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[2.911260] BUG: unable to handle page fault for address: fffffe00000453a8
[2.911261] #PF: supervisor instruction fetch in kernel mode
[2.911261] #PF: error_code(0x0011) - permissions violation
[2.911262] PGD 87efc6067 P4D 87efc6067 PUD 87efc4067 PMD 87efc3067 PTE 000000085fc4d163
[2.911264] Thread overran stack, or stack corrupted
[2.911264] Oops: 0011:0xfffffc000000453a8
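The error_code(0x0011) in that oops is the x86 page-fault error code that the CPU pushes when it raises #PF. Below is a minimal Python sketch of how its bits can be read. The bit meanings follow the Intel SDM, Vol. 3A (page-fault exception, #PF), and the wording is modeled on the messages Linux prints. decode_pf_error is a hypothetical helper, not a kernel function. For 0x0011, bits 0 and 4 are set: a supervisor-mode instruction fetch from a page that was present but not executable, which matches the "NX-protected page" line. The "in kernel mode" part of the log comes from the saved registers rather than from the error code, so the sketch does not produce it.

```python
# Bits of the x86 page-fault error code (Intel SDM, Vol. 3A, #PF).
PF_PROT = 1 << 0   # 0 = page not present, 1 = protection violation on a present page
PF_WRITE = 1 << 1  # 1 = write access, 0 = read access
PF_USER = 1 << 2   # 1 = access made in user mode (CPL 3)
PF_RSVD = 1 << 3   # 1 = reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4  # 1 = fault caused by an instruction fetch
PF_PK = 1 << 5     # 1 = protection-key violation


def decode_pf_error(code: int) -> tuple[str, str]:
    """Return (access description, fault cause) for a #PF error code."""
    who = "user" if code & PF_USER else "supervisor"
    if code & PF_INSTR:
        what = "instruction fetch"
    elif code & PF_WRITE:
        what = "write access"
    else:
        what = "read access"

    # Cause is checked in the same order Linux uses when phrasing the message.
    if not code & PF_PROT:
        cause = "not-present page"
    elif code & PF_RSVD:
        cause = "reserved bit violation"
    elif code & PF_PK:
        cause = "protection keys violation"
    else:
        cause = "permissions violation"
    return f"{who} {what}", cause


if __name__ == "__main__":
    access, cause = decode_pf_error(0x0011)
    print(access)  # supervisor instruction fetch
    print(cause)   # permissions violation
```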
How did I come to the conclusion that individual cores are faulty?
I rolled up my sleeves and went further: I enabled all efficiency cores and only one performance core, and boom, the computer works normally. The kernel panic happens only when I enable the remaining performance cores, and it's the same error. I am now running fine with 17 cores and 18 logical CPUs (1 P-core with 2 threads plus 16 E-cores). It's running amazingly well; I can boot a live USB and can even run Windows.
What is wrong here? Has an individual performance core gone faulty? I haven't experimented with the other performance cores yet, because my server is back up and I want to keep it running. I will do that experiment eventually.
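One way the per-core experiment could be run is sketched below. This is a minimal sketch that assumes Linux and Python 3.9+, and it is only usable in a configuration where the cores being tested are actually online (enabling all P-cores on this machine crashed at boot, so it would apply to a setup that does boot, or to cores enabled a few at a time). It pins a deterministic SHA-256 chain to each logical CPU with os.sched_setaffinity and flags CPUs whose result disagrees with the majority. workload and check_cpus are hypothetical names. A match proves nothing, because many hardware faults appear only under specific loads, voltages, or temperatures. Running code on a faulty core may itself trigger a crash or lockup.

```python
import hashlib
import os
from collections import Counter


def workload(rounds: int = 200_000) -> str:
    """Chain SHA-256 over its own output; the result is fully deterministic."""
    digest = b"seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def check_cpus(cpus: list[int], rounds: int = 200_000) -> dict[int, bool]:
    """Return {cpu: agrees_with_majority} after running workload on each CPU."""
    original = os.sched_getaffinity(0)
    values: dict[int, str] = {}
    try:
        for cpu in cpus:
            os.sched_setaffinity(0, {cpu})  # run this process only on `cpu`
            values[cpu] = workload(rounds)
    finally:
        os.sched_setaffinity(0, original)  # restore the original CPU mask
    if not values:
        return {}
    # Majority vote, so a faulty first CPU does not become the reference.
    reference, _ = Counter(values.values()).most_common(1)[0]
    return {cpu: value == reference for cpu, value in values.items()}


if __name__ == "__main__":
    online = sorted(os.sched_getaffinity(0))
    for cpu, ok in check_cpus(online).items():
        print(f"cpu{cpu}: {'ok' if ok else 'MISMATCH'}")
```

Using a majority vote instead of the first CPU's result as the baseline keeps the check meaningful even if the boot CPU is the bad one, as long as most tested CPUs are healthy.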
Hello, @sibidharan
Thank you for posting on the Intel® communities. I will do my best to help you.
Can you please clarify what you are trying to do? Are you using a normal system as a server? Is it a virtual machine?
I understand that you have run some software tests; however, have you tested your RAM?
After some research, that Windows BSOD seems to be related to damaged or corrupted RAM, a faulty driver, or Windows being unable to find files in the nonpaged area.
Is it possible for you to test another processor?
Best regards,
Jocelyn M.
Intel Customer Support Technician.
I am replying in this thread: "Re: i9-13900K : very frequent crashes (Windows 11) with apps, games and benches - Page 2 - Intel Community"
I am running an i9-13900K with 128 GB of RAM on an ASUS Z790-P WIFI motherboard. I am running Ubuntu 23.04 Server as the host OS, and I saw a lot of kernel panics due to general protection faults.
This seems to be a faulty core: disabling all P-cores and enabling only E-cores fixes the issue. If I turn on one P-core, there are no issues. If I turn on all P-cores, not even a live USB boots. I ran memtest many times; it's not the RAM. I even tried swapping the RAM, and the behaviour continues.
How else could the system work so well with 1 P-core and all E-cores? It looks like a faulty core.
In the thread above, one of the technicians suggested setting SVID behaviour to Intel Fail Safe. I did, and I was able to boot successfully with 24 cores and 32 threads. But soon after, the VMs running on this host started to crash with the same type of general protection faults.
I turned off all P-cores again, and the server and all VMs are working solidly.
I asked my vendor for a spare i7 processor; I will do the test and post an update here.
Hello, @sibidharan
Thank you for the information provided.
If you are replying to the Intel agent about the issue in that thread, please be aware that your issue may be specific to you, since your system environment is different. However, that issue is being investigated right now, and updates about it will be posted in the original thread, not this one.
Best regards,
Jocelyn M.
Intel Customer Support Technician.
Only enabling specific cores causes the issues, and I feel they are the same issue. I am an engineer too and I contribute to Linux, so I know about low-level protection faults and when and why they can happen. I am pretty sure the cores are faulty and that this is what corrupts the kernel stack; the Windows errors are of the same kind, such as a stack or buffer overrun.
I understand it can be due to faulty RAM, but that is not the case this time, for me or for the others posting about frequent crashes with the i9-13900K: it's a faulty processor. How else could my system run amazingly well with only E-cores, while turning on the P-cores crashes it? How can it be the RAM?
Hello, @sibidharan
Thank you for your reply.
We understand you have opened another thread with us, and we will continue to help you through that channel to avoid confusion and keep order in the community. We will therefore close this community case.
Thank you for your understanding.
Best regards,
Jocelyn M.
Intel Customer Support Technician.
I just switched to a 14th-gen i9-14900K, and all the issues are magically gone. The server boots up butter-smooth, with no panics and no lockups anywhere!!
It's the bloody i9-13900K; everyone (or a subset of people) who bought it is silently suffering.
Please change the CPU. That's the only solution.
