Whea uncorrectable error and pc now boots and fails to repair itself

skilzababy2
Beginner
8,906 Views

Randomly, my PC initiated abrupt shutdowns exclusively during gaming sessions, accompanied by a bluescreen displaying the WHEA_UNCORRECTABLE_ERROR, followed by an automatic restart. Despite initially dismissing it as an isolated incident, the issue persisted consistently upon every subsequent gaming endeavor. I undertook the measure of reinstalling Windows, yet this failed to resolve the problem. The occurrence manifested unpredictably, whether during the loading screen, gameplay, or even Fortnite landings.

Subsequent diagnostics, including a run of MemTest86, ruled out memory concerns, confirming the integrity of both DDR4 RAM modules. Evaluations of the CPU through Intel's diagnostic software and Cinebench yielded no crashes. Similarly, FurMark assessments demonstrated the GPU's stability with no discernible issues. Intriguingly, the system stayed stable when Cinebench and FurMark ran simultaneously, even though games typically put less stress on these components.

The focus has shifted towards a potential power-related anomaly, particularly with the GPU drawing power within normal parameters. Concerns arose about the 12V rail, although monitoring software compatibility issues hindered direct assessment. However, the BIOS readings indicated nominal 12V and 5V rail values, consistent with specifications. Despite meticulous cable reinstallation, the issue persisted. While inclined to suspect the power supply unit (PSU), the discrepancy arises in its compatibility with stress-inducing tasks like Cinebench and FurMark.

Recently I managed to find a WHEA-Logger error, event ID 18, reported by the processor core, in Event Viewer. Any insights on what it means?

The prevailing confusion lies in the incongruity between stable performance during concurrent Cinebench and FurMark usage and the recurrent crashes during any gaming activity. Contemplating the acquisition of a new PSU, I seek assistance in resolving this perplexing and frustrating predicament. Any guidance or insights into a potential solution would be greatly appreciated.

My parts:

  • Rog strix oc 3080 10gb
  • i5 13600kf
  • gigabyte b760 ax ddr4
  • crucial p5 plus 1tb (operating system)
  • crucial p3 plus 2tb (all my games)
  • nzxt h5 elite case
  • nzxt x53 rgb liquid cooler
  • corsair 32gb ddr4 rgb pro sl
  • msi a750gf (i use default psu cables)

Details of one of the whea logger events:

Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          01/02/2024 06:42:32
Event ID:      18
Task Category: None
Level:         Error
Keywords:      
User:          LOCAL SERVICE
Computer:      DESKTOP-DPMH1HN
Description:
A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 0

The details view of this entry contains further information. Event XML:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-WHEA-Logger" Guid="{c26c4f3c-3f66-4e99-8f8a-39405cfed220}" />
    <EventID>18</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000000</Keywords>
    <TimeCreated SystemTime="2024-02-01T14:42:32.3373493Z" />
    <EventRecordID>5933</EventRecordID>
    <Correlation ActivityID="{31c761f9-ef62-468f-8b8a-ea3cd84d8ca5}" />
    <Execution ProcessID="5500" ThreadID="6452" />
    <Channel>System</Channel>
    <Computer>DESKTOP-DPMH1HN</Computer>
    <Security UserID="S-1-5-19" />
  </System>
  <EventData>
    <Data Name="ErrorSource">3</Data>
    <Data Name="ApicId">0</Data>
    <Data Name="MCABank">0</Data>
    <Data Name="MciStat">0xb6000000480f0150</Data>
    <Data Name="MciAddr">0xd375d373</Data>
    <Data Name="MciMisc">0x0</Data>
    <Data Name="ErrorType">9</Data>
    <Data Name="TransactionType">0</Data>
    <Data Name="Participation">256</Data>
    <Data Name="RequestType">5</Data>
    <Data Name="MemorIO">256</Data>
    <Data Name="MemHierarchyLvl">0</Data>
    <Data Name="Timeout">256</Data>
    <Data Name="OperationType">256</Data>
    <Data Name="Channel">256</Data>
    <Data Name="Length">936</Data>
    <Data Name="RawData">435045521002FFFFFFFF03000100000002000000A803000031290E00010218140000000000000000000000000000000000000000000000000000000000000000BDC407CF89B7184EB3C41F732CB57131FE6FF5E89C91C54CBA8865ABE14913BBD72FDBC01B55DA0100000000000000000000000000000000000000000000000058010000C00000000003000001000000ADCC7698B447DB4BB65E16F193C4F3DB0000000000000000000000000000000001000000000000000000000000000000000000000000000018020000800000000003000000000000B0A03EDC44A19747B95B53FA242B6E1D0000000000000000000000000000000001000000000000000000000000000000000000000000000098020000100100000003000000000000011D1E8AF94257459C33565E5CC3F7E8000000000000000000000000000000000100000000000000000000000000000000000000000000007F01000000000000000201030000000071060B00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000700000000000000000000000000000071060B0000088000BFFBFA7FFFFBEBBF0000000000000000000000000000000000000000000000000000000000000000F50157A5EFE3DE43AC72249B573FAD2C03000000000000009F0014060000000073D375D30000000000000000000000000000000000000000000000000000000002000000010000005D641ECA1C55DA0100000000050000000000000000000000000000000000000050010F48000000B673D375D30000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000160C000000000000</Data>
  </EventData>
</Event>
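
For anyone trying to read the `MciStat` value in the event above: the sketch below splits it into the architectural bit fields that Intel's Software Developer's Manual describes for the IA32_MCi_STATUS register. It is only a rough illustration. `decode_mci_status` is a made-up helper name, not a Windows or Intel API, and the model-specific bits (31:16) are reported but not interpreted.

```python
# Sketch: decode the architectural fields of an IA32_MCi_STATUS value.
# Bit positions follow the machine-check architecture chapter of the Intel SDM.
# decode_mci_status is a hypothetical helper, not part of any real library.

FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting of this error was enabled
    59: "MISCV",  # MCi_MISC holds valid data
    58: "ADDRV",  # MCi_ADDR holds valid data
    57: "PCC",    # processor context may be corrupt
}

# Sub-fields of a cache hierarchy error code (000F 0001 RRRR TTLL).
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTION = {0: "instruction", 1: "data", 2: "generic"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_mci_status(status):
    """Return the flag names and error-code fields found in `status`."""
    mca_code = status & 0xFFFF
    result = {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "mca_code": mca_code,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }
    # Bits 15:13 must be 0 and bits 11:8 must be 0001; bit 12 is ignored.
    if (mca_code & 0xEF00) == 0x0100:
        result["kind"] = "cache hierarchy error"
        result["request"] = REQUEST.get((mca_code >> 4) & 0xF, "reserved")
        result["transaction"] = TRANSACTION.get((mca_code >> 2) & 0x3, "reserved")
        result["level"] = LEVEL[mca_code & 0x3]
    return result


decoded = decode_mci_status(0xB6000000480F0150)  # MciStat from the event
for key, value in decoded.items():
    print(key, "=", value)
```

Read this way, the value describes an uncorrected cache hierarchy error on an instruction fetch, which lines up with the `ErrorType`, `RequestType` (5), `TransactionType` (0) and `MemHierarchyLvl` (0) fields Windows logged. It narrows the fault to the CPU core side but does not say whether the cause is the chip itself, its voltage, or something else.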

Details of kernel power event:

Log Name:      System
Source:        Microsoft-Windows-Kernel-Power
Date:          01/02/2024 05:27:51
Event ID:      41
Task Category: (63)
Level:         Critical
Keywords:      (70368744177664),(2)
User:          SYSTEM
Computer:      DESKTOP-DPMH1HN
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331c3b3a-2005-44c2-ac5e-77220c37d6b4}" />
    <EventID>41</EventID>
    <Version>8</Version>
    <Level>1</Level>
    <Task>63</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000400000000002</Keywords>
    <TimeCreated SystemTime="2024-02-01T13:27:51.8024703Z" />
    <EventRecordID>5167</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="8" />
    <Channel>System</Channel>
    <Computer>DESKTOP-DPMH1HN</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="BugcheckCode">292</Data>
    <Data Name="BugcheckParameter1">0x0</Data>
    <Data Name="BugcheckParameter2">0xffff85881f77a028</Data>
    <Data Name="BugcheckParameter3">0xb6000000</Data>
    <Data Name="BugcheckParameter4">0x480f0150</Data>
    <Data Name="SleepInProgress">0</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
    <Data Name="BootAppStatus">0</Data>
    <Data Name="Checkpoint">0</Data>
    <Data Name="ConnectedStandbyInProgress">false</Data>
    <Data Name="SystemSleepTransitionsToOn">1</Data>
    <Data Name="CsEntryScenarioInstanceId">0</Data>
    <Data Name="BugcheckInfoFromEFI">false</Data>
    <Data Name="CheckpointStatus">0</Data>
    <Data Name="CsEntryScenarioInstanceIdV2">0</Data>
    <Data Name="LongPowerButtonPressDetected">false</Data>
  </EventData>
</Event>
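
The Kernel-Power event above and the WHEA-Logger event describe the same crash, which a few lines of Python can show. This is a small sketch based on Microsoft's description of bug check 0x124, where parameter 1 = 0 means a machine check exception and parameters 3 and 4 carry the high and low 32 bits of the MCi_STATUS value. The function name is made up for illustration.

```python
# Sketch: relate the Kernel-Power event 41 fields to the WHEA-Logger event.
# combine_mci_status is a hypothetical helper name.

def combine_mci_status(high_32, low_32):
    """Join BugcheckParameter3 (high) and BugcheckParameter4 (low)."""
    return (high_32 << 32) | low_32


bugcheck_code = 292        # BugcheckCode, logged in decimal
parameter_3 = 0xB6000000   # BugcheckParameter3
parameter_4 = 0x480F0150   # BugcheckParameter4

print(hex(bugcheck_code))                                 # bug check number in hex
print(hex(combine_mci_status(parameter_3, parameter_4)))  # rebuilt MCi_STATUS
```

Decimal 292 is 0x124, the WHEA_UNCORRECTABLE_ERROR bug check, and the rebuilt 64-bit value equals the `MciStat` field of the WHEA-Logger event.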

Multiple errors in event viewer that I never noticed:

[Event Viewer screenshots, pages 1 to 3, not preserved]

Something is definitely wrong and I think it is the power supply, any thoughts?

Update 1: 12v rail and all is fine, still baffled why it happened, also reinstalled windows clean so that might have helped:

 

 

Issue fixed until it happens again I guess. Problem I think was that I had another 2tb nvme. Could have drawn too much power because I took it out now and Fortnite is running perfectly.

Update 2: It happened again on fortnite opening, not sure how a reinstall helped at all to be honest.

Event properties of most recent example:

Kernel Power Critical Error ID 41:

 

 

Log Name:      System
Source:        Microsoft-Windows-Kernel-Power
Date:          02/02/2024 23:23:25
Event ID:      41
Task Category: (63)
Level:         Critical
Keywords:      (70368744177664),(2)
User:          SYSTEM
Computer:      DESKTOP-10JOBPJ
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331c3b3a-2005-44c2-ac5e-77220c37d6b4}" />
    <EventID>41</EventID>
    <Version>9</Version>
    <Level>1</Level>
    <Task>63</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000400000000002</Keywords>
    <TimeCreated SystemTime="2024-02-03T07:23:25.6856688Z" />
    <EventRecordID>1214</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="8" />
    <Channel>System</Channel>
    <Computer>DESKTOP-10JOBPJ</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="BugcheckCode">292</Data>
    <Data Name="BugcheckParameter1">0x0</Data>
    <Data Name="BugcheckParameter2">0xffffdd8f02b75028</Data>
    <Data Name="BugcheckParameter3">0xb6000000</Data>
    <Data Name="BugcheckParameter4">0x480f0150</Data>
    <Data Name="SleepInProgress">0</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
    <Data Name="BootAppStatus">0</Data>
    <Data Name="Checkpoint">0</Data>
    <Data Name="ConnectedStandbyInProgress">false</Data>
    <Data Name="SystemSleepTransitionsToOn">0</Data>
    <Data Name="CsEntryScenarioInstanceId">1</Data>
    <Data Name="BugcheckInfoFromEFI">false</Data>
    <Data Name="CheckpointStatus">0</Data>
    <Data Name="CsEntryScenarioInstanceIdV2">1</Data>
    <Data Name="LongPowerButtonPressDetected">false</Data>
    <Data Name="LidReliability">false</Data>
    <Data Name="InputSuppressionState">0</Data>
    <Data Name="PowerButtonSuppressionState">0</Data>
    <Data Name="LidState">3</Data>
  </EventData>
</Event>
 

 

Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          02/02/2024 23:23:33
Event ID:      1
Task Category: None
Level:         Error
Keywords:      WHEA Error Event Logs
User:          LOCAL SERVICE
Computer:      DESKTOP-10JOBPJ
Description:
A fatal hardware error has occurred. A record describing the condition is contained in the data section of this event.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-WHEA-Logger" Guid="{c26c4f3c-3f66-4e99-8f8a-39405cfed220}" />
    <EventID>1</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>0</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000002</Keywords>
    <TimeCreated SystemTime="2024-02-03T07:23:33.6309064Z" />
    <EventRecordID>1291</EventRecordID>
    <Correlation ActivityID="{0b2749ee-9575-4bdc-91be-91e68a7e4d81}" />
    <Execution ProcessID="4520" ThreadID="4804" />
    <Channel>System</Channel>
    <Computer>DESKTOP-10JOBPJ</Computer>
    <Security UserID="S-1-5-19" />
  </System>
  <EventData>
    <Data Name="Length">1019</Data>
    <Data Name="RawData">435045521002FFFFFFFF04000100000002000000FB03000000170700030218140000000000000000000000000000000000000000000000000000000000000000BDC407CF89B7184EB3C41F732CB57131FE6FF5E89C91C54CBA8865ABE14913BB1142D89D6156DA01200000000000000000000000000000000000000000000000A00100005000000000030000010000001411BCA5646FDE4EB8633E83ED7C83B100000000000000000000000000000000010000000000000000000000000000000000000000000000F0010000C00000000003000000000000ADCC7698B447DB4BB65E16F193C4F3DB00000000000000000000000000000000010000000000000000000000000000000000000000000000B0020000240100000003000000000000011D1E8AF94257459C33565E5CC3F7E800000000000000000000000000000000010000000000000000000000000000000000000000000000D4030000270000000003000000000000A13248C3C302524CA9F19F1D5D7723FC0000000000000000000000000000000003000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007F21000000000000000201030000000071060B00010000400000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000300000001000000936FA7D17156DA0100000000050000000000000000000000000000000000000050010F48000000B673D375D30000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000160C0000000000000000000000000000000000000000000000000000000100000000000
00000000000FF00000000000000000000000000000000000000000000000000</Data>
  </EventData>
</Event>

Update 3: Updated the BIOS to firmware F9; it didn't solve anything.
Update 4: After swapping around the PSU cables going into the GPU (two are normal cables and one is able to daisy-chain), the PC did not even blue screen when opening Fortnite; it just turned completely off and on again. On my main screen the blue screen was showing, but on my second monitor I saw a weird bug during the BSOD (screenshot not preserved).
Update 5: Now my PC blue screens every time I turn it on; it attempts to repair itself but fails. I think it is best to take it to a professional to fix it. If you guys know something I don't, that would be great.

 

 

14 Replies
KrissyG
New Contributor II
8,869 Views

Hmm, that sounds like there could be a problem with the SSD you are using.
I never had a situation where Windows tries to fix itself, but I had a cheap SSD overheat while transferring files, and I could reproduce it each time. The PC would then just freeze, as that SSD was the one with the Windows OS on it. The OS did not even attempt to fix itself; it would just freeze each time I moved too many files.

Have you tried installing the OS on different SSDs or HDDs?
And just in case, I would unplug everything and plug it back in again. This may not solve the issue, but it's worth a try.

And since you are now stuck with a useless PC, reinstalling on another drive may be the only solution.

skilzababy2
Beginner
8,856 Views

I can't try it on my other ssd which is 2tb because 1. I would have to wipe it and I cannot because it has very valuable files and 2. Where do I install windows if my PC doesn't turn on. I will try and reseat my main SSD but this issue happened after I tried undervolting the CPU using the Voltage offset.

KrissyG
New Contributor II
8,814 Views

But now your PC is on default/optimized BIOS settings, right? And it still fails to boot into Windows?

Hmm, I mean, if you don't have any spare drives, then I would get the cheapest SSD there is and try that, or even an HDD. It doesn't matter what, just to see whether it will work or not.

Also, if the files on that other SSD are valuable, why don't you have a copy of them? If the SSD dies, you may lose everything that is on it. Always keep a copy of your files and always keep a spare drive just in case. I learned my lesson after various HDDs and SSDs died on me.

In fact, ever since, I keep at least one copy of the OS drive, which I made with Clonezilla. It clones everything on the drive so perfectly that neither Windows nor your Microsoft account knows the difference. Obviously, you should then not attempt to use that cloned drive as the OS drive in other PCs.

skilzababy2
Beginner
8,775 Views

Yes, but the problem is that my files were originally on the boot drive (never changed it) and I moved them to the 2 TB one, so I can't use the other drive. Maybe a repair shop will boot it with another drive.

ACarmona_Intel
Moderator
8,758 Views

Hello Skilzababy2,

Thank you so much for posting in our Intel communities.

We understand that you are having issues with your system, as it keeps having a blue screen with an error message of WHEA_UNCORRECTABLE_ERROR.

Based on our research, here are the possible reasons why the issue has occurred:

  • Outdated graphics drivers
  • Corrupt hardware (damaged hard drives/SSDs, GPU, CPU, PSU, corrupt RAM, etc.)
  • Driver compatibility issues
  • Heat and voltage issues (overclocking and voltage changes)

In regards to that, please follow the troubleshooting steps outlined in the link below for a possible solution.

By the way, kindly disregard the troubleshooting actions you have already completed.

How to Resolve Blue Screen Error with WHEA_UNCORRECTABLE_ERROR in Windows*:

https://www.intel.com/content/www/us/en/support/articles/000028099/processors/intel-core-processors.html

Please let us know the result of the troubleshooting that you are about to perform so we can take the next step.

Thank you, and have a great day ahead!

Best regards,

Carmona A.

Intel Customer Support Technician

skilzababy2
Beginner
8,717 Views

Well, I believe the SSD issue is gone. I had my PC off for about a week and took the SSD out to boot into the BIOS just now. After I put the SSD back in, it booted into Windows and worked fine. I do not know where to go from here. I will download another game that I think did not crash my PC with the WHEA error: Cities Skylines 2. I'll check now and get you an update. Other than that, the CPU is fine (I ran the diagnostic tool), and so is the GPU (it was fine when I tested it on FurMark). It could be the motherboard, but I don't know how to check that. Any ideas?

KrissyG
New Contributor II
8,703 Views

If you keep getting boot loops again, I would check the BIOS battery.
However, if the battery is dead or makes the BIOS reset continuously, then you would not get any blue screen at all. The PC would just turn itself off and on, again and again, with no response from the screen.

Also, in your case the problem seems to be related to launching a game, which would have no effect on that battery, but it is still worth checking.


skilzababy2
Beginner
8,689 Views

I have already tried that and it has not worked. What baffles me is that all the stress tests are fine, and the lowest the 12V rail goes under stress is 11.808 V. But whenever I run a game, it BSODs. It doesn't even load Fortnite into the main menu. Maybe Windows is corrupt. I already ran sfc /scannow.
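
As a rough sanity check on that 11.808 V figure, the sketch below compares it with a ±5 % band around 12 V, which is the tolerance the ATX power supply design guides give for the +12 V rail. The function name is invented for illustration, and the check says nothing about short transients that a software sensor or a multimeter cannot see.

```python
# Sketch: is a +12 V reading inside the assumed ATX tolerance band of +/-5 %?
# within_tolerance is a hypothetical helper name.

NOMINAL_V = 12.0
TOLERANCE = 0.05  # +/-5 %, assumed from the ATX design guide for +12 V


def within_tolerance(measured_v, nominal_v=NOMINAL_V, tolerance=TOLERANCE):
    """Return True when measured_v lies inside nominal_v +/- tolerance."""
    low = nominal_v * (1 - tolerance)
    high = nominal_v * (1 + tolerance)
    return low <= measured_v <= high


reading_v = 11.808  # lowest value reported under stress
deviation_pct = (reading_v - NOMINAL_V) / NOMINAL_V * 100

print(f"deviation: {deviation_pct:.1f} %")
print("within tolerance:", within_tolerance(reading_v))
```

A reading of 11.808 V is about 1.6 % below nominal, well inside that band, so on its own it does not point at the PSU.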

KrissyG
New Contributor II
8,661 Views

Hmm, you have the 'AX' version of the same motherboard I have; mine is just the 'X'. Does yours show the 12V status somewhere? Mine only shows it in the BIOS. So a question: do you measure it with a multimeter, or how do you do it?

skilzababy2
Beginner
8,619 Views
Yes, well, funny story. I'm not an electrician, so I went on YouTube to find out how to use my multimeter. The second I plugged it into the Molex cable, my PC died. Not actually, but it instantly shut off. A guy on Super User, where I initially posted my issue, said that it could have been the friction between the cables, but I'm not sure.

To sum up, the PC still turns on and works but can't open a game. To answer your question, my voltages are shown in the BIOS on the right-hand side. Honestly, I just use HWiNFO to check everything: current, temps, everything really. The lowest the 12V rail dips to under GPU stress on Cinebench is about 11.808 V according to HWiNFO. Is this normal? If you want to check with a multimeter, do it at your own risk. It could give you a jump scare because your PC shuts off, but nothing happened to mine. The multimeter does not supply a current; it only tests it.

KrissyG
New Contributor II
8,589 Views

My HW info does not show 12V at all, neither does the Gigabyte software.

KrissyG_34-1707595305958.png

 


...I do not recommend using a multimeter on a motherboard, because you may accidentally touch or short some pins with the probes. That being said:


You can't plug a multimeter into a Molex connector and measure current: that is literally a short circuit for the power supply.

Voltage is what you want to measure, and you can always measure it; that does not cause a short circuit.

But make sure your multimeter is set to measure voltage, and that the probes are in the right sockets.

Cable colors: YELLOW = +12 V, RED = +5 V, ORANGE = +3.3 V, BLACK = ground (negative).
Other colors are used for signals, which the power supply may use to turn on or to regulate fan speed and power output.

Now, assuming my multimeter is quite accurate (I doubt that), I measured the 12 V on the EPS12V connector and then on the PCIe 12 V connector for the graphics card.
On the graphics card I got 12.30 V, while on the EPS12V I got this (left: at idle, 3 W?; right: when XTU shows a TDP of 270 W):

KrissyG_33-1707593784022.png

Now, I need to mention that what I measure there is the voltage going to the MOSFETs, so there is some power loss there, and on the way to the CPU too. This means I cannot calculate the precise current that flows through the 12 V rail, but I can estimate it by assuming there is no power loss.

Power = Voltage x Current
270 W = 12.11 V x Current
Current = 270 W / 12.11 V ≈ 22.30 A

If I assume I get 1.4 V at the CPU, then 270 W / 1.4 V ≈ 193 A. You can do some good welds with that current lol
So between the MOSFETs and the CPU pins, the PCB has to deliver some 193 amps.



My power supply is a Corsair GS800, which means at 12 V it can deliver up to 60 A, but my multimeter can measure 10 A max.
So even if I wanted to (and I would have to cut into the cables to put the multimeter probes in series between the power supply and the CPU or graphics card), I would still not be able to measure the current at 100% power, not even at 50%!
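
The arithmetic above can be re-run with a few lines of Python. This is only a sketch of the same lossless estimate (current = power / voltage) using the figures quoted in this post; the 1.4 V core voltage is the poster's assumption, and the variable and function names are invented for illustration.

```python
# Sketch: lossless current estimate from the figures quoted in the post.
# current_amps is a hypothetical helper name.

def current_amps(power_watts, voltage_volts):
    """Current drawn by a load of power_watts at voltage_volts, ignoring losses."""
    return power_watts / voltage_volts


cpu_power_w = 270.0    # package power shown by XTU
eps_voltage_v = 12.11  # measured on the EPS12V connector under load
core_voltage_v = 1.4   # assumed voltage at the CPU
meter_limit_a = 10.0   # maximum current range of the multimeter

input_current_a = current_amps(cpu_power_w, eps_voltage_v)
core_current_a = current_amps(cpu_power_w, core_voltage_v)
half_load_current_a = current_amps(cpu_power_w / 2, eps_voltage_v)

print(f"12 V side at full load: {input_current_a:.2f} A")
print(f"CPU side at full load:  {core_current_a:.1f} A")
print(f"12 V side at half load: {half_load_current_a:.2f} A")
print("half load exceeds meter range:", half_load_current_a > meter_limit_a)
```

Even the half-load figure on the 12 V side comes out above 10 A, which is why a 10 A meter cannot be used in series here.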

Shorting the power supply should not damage anything, except an HDD, because the read/write head may get stuck on the platters, and that will definitely damage an HDD.


As for using a multimeter, this is all you need to know. The display should also say 'V', like here in the top right corner of the display:

KrissyG_30-1707593627282.png


If I plug the probes into the sockets marked 'mA' or 'A' and then attempt to measure voltage, I will make myself very unhappy.

In summary, nowhere on the power supply outputs could I measure less than 12 V; even at full CPU load it was at 12.11 V (+/- 0.1 V for the measurement tolerance).

VonM_Intel
Moderator
8,524 Views

Hi, skilzababy2.


I hope you are doing fine.


I would like to know whether you are still experiencing the BSOD with the error message "WHEA_UNCORRECTABLE_ERROR", so we can assist you further.


Best Regards,

Von M.

Intel Customer Support Technician


skilzababy2
Beginner
8,508 Views

I honestly don't know what happened. IT FIXED ITSELF. I can now repeatedly open Fortnite and play with no crashes. It did crash yesterday while I was playing, however. Any ideas what this BSOD means?

VonM_Intel
Moderator
8,463 Views

Hi, skilzababy2.


I appreciate your confirmation. We will take note of this in case other community members encounter the same issue. In Microsoft Windows, “Blue Screen of Death” is usually abbreviated as BSOD. It describes an error that hits the operating system hard enough that it is forced to stop. It is usually encountered when the system boots up or during normal computer operation.


If you need further assistance, please submit a new question as this thread will no longer be monitored.


Best regards,

Von M.

Intel Customer Support Technician


