
NUC6i5 crash / reboot under (GPU) load

keepcoding
Beginner

Hi

For about a year or so, my Intel NUC6i5SYK (with the Iris Graphics 540) has been experiencing random crashes. Well, they are not exactly 'random'; they mostly occur when watching YouTube videos in Chrome or during Zoom meetings.

What do I mean by "crash"? The picture freezes and about 5 seconds later, the NUC reboots.

 

What I have found out so far:

- for simple tasks (idling, browsing, text editing, etc.) the NUC runs stable

- prime95 runs fine, no crash (20min test)

- furmark makes the NUC crash after about 2 minutes

- temperature readings are ok (90°C max) and I have checked and cleaned the fan

- RAM and SSD seem to be fine (no errors found with Memtest or file system scans)

- crash occurs on Windows and Linux (therefore most likely not caused by the OS)

- BIOS is on the latest version

 

It looks like the issue is somehow related to the GPU (Iris 540) or the resulting higher power consumption when the GPU is active. 

 

In the Windows event log I found the following:

- Critical Error, Kernel-Power: "The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly."

 

Any ideas?

12 Replies
n_scott_pearson
Super User Retired Employee

If the system locks up and then, after a short time, resets spontaneously, this is an indication that the Watchdog Timer reset the PC to recover from a lockup. At this level, this is most often the result of a memory bus lockup. A memory bus lockup is most often caused by noise levels on the bus reaching thresholds where data cannot be distinguished from noise. While this can be the result of failures on the motherboard (something creating new noise, which happens with age, or bus support components failing and no longer suppressing the noise they should) or failures occurring in the processor's memory controllers, the most common issue is bad or failing memory. This is what you have to look at first.

Yes, I know you said MemTest didn't see anything. If you are talking about the Microsoft utility, well, it is completely worthless AFAIK. It can find failed memory, but is not very good at finding failing memory. In fact, even the higher-rated programs, like MemTest86 or MemTest86+, will still not find all cases of failing memory. Bottom line, find somewhere to borrow some other memory to try.

Hope this helps,

...S

P.S. That Event Log entry is simply a signal that the system was started up without having recorded that a shutdown took place. It is thus totally useless as any kind of indicator for what caused the failure.

keepcoding
Beginner

Thanks for your answer. I will try to organize some replacement RAM.

However, I find it quite strange that the RAM should be the culprit. I mean, why does the NUC not crash with Prime95 (close to 100% memory usage), and why does it run totally stable when performing tasks that are not GPU-intensive?

And yes I used MemTest86.

n_scott_pearson
Super User Retired Employee

I am having a tough time coming up with an explanation that doesn't take me hours to craft. Suffice it to say that it may not be bad memory -- specific memory cells that cannot be read and written reliably -- because if that was the case, MemTest86 should catch it. There are so many components that could play a role in bus lockups occurring, and there could even be some particular sequence of events that has to occur in order to cause one. The problem is, how do you figure out where the issue lies? Is it in the processor? Is it some component on the motherboard? We hope it's none of these, since they are integral. I thus look for other components that *can* be replaced and start with them first. Memory can be replaced -- but, since we don't know that this is the culprit, we don't want to spend a fortune replacing it just yet. Try some from another system. Think borrow before purchase.

...S

keepcoding
Beginner

I agree it makes sense to test the memory properly since this component can be easily replaced.

 

I was able to get some replacement RAM DIMMs for testing, but unfortunately the NUC still crashes. I also reset the BIOS to the default values to make sure there is no bad setting.

 

Hm, maybe my NUC is toast. Warranty is gone so not sure what to do with it.

n_scott_pearson
Super User Retired Employee

Well, since you have tested with both Windows and Linux and with two different sets of SODIMMs, we can conclude that the problem isn't in the memory itself (though it could still be in the memory interface). Having seen the crash under multiple operating systems, it is also difficult (though not impossible) to blame the graphics drivers.

Have you tried doing a clean install of the NUC-validated graphics driver? Have you tried with the latest Beta release of the graphics driver?

...S

keepcoding
Beginner

Yes, I have tried a clean re-install of the latest graphics driver (27.20.100.8681). The older driver cannot be installed on Windows anymore for some reason, so I couldn't try that. I don't know anything about beta drivers, where can I get those?

 

I also played around a bit with the BIOS settings and found an option to set the max. sustained and the boost power consumption. When I lower these values to 15 and 23W respectively, Furmark runs a bit longer (around 10 minutes instead of 2 minutes). However, GPU and CPU clock speeds are throttled heavily to achieve the lower power limits. At the time the NUC locked up, the GPU temperature reading was ~80°C.
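Since the crash also occurs on Linux, one way to compare the actual package power draw against the BIOS limits mentioned above (15 / 23 W) is the kernel's powercap (RAPL) energy counter. The following is only a rough sketch and not something used in this thread: it assumes the `intel_rapl` driver is loaded and that the package domain appears at `/sys/class/powercap/intel-rapl:0`; the path, the function names and the one-second interval are illustrative assumptions, and reading `energy_uj` usually needs root.

```python
import time
from pathlib import Path

# Assumed location of the CPU package RAPL domain; it may differ or be absent.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def average_power_watts(e_start_uj, e_end_uj, seconds, max_range_uj):
    """Average power between two energy-counter readings given in microjoules.

    The counter wraps around, so a smaller end value is treated as exactly
    one wraparound of size max_range_uj.
    """
    delta_uj = e_end_uj - e_start_uj
    if delta_uj < 0:
        delta_uj += max_range_uj
    return delta_uj / 1_000_000 / seconds


def read_int(path):
    """Read a sysfs file that contains a single integer."""
    return int(Path(path).read_text().strip())


def sample_package_power(domain=RAPL_DOMAIN, interval=1.0):
    """Read the energy counter twice and return the average power in watts."""
    domain = Path(domain)
    max_range = read_int(domain / "max_energy_range_uj")
    e0 = read_int(domain / "energy_uj")
    t0 = time.monotonic()
    time.sleep(interval)
    e1 = read_int(domain / "energy_uj")
    return average_power_watts(e0, e1, time.monotonic() - t0, max_range)


if __name__ == "__main__":
    try:
        print(f"package power: {sample_package_power():.1f} W")
    except OSError as exc:
        print(f"could not read RAPL counters ({exc}); try running as root")
```

Running this in a loop while the GPU load test is active would show whether the lockups coincide with the package approaching the configured limits. Note that it only covers what the SoC reports, not the power drawn by the rest of the board.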

n_scott_pearson
Super User Retired Employee

Here is a link to the download page for the latest Beta release: https://downloadcenter.intel.com/download/30522/Intel-Graphics-BETA-Windows-10-DCH-Drivers. BTW, I got to this page by going to https://downloadcenter.intel.com, searching for 'DCH' and selecting the resulting Beta release. This is build 9667, whereas the latest production build is 9466.

Hhmmm, I wonder if you are having a power supply issue. Have you tried using a different power supply?

...S

keepcoding
Beginner

Thanks for the link, will try it later.

 

I also suspected the power supply at first, but my tests with other supplies showed the same symptoms. I did not have an original NUC supply to test with but a different one (20V, 65W).

 

Just did some more testing: I selected "balanced power" in the BIOS instead of "max. power", which I think sets the limits to 20 and 25 W for sustained / boost. In addition, I adjusted the fan speed curve so that the fan spins at max. rpm when the temperature reaches ~85°C (SYS temp). Interestingly, this time the Furmark test did not make the NUC crash (tested for 20 min).

This makes me wonder if it could be some kind of thermal shutdown. Is there a way to find out? Could it be that one of the temperature sensors sporadically hits a threshold and triggers a reset? Are these thresholds known / documented somewhere?
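On the question of where such thresholds can be looked up: under Linux, the kernel's thermal framework exposes the trip points it knows about in sysfs, as `trip_point_*_temp` (in millidegrees Celsius) and `trip_point_*_type` files under each `/sys/class/thermal/thermal_zone*` directory. The sketch below simply lists them; the function name and the output format are illustrative assumptions, and which zones and trip points exist depends on the kernel and firmware. It would not reveal limits that are enforced purely in firmware or hardware.

```python
from pathlib import Path

THERMAL_ROOT = Path("/sys/class/thermal")


def read_trip_points(root=THERMAL_ROOT):
    """Return {"zone (type)": [(trip_type, temp_celsius), ...]} per thermal zone."""
    zones = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
        except OSError:
            continue  # zone without a readable type file: skip it
        trips = []
        for temp_file in sorted(zone.glob("trip_point_*_temp")):
            # trip_point_N_temp has a sibling trip_point_N_type
            type_file = zone / temp_file.name.replace("_temp", "_type")
            try:
                milli_c = int(temp_file.read_text().strip())
                trip_type = type_file.read_text().strip()
            except (OSError, ValueError):
                continue
            trips.append((trip_type, milli_c / 1000.0))
        zones[f"{zone.name} ({zone_type})"] = trips
    return zones


if __name__ == "__main__":
    for name, trips in read_trip_points().items():
        print(name)
        for trip_type, temp_c in trips:
            print(f"  {trip_type}: {temp_c:.1f} °C")
```

Comparing the listed `critical` or `hot` values with the ~80 to 90°C readings reported earlier gives a rough idea of how much headroom the OS believes it has.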

AndrewG_Intel
Moderator

Hello @keepcoding

Thank you for posting on the Intel® communities. We hope that the assistance provided by the community has been helpful.


Also, we would like to inform you that because the Intel® NUC Kit NUC6i5SYK has been discontinued, Intel Customer Service no longer supports inquiries for it, but perhaps fellow community members have the knowledge to jump in and help. You may also find the Discontinued Products website helpful to address your request. Thank you for your understanding.

Please keep in mind that this thread will no longer be monitored by Intel.

 

Best regards,

Andrew G.

Intel Customer Support Technician


n_scott_pearson
Super User Retired Employee

A thermal shutdown is exactly that; the system immediately powers off completely (no Windows shutdown, no restart, no nothing). And, when you do power back on (manually, not automatically), the BIOS will inform you that a thermal shutdown had occurred. No, while it is possible that heat could play a role in your failure, it isn't anything simple.

Are you monitoring temperatures with something that can record temperatures? As an example, you could download and run AIDA64 in trial mode; it can do this and lots of other things.

Hope this helps,

...S

keepcoding
Beginner

I tried AIDA64 but it isn't really useful (logging rate is way too low). The BIOS does not show any thermal shutdown warning or anything, so I assume this didn't happen.
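If the logging rate is the limitation, a small script on the Linux side can sample the kernel's thermal zones much more often and force each row to disk, so the last readings before a lockup survive the reset. This is a minimal sketch under stated assumptions, not a tested diagnostic tool: the 0.1 s interval, the CSV layout and the function names are illustrative, and the zones found under `/sys/class/thermal` may not include every sensor the BIOS or AIDA64 can see.

```python
import csv
import os
import sys
import time
from pathlib import Path

THERMAL_ROOT = Path("/sys/class/thermal")


def read_zone_temps(root=THERMAL_ROOT):
    """Return {zone_name: degrees_celsius} for every readable thermal zone."""
    temps = {}
    for temp_file in sorted(Path(root).glob("thermal_zone*/temp")):
        try:
            # sysfs reports millidegrees Celsius
            temps[temp_file.parent.name] = int(temp_file.read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue
    return temps


def log_temps(path, samples=None, interval=0.1, root=THERMAL_ROOT):
    """Append timestamped rows to a CSV file; samples=None runs until interrupted."""
    zones = sorted(read_zone_temps(root))
    with open(path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["unix_time"] + zones)
        written = 0
        while samples is None or written < samples:
            temps = read_zone_temps(root)
            writer.writerow([f"{time.time():.3f}"] + [temps.get(z, "") for z in zones])
            out.flush()
            os.fsync(out.fileno())  # push the row to disk before a possible hard reset
            written += 1
            time.sleep(interval)


# Only starts logging when an output path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    log_temps(sys.argv[1])
```

Started with an output file name as its only argument before launching the load test, it leaves a CSV whose final rows show the temperatures in the last fraction of a second before the freeze.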

I also don't think it is a thermal-only problem because sometimes the NUC freezes even though the CPU fan is idling (which usually means low / normal temperatures). Sometimes I'm not even doing much, only watching a video on YouTube, and the NUC just locks up. Really getting annoying.

I'm now thinking about replacing it. Problem is, NUC11 is currently not available where I live.

 

Anyway, I appreciate your help, thanks!

n_scott_pearson
Super User Retired Employee

Yea, it didn't feel like a thermal issue. Even your original temperatures were nowhere near the thermal shutdown point. 

Nothing is available. Shortages are affecting the whole industry. These are the longer-term effects of a global pandemic...

...S
