
NUC11PAHi5 keeps power cycling every 15min - 2hr

justinnonwork
Novice
2,128 Views

[ This is the same as https://bugs.launchpad.net/ubuntu/+source/linux-signed-oem-5.14/+bug/1958692 which I filed, but posting here, too ]

 

I have a NUC11PAHi5 system running Ubuntu 20.04.3

System: NUC11PAHi5
SKU: RNUC11PAHi5000
BIOS: PATGL357.0041.2021.0811.1505

And every few minutes to hours it will crash:

justin :0 :0 Fri Jan 21 19:46 - crash (00:37)
justin :0 :0 Fri Jan 21 19:40 - crash (00:05)
justin :0 :0 Fri Jan 21 17:33 - crash (02:07)
justin :0 :0 Fri Jan 21 17:16 - crash (00:07)
justin :0 :0 Fri Jan 21 17:13 - crash (00:02)
justin :0 :0 Fri Jan 21 16:54 - crash (00:19)
justin :0 :0 Fri Jan 21 16:25 - crash (00:28)
justin :0 :0 Fri Jan 21 16:18 - crash (00:06)
justin :0 :0 Fri Jan 21 16:11 - crash (00:07)
justin :0 :0 Fri Jan 21 15:46 - crash (00:24)

When it comes back up I see the following errors in journalctl:

kernel: BERT: Error records from previous boot:
kernel: [Hardware Error]: event severity: fatal
kernel: [Hardware Error]: Error 0, type: fatal
kernel: [Hardware Error]: section_type: Firmware Error Record Reference
kernel: [Hardware Error]: Firmware Error Record Type: SOC Firmware Error Record Type2
kernel: [Hardware Error]: Revision: 2
kernel: [Hardware Error]: Record Identifier: 8f87f311-c998-4d9e-a0c4-6065518c4f6d

followed by a large hex dump. It could be related to this issue:
https://community.intel.com/t5/Processors/Frequent-crashes-on-i5-11500/td-p/1280709

but I'm not sure.
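For anyone comparing records across reboots, here is a minimal Python sketch that pulls the key/value fields out of the `[Hardware Error]` lines quoted above. The function name `parse_bert_fields` is made up for illustration, and the rule that skips hex-dump rows (an 8-digit hex offset in the key position) is an assumption about how the dump lines are laid out, not something confirmed in this thread.

```python
import re

# Sample lines copied from the journalctl output quoted in this post.
SAMPLE = """\
kernel: BERT: Error records from previous boot:
kernel: [Hardware Error]: event severity: fatal
kernel: [Hardware Error]: Error 0, type: fatal
kernel: [Hardware Error]: section_type: Firmware Error Record Reference
kernel: [Hardware Error]: Firmware Error Record Type: SOC Firmware Error Record Type2
kernel: [Hardware Error]: Revision: 2
kernel: [Hardware Error]: Record Identifier: 8f87f311-c998-4d9e-a0c4-6065518c4f6d
"""

# Matches "[Hardware Error]: <key>: <value>" anywhere in a log line.
HW_LINE = re.compile(r"\[Hardware Error\]:\s*(?P<key>[^:]+):\s*(?P<value>.+)$")

# Assumed shape of a hex-dump row offset, e.g. "00000000".
HEX_OFFSET = re.compile(r"[0-9a-fA-F]{8}")


def parse_bert_fields(text):
    """Return a dict of the 'key: value' pairs found on [Hardware Error] lines."""
    fields = {}
    for line in text.splitlines():
        match = HW_LINE.search(line)
        if not match:
            continue
        key = match.group("key").strip()
        if HEX_OFFSET.fullmatch(key):
            # Skip rows that look like part of the hex dump.
            continue
        fields[key] = match.group("value").strip()
    return fields


if __name__ == "__main__":
    for key, value in parse_bert_fields(SAMPLE).items():
        print(f"{key}: {value}")
```

Feeding it the saved kernel log from each boot makes it easier to see whether the Record Identifier and error type are the same on every crash.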

 

I have two other NUC boxes (a NUC7i3BNK and a D34010WYK) running the same configuration and essentially the same kernel with no issues. The BIOS is up to date, as is the Intel microcode package. The crashes happen regardless of which kernel I use. I'm using the OEM kernel because the regular kernels for 20.04.3 (a) also have this problem, and (b) have broken IR support (https://askubuntu.com/questions/1380500/kernel-rc-rc0-receive-overflow).

 

I've noticed I can get it to stay up longer by having the system play videos using MythTV, but that could be a coincidence. It doesn't prevent the crashes, but it seems to delay them.

I'm not sure if it's a kernel issue, BIOS issue, microcode issue, or hardware issue, so I decided to start here. Let me know how else I can help.

justinnonwork
Novice
1,649 Views

Hi, are there any updates to this? Or is this just a broken piece of hardware and I should go with a different mini-PC?

David_G_Intel
Moderator
1,603 Views

Hello justinnonwork


Thank you for posting on the Intel® communities. Please share with us the following information:

  • RAM specifications
  • When did this issue start?
  • Does this also happen with Windows 10 or 11?
  • Does this happen when idle or with a specific task/ application?
  • Did you make any changes recently?
  • Do you have any external accessories/components connected to the NUC?


Regards, 

David G 

Intel Customer Support Technician 


justinnonwork
Novice
1,595 Views

Thanks for the reply. In order:

  • (1) G-Skill 16GB F4-3200C22-16GRS SODIMM
  • Since I bought the unit ~3 weeks ago and finished updating it to the latest Ubuntu config
  • I don't know, I don't have a Windows license/haven't tried to install Windows
  • (More details below)
  • No changes
  • The only connections are power, the network (gigabit switch), and HDMI running to a receiver and then a TV

Regarding idle versus specific task, the machine is serving as a media frontend running MythTV. When the machine boots, it logs in to the default user account, uses xrandr to limit the display to 24 bits and set the output to 1920x1080@120Hz, and then starts mythfrontend. Over the last 400 reboot cycles, when it boots like this the median uptime is 7 minutes, and the 90th percentile is 30 minutes.
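For reference, uptime figures like these can be derived from the crash listing with a short script. The Python sketch below is only an illustration: the helper names are made up, the percentile uses the nearest-rank method (one of several definitions), and the optional `D+` day prefix in the duration is an assumption about how longer sessions are printed. It is run on the ten lines quoted in the first post, not on the full 400-cycle history, so its results will not match the 7 and 30 minute figures.

```python
import math
import re
from statistics import median

# Sample lines copied from the crash listing in the first post.
SAMPLE = """\
justin :0 :0 Fri Jan 21 19:46 - crash (00:37)
justin :0 :0 Fri Jan 21 19:40 - crash (00:05)
justin :0 :0 Fri Jan 21 17:33 - crash (02:07)
justin :0 :0 Fri Jan 21 17:16 - crash (00:07)
justin :0 :0 Fri Jan 21 17:13 - crash (00:02)
justin :0 :0 Fri Jan 21 16:54 - crash (00:19)
justin :0 :0 Fri Jan 21 16:25 - crash (00:28)
justin :0 :0 Fri Jan 21 16:18 - crash (00:06)
justin :0 :0 Fri Jan 21 16:11 - crash (00:07)
justin :0 :0 Fri Jan 21 15:46 - crash (00:24)
"""

# Duration of a session that ended in a crash: "(HH:MM)" or, assumed, "(D+HH:MM)".
DURATION = re.compile(
    r"- crash\s+\((?:(?P<days>\d+)\+)?(?P<hours>\d+):(?P<minutes>\d+)\)"
)


def crash_uptimes_minutes(text):
    """Return the uptime in minutes for every session that ended in a crash."""
    uptimes = []
    for line in text.splitlines():
        match = DURATION.search(line)
        if match:
            days = int(match.group("days") or 0)
            hours = int(match.group("hours"))
            minutes = int(match.group("minutes"))
            uptimes.append((days * 24 + hours) * 60 + minutes)
    return uptimes


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("no values given")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    uptimes = crash_uptimes_minutes(SAMPLE)
    print("sessions:", len(uptimes))
    print("median minutes:", median(uptimes))
    print("90th percentile minutes:", percentile(uptimes, 90))
```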

The few times I've gotten it to stay up longer than about an hour involve either:

  1. Exiting mythfrontend but leaving the screen at the X display logged in, with the receiver and TV on; or
  2. Having mythfrontend play a bunch of videos.

Doing this allows the machine to stay up anywhere from 2.5 to 9 hours, but it will still eventually crash and reboot. (Note: it just crashed and rebooted while I was typing up this message ...)

Let me know how else I can help.

 

justinnonwork
Novice
1,579 Views

Update: Running memtest86+ for two full iterations (each iteration is four passes of the test and takes about 4 hours in total) yields no errors, and the system doesn't power cycle while memtest is running.

David_G_Intel
Moderator
1,570 Views

Thank you for the update. Let me investigate this, and I will post updates on the thread.


Regards, 

David G 

Intel Customer Support Technician 


justinnonwork
Novice
1,547 Views

More updates. I've tested the stability of the system with the following kernels from the standard Ubuntu 20.04 repos:

  • 5.14.0-1020-oem
  • 5.14.0-1018-oem
  • 5.13.0-27-generic
  • 5.11.0-40-generic
  • 5.11.0-38-generic

All of these have the same stability issues, and only stay up for a few minutes to maybe an hour or two at the outside.

I've tried to attach the Boot Error Record Table log from my last crash/reboot cycle, but it doesn't seem to be letting me add attachments. Instead I've posted the text file to my Google Drive here.

justinnonwork
Novice
1,547 Views

Or, it would seem, it was attaching them but showing an error message on my browser. Sorry.

David_G_Intel
Moderator
1,535 Views

Thank you for the information and reports. To continue the investigation, please share with us the results of the Intel® System Support Utility for the Linux* Operating System.


Regards, 

David G 

Intel Customer Support Technician 


justinnonwork
Novice
1,521 Views

Thank you. Output is attached.

David_G_Intel
Moderator
1,511 Views

Thank you for the report. Let me continue the investigation, and I will keep you informed.


Regards, 

David G 

Intel Customer Support Technician 


David_G_Intel
Moderator
1,493 Views

Hello @justinnonwork


Can you confirm if the SSD was replaced (or if you tried another one) and if you loaded the OS again?


Regards, 

David G 

Intel Customer Support Technician 


justinnonwork
Novice
1,490 Views

The SSD that's in there is the only one that's ever been in there.

 

I only installed the Linux distribution once, but have updated the kernel package as new releases came out over the last several weeks.

David_G_Intel
Moderator
1,478 Views

Is it possible to try another SSD and reload the OS on the system?


Regards, 

David G 

Intel Customer Support Technician 


justinnonwork
Novice
1,471 Views

I don't have any other SSDs handy. Is there anything else I can try before I go out and buy a new SSD?

justinnonwork
Novice
1,436 Views

Just bumping this to the top. Is there anything else I can try out before dropping $100 on another storage device which might not actually fix the problem?

At this point the device is pretty useless and if I'm going to have to spend money just to diagnose it, I'd rather just return it and get something that works out of the box.

David_G_Intel
Moderator
1,428 Views

@justinnonwork, you can reinstall the OS first, since the installation itself might be the problem. If that doesn't help, the next troubleshooting step is to replace the drive and reinstall the OS again. Let us know if you have any questions.


Regards, 

David G 

Intel Customer Support Technician 


justinnonwork
Novice
1,399 Views

Updates:

  • I upgraded the BIOS to 0042. This didn't fix the problem by itself.
  • While not a reinstall, I booted into the latest Ubuntu 20.04 installer image off a USB drive, selected "Try Ubuntu", and let it sit at the logged-in Gnome screen. It crashed after about 30 minutes. I tried doing the same thing again, and it crashed again after 10-30 minutes.
  • I booted into a Windows 10 Home installer via another USB drive and selected "troubleshoot system". It did okay just sitting at the command prompt. After about 16 hours I declared success (or at least not obvious failure) and moved on.
  • I booted back into the Ubuntu 20.04 installer via a USB drive, but this time dropped the system down to multi-user (non-graphical) mode. That was 16+ hours ago and it hasn't crashed yet.

My next step will be to boot into the installed drive and drop down to multi-user/non-graphical mode.

If it doesn't crash, that says to me it's an issue with some interaction between the hardware, the kernel/video driver, and possibly Gnome. I'll start trying some different window managers.

If it does crash, I'll try booting into some different kernels and dropping to multi-user/non-graphical mode and seeing if it crashes with all of them or if some of them stay up but others crash.

David_G_Intel
Moderator
1,384 Views

Thank you for the information. Let us know once you finish testing those steps.


Regards, 

David G 

Intel Customer Support Technician


justinnonwork
Novice
1,368 Views

Updates:

  • Even just booting into rescue mode off the installed system (systemctl set-default rescue.target && reboot) will cause the system to crash after a few minutes for 5.14.0-1020-oem; but
  • Booting into that same kernel on the installed system but using init=/bin/bash will NOT cause it to crash.

I'm testing booting into rescue mode on some of the other kernels on the installed system to see if that has any effect.

In the meantime I've attached the set of loaded modules and running processes for booting into /bin/bash versus boot into rescue mode.
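To narrow down what differs between the two boots, the module lists can be compared mechanically. Below is a minimal Python sketch, assuming both captures are plain `lsmod` output. The function names are made up, and the module names in the example are placeholders, not the contents of the attachments.

```python
def module_names(lsmod_output):
    """Return the set of module names from lsmod-style text."""
    names = set()
    for line in lsmod_output.splitlines():
        parts = line.split()
        # Skip blank lines and the "Module  Size  Used by" header row.
        if not parts or parts[0] == "Module":
            continue
        names.add(parts[0])
    return names


def only_in_second(first, second):
    """Return the sorted module names present in `second` but not in `first`."""
    return sorted(module_names(second) - module_names(first))


if __name__ == "__main__":
    # Placeholder data for illustration only (hypothetical module names).
    bash_boot = "Module Size Used by\nmod_a 16384 0\n"
    rescue_boot = "Module Size Used by\nmod_a 16384 0\nmod_b 20480 1 mod_a\n"
    print("loaded only in rescue mode:", only_in_second(bash_boot, rescue_boot))
```

Modules that load only in rescue mode would be the first candidates to blacklist one at a time, since the init=/bin/bash boot stays up without them.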

David_G_Intel
Moderator
1,361 Views

Thank you for the updates. As a friendly reminder, reinstalling the OS is still recommended as a test, so let us know once you have tried it.


Regards, 

David G 

Intel Customer Support Technician 

