Hi
I am encountering spurious reboots on several machines with an i7-8650U, so I was looking for a way to get more information about the reboot causes.
AFAIU a reboot/reset can come from multiple sources:
* user software reboot -> OK, I have some logs in journalctl
* kernel panic reboot -> OK, I have some logs on the console or in pstore
* HW watchdog reboot -> KO, no logs are available
* thermal reboot -> KO, no logs are available
* power loss -> KO, no logs are available
* ...
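For the panic case, one quick check at boot is whether the previous boot left a kernel log dump in pstore. A minimal sketch, assuming pstore is mounted at `/sys/fs/pstore` (the usual location) with a backend such as ramoops or efi-pstore configured; dumps are named `dmesg-<backend>-<id>`. Note that on systemd systems `systemd-pstore` may already have moved them to `/var/lib/systemd/pstore`, and a record shows that an oops/panic was captured, not necessarily that it caused the reboot.

```python
from pathlib import Path
from typing import List


def pstore_dmesg_records(pstore_dir: str = "/sys/fs/pstore") -> List[str]:
    """Return names of kernel log dumps (dmesg-<backend>-<id>) left in pstore."""
    d = Path(pstore_dir)
    if not d.is_dir():
        return []  # pstore not mounted or not available on this system
    # Other record types (console-*, ftrace-*, ...) are ignored here.
    return sorted(p.name for p in d.iterdir() if p.name.startswith("dmesg-"))
```

An empty list points to one of the "no log" cases in the list above.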
For the reboots without logs, I found a register in the PMC which seems to give the reset causes. For this CPU (i7-8650U) the manual seems to be 7th and 8th Generation Intel® Processor Family I/O for U/Y Platforms, and the relevant register is 5.3.52 Global Reset Causes (GBLRST_CAUSE0), offset 124h, on page 194.
Unfortunately, reading this register after Linux boots (with a modified `intel_pmc_core` driver) always gives me 0x00000000, even after a HW TCO watchdog reboot...
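To reproduce the read from user space instead of through a modified driver, here is a rough sketch that maps the PMC MMIO page through `/dev/mem` and reads the 32-bit value at offset 0x124. The PMC base address (PWRMBASE) is platform specific and is left as a parameter; it is not given in this thread. Access to `/dev/mem` may be refused (for example by STRICT_DEVMEM settings or kernel lockdown), and Python does not guarantee a single 32-bit bus access, so a kernel-side `readl()` remains the more reliable option.

```python
import mmap
import os
import struct

GBLRST_CAUSE0_OFFSET = 0x124  # offset from the datasheet section cited above


def read_mmio_u32(phys_addr: int, path: str = "/dev/mem") -> int:
    """Read a little-endian 32-bit value at a physical address via mmap."""
    gran = mmap.ALLOCATIONGRANULARITY  # mmap offsets must be aligned to this
    page_base = phys_addr & ~(gran - 1)
    page_off = phys_addr - page_base
    # O_SYNC asks the kernel for an uncached mapping when path is /dev/mem.
    fd = os.open(path, os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, gran, mmap.MAP_SHARED, mmap.PROT_READ,
                       offset=page_base) as mm:
            return struct.unpack_from("<I", mm, page_off)[0]
    finally:
        os.close(fd)


def read_gblrst_cause0(pmc_base: int) -> int:
    """pmc_base: the PMC's PWRMBASE (hypothetical input, must be looked up)."""
    return read_mmio_u32(pmc_base + GBLRST_CAUSE0_OFFSET)
```

Usage (as root): `hex(read_gblrst_cause0(PMC_BASE))`, where `PMC_BASE` is a placeholder for the real PWRMBASE of the platform.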
Do you have any idea why it does not work?
Could the platform firmware be reading and clearing it before Linux gets to read it?
Apart from this problem, I would be very interested in your feedback on how, on a Linux system, one can tell apart the following reboot causes:
* power loss (the BIOS is set to restore power after an AC power loss)
* HW Intel TCO watchdog
* thermal reset
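As a first-pass triage, the signals discussed in this thread could be combined as sketched below. This is a hypothetical helper, not an established method: each input is whatever evidence happens to be available at boot, and it deliberately does not try to separate power loss from a thermal or unreported watchdog reset, because none of the log-based signals can do that.

```python
from dataclasses import dataclass


@dataclass
class BootEvidence:
    clean_shutdown_logged: bool  # e.g. a reboot/shutdown entry in the previous boot's journal
    pstore_panic_record: bool    # dmesg-* record found in pstore
    watchdog_cardreset: bool     # WDIOF_CARDRESET reported by the watchdog driver, if it reports it
    fs_dirty: bool               # filesystem "not cleanly unmounted" flag found at boot


def classify_previous_reset(ev: BootEvidence) -> str:
    """Map available evidence to a coarse reset category (hypothetical heuristic)."""
    if ev.clean_shutdown_logged:
        return "user/software reboot"
    if ev.pstore_panic_record:
        return "kernel panic"
    if ev.watchdog_cardreset:
        return "hardware watchdog"
    if ev.fs_dirty:
        return "unexplained (power loss, thermal or unreported watchdog reset)"
    return "unknown"
```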
Thanks for your help.
Hello ubifred,
Thank you for posting on the Intel® communities. I realize how complicated this issue is for you, and to get a better picture I would like to confirm the following:
1. Are all your i7-8650U systems using the same motherboard or laptop? If a laptop, provide the brand and model name.
2. What is your Linux kernel and version?
3. Can you provide more details about your concern? It is not clear to me if you are looking for support for the reboots or if you just want details about the reset reason on Linux.
4. How many units are affected by spurious reboots?
5. I would like to know if by "why it does not work?" you are asking why the CPUs are rebooting randomly. Is that correct?
Regards,
Deivid A.
Intel Customer Support Technician
Hi Deivid, and thanks for the reply.
Here are the answers to your questions:
1. Are all your i7-8650U systems using the same motherboard or laptop? If a laptop, provide the brand and model name.
-> Yes, the issue occurs on the NUC7i7DNB platform.
2. What is your Linux kernel and version?
-> The Linux kernel is mainly 5.15; we follow the Intel Linux LTS code: https://github.com/intel/linux-intel-lts/tree/5.15/linux, but the issue also occurred on kernel 4.19.
3. Can you provide more details about your concern? It is not clear to me if you are looking for support for the reboots or if you just want details about the reset reason on Linux.
-> I am looking for a way (or an idea) to get the reset reason on Linux, mainly to differentiate a HW (Intel TCO) watchdog reboot from a power loss.
4. How many units are affected by spurious reboots?
-> Nearly all of the ~400 units have had a spurious reboot (we detect it at boot by checking whether a vfat partition has its "dirty bit" set, which happens when it was not unmounted properly). For most of them we suspect a power loss, but some may have other reboot reasons that we are trying to triage.
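The dirty-bit check can be done by reading the FAT boot sector directly. A minimal sketch, assuming the Linux fat driver's convention: it sets bit 0 (`FAT_STATE_DIRTY`) of the boot sector "state" byte when the volume is mounted read-write and clears it on a clean unmount. That byte sits at offset 0x41 on FAT32 and 0x25 on FAT12/16; FAT32 is recognised by a zero 16-bit FAT size field at offset 0x16. The check must run before the partition is mounted, since mounting sets the flag again.

```python
import struct

FAT_STATE_DIRTY = 0x01  # bit set by the Linux fat driver while mounted read-write


def fat_is_dirty(boot_sector: bytes) -> bool:
    """Return True if a FAT boot sector carries the 'not cleanly unmounted' flag."""
    if len(boot_sector) < 512:
        raise ValueError("expected at least one 512-byte boot sector")
    # BPB_FATSz16 (offset 0x16) is 0 on FAT32 and non-zero on FAT12/16.
    fat_size_16 = struct.unpack_from("<H", boot_sector, 0x16)[0]
    state_offset = 0x41 if fat_size_16 == 0 else 0x25
    return bool(boot_sector[state_offset] & FAT_STATE_DIRTY)


def device_is_dirty(dev_path: str) -> bool:
    """Read the first sector of an (unmounted) FAT partition and check the flag."""
    with open(dev_path, "rb") as f:
        return fat_is_dirty(f.read(512))
```

Usage: `device_is_dirty("/dev/sdXN")`, where the device path is a placeholder.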
5. I would like to know if by "why it does not work?" you mean asking why the CPUs are rebooting randomly. Is that correct?
-> No, sorry if it was not clear. I mean: why, after a successful (deliberately provoked) watchdog trigger, is the GBLRST_CAUSE0 register value 0x00000000, while I expect some of its "WDT" bits to be set to 1?
But maybe this register is not related to the iTCO watchdog, and the watchdogs mentioned in its description are other watchdogs?
If that is the case, my next question is: how can one know whether a reboot was caused by the Intel TCO watchdog?
Many thanks for your support.
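On the Linux side, the watchdog core has a generic "boot status" that a driver can fill in when it detects that the previous reset came from the watchdog (`WDIOF_CARDRESET`, 0x0020 in `linux/watchdog.h`). It is readable through the `WDIOC_GETBOOTSTATUS` ioctl and, with `CONFIG_WATCHDOG_SYSFS`, through `/sys/class/watchdog/watchdogN/bootstatus`. Whether `iTCO_wdt` populates it on this platform and kernel is not confirmed here, so a value of 0 does not prove the reset was not a watchdog one. A minimal sketch, assuming the iTCO device is `watchdog0`:

```python
WDIOF_CARDRESET = 0x0020  # "Card previously reset the CPU" (linux/watchdog.h)


def read_bootstatus(path: str = "/sys/class/watchdog/watchdog0/bootstatus") -> int:
    """Read the watchdog core's bootstatus flags (exposed as a decimal integer)."""
    with open(path) as f:
        return int(f.read().strip())


def reset_by_watchdog(bootstatus: int) -> bool:
    """True if the driver reported that the last reset came from the watchdog."""
    return bool(bootstatus & WDIOF_CARDRESET)
```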
Hello ubifred,
Thanks for your response. Before I investigate this issue internally, I would like to know which steps you followed in the link you shared. Unfortunately, when I tried to open this link, it gave me a "404" error.
Also, from one of the units affected, please attach the report from the Intel® System Support Utility (Intel® SSU):
- Download the Intel® SSU and save the application on your computer: https://www.intel.com/content/www/us/en/download/18895/26735/intel-system-support-utility-for-the-linux-operating-system.html
- Open the application, check the "Everything" checkbox, and click "Scan" to see the system and device information. The Intel® SSU defaults to the "Summary View" on the output screen following the scan. Click the menu where it says "Summary" to change to "Detailed View".
- To save your scan, click Next and click Save.
Regards,
Deivid A.
Intel Customer Support Technician
Hi,
Thanks for the follow-up, and sorry for the non-working link; there was an extra erroneous ',' that should be removed. The working link is: https://github.com/intel/linux-intel-lts/tree/5.15/linux
I ran the SSU on a machine that had a spurious reboot and attached the results to this post.
Please note that, as our Linux distribution is based on a custom Yocto build, it does not have all the optional SSU inspection tools (like wodim, lscpu, lshw, ...).
Just to go back to the original topic: I am looking for a way to differentiate spurious reboot reasons, mainly a hardware iTCO watchdog reset, a thermal reset and a crude power loss reset (none of which produce any log AFAIK).
Many thanks for your support.
Hello ubifred,
Thank you for the information provided.
I will proceed to check the issue internally and post back soon with more details.
Best regards,
Deivid A.
Intel Customer Support Technician
Hello ubifred,
Thanks for your time. I noticed that your operating system (a Linux distribution based on a custom Yocto build) is not supported or validated for your NUC. I recommend you test with a supported operating system (Windows 10 64-bit, Windows 10 IoT Enterprise, Ubuntu 16.04) and let me know if the issue persists.
Regards,
Deivid A.
Intel Customer Support Technician
Hello ubifred,
I am following up because I want to know if you were able to test your NUC with a compatible operating system (Windows 10, 64-bit, Windows 10 IoT Enterprise, Ubuntu 16.04).
I will be waiting for your confirmation.
Regards,
Deivid A.
Intel Customer Support Technician
Hi,
I am not able to install another OS, as our machines are remote and I have no physical access to them. Furthermore, the spurious reboots are rare, and I would have to install it on many devices to catch a single reboot, which is not possible.
I would just like to go back to the original topic: I am looking for a way to differentiate spurious reboot reasons on an Intel platform (namely the i7-8650U here), mainly a hardware iTCO watchdog reset, a thermal reset and a crude power loss reset (none of which produce any log AFAIK).
I would assume that if it can be done on a supported Linux OS like Ubuntu, I will be able to adapt it to the official Intel Linux kernel ( https://github.com/intel/linux-intel-lts/tree/5.15/linux )?
Thanks for the help
Hello ubifred,
Thanks for the confirmation, and I am sorry for the inconvenience. I will check this further to confirm whether we can get the information you requested; however, bear in mind the limitations due to the unsupported operating system.
Best regards,
Deivid A.
Intel Customer Support Technician
Hello ubifred,
Thanks for your time. In this case, Intel can only guarantee proper performance with a validated operating system, and since you are using a customized operating system, Intel cannot confirm whether the NUC options and features will work as expected.
At this point, I can only recommend you check with the Linux communities or the distributor for further information.
Please keep in mind that this thread will no longer be monitored by Intel.
Regards,
Deivid A.
Intel Customer Support Technician
Hi,
Hmm, I am a bit disappointed that you are sidestepping the original question on the pretext of an unsupported OS.
My original question was in fact OS-agnostic, so I will rephrase it like this:
On an Intel i7-8650U platform, how can one distinguish a power failure reboot from an iTCO HW watchdog reboot using the CPU/chipset registers or subsystems?
No avoidance here; this is simply not a question that can be answered.
After a reset, it is the BIOS that gains control and it is the BIOS that would need to look at whether the reset was specifically caused by the Watchdog Timer and save this information (presumably in the BIOS Event Log) before it proceeds to reinitialize the hardware and obliterate this condition. AFAIK, this particular capability isn't available in the UEFI spec.
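If a firmware does keep such an event log, Linux can expose the SMBIOS tables under `/sys/firmware/dmi/entries/` (requires `CONFIG_DMI_SYSFS`), where a System Event Log structure appears as type 15 (`15-0`, ...), and `dmidecode -t 15` decodes it. Whether the NUC7i7DNB BIOS records watchdog resets there is not established in this thread; the sketch below only checks whether such a structure is present at all.

```python
from pathlib import Path
from typing import List


def smbios_event_log_entries(root: str = "/sys/firmware/dmi/entries") -> List[str]:
    """List SMBIOS type 15 (System Event Log) entries exposed by dmi-sysfs."""
    base = Path(root)
    if not base.is_dir():
        return []  # dmi-sysfs not available on this kernel
    # Entry directories are named "<type>-<instance>", e.g. "15-0".
    return sorted(p.name for p in base.iterdir() if p.name.startswith("15-"))
```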
At the same time, understand that Intel doesn't own the BIOS; its implementation and capabilities are the purview of the platform manufacturer. This means a level of inconsistency, based upon platform source, will always exist. If you really want an industry-wide capability like this, you need to get it into the UEFI specification - which Intel no longer owns, as it became an industry spec. Good luck with this endeavor - which will take years - and will, speaking realistically, only be supported in new systems.
Sorry, reality bites,
...S
Hi @n_scott_pearson, and thanks for the elaborate reply.
Based on your explanation, IIUC the "boot reason" info is handled in the UEFI/BIOS, whose implementation is manufacturer dependent.
I will have a look at the (daunting) UEFI spec to see if it defines such a feature, and whether by chance it is exposed by the manufacturer.
Given that I am working on an Intel NUC7i7DNB, whose BIOS is distributed and (I assume?) made by Intel, whom should I ask for support regarding this matter?
Thanks