XHe6
Beginner
6,910 Views

MCE Error on NUC7CJYH 0043 BIOS Update and CANNOT Revert Back


I updated my NUC7CJYH BIOS from 0027 to 0043 today and got the same MCE errors I have seen with every version after 0027, so I decided to revert to 0027, but the downgrade failed. The updater says the BIOS version does not match and aborts. I cannot revert to any version before 0043 after this update, so I am currently locked to 0043.

I am running Arch Linux with kernel version 4.17.11-6-ck-silvermont.

[ 0.090006] mce: [Hardware Error]: Machine check events logged

[ 0.090009] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: a600000000020408

[ 0.090016] mce: [Hardware Error]: TSC 0 ADDR fef4c9e0

[ 0.090023] mce: [Hardware Error]: PROCESSOR 0:706a1 TIME 1535229970 SOCKET 0 APIC 0 microcode 28
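
For anyone who wants to read the status word in the log above, here is a minimal sketch that splits an IA32_MCi_STATUS value into its architectural flag bits. The bit positions are assumed from the machine-check chapter of the Intel SDM; `decode_mci_status` is a made-up helper name, not part of mcelog or rasdaemon, and the model-specific field is left undecoded.

```python
# Sketch: split an IA32_MCi_STATUS value into architectural fields.
# Bit positions are assumed from the Intel SDM machine-check chapter;
# decode_mci_status is a hypothetical helper, not part of any tool.

FLAG_BITS = {
    "VAL": 63,    # register contains a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # MCi_MISC holds additional information
    "ADDRV": 58,  # MCi_ADDR holds a valid address
    "PCC": 57,    # processor context may be corrupted
}


def decode_mci_status(status: int) -> dict:
    """Return the flag bits and the two 16-bit code fields of a status word."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    fields["model_specific_code"] = (status >> 16) & 0xFFFF
    fields["mca_error_code"] = status & 0xFFFF
    return fields


if __name__ == "__main__":
    # Status value copied from the Bank 4 line in the log above.
    for name, value in decode_mci_status(0xA600000000020408).items():
        shown = value if isinstance(value, bool) else hex(value)
        print(f"{name:20} {shown}")
```

For the value logged here, VAL, UC, ADDRV and PCC come out set, EN comes out clear, the MCA error code is 0x408 and the model-specific code is 0x2. What those codes mean on this particular board is not something the sketch can tell you.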

0 Kudos
1 Solution
Chris_V_Intel
Moderator
870 Views

Hey all!

BIOS 0046, which fixes this issue, is posted and live!

https://downloadcenter.intel.com/download/28305/BIOS-Update-JYGLKCPX-86A-?v=t Download BIOS Update [JYGLKCPX.86A]


39 Replies
LMitc3
Beginner
632 Views

I have the same issue here. The BIOS won't revert either.

Despite the message above, Arch Linux does continue to boot, but without the GUI. Using NoMachine remote access works perfectly.

After noticing that the NIC activity lights were blinking randomly, I realized that the display had an issue and successfully logged in. The problem was with using a TV as the display. Using a computer monitor works normally.

Mitchell_R_Intel
Employee
632 Views

So when you used a monitor you saw the GUI, but not with a TV? What brand is the TV, and how old is it? Did you try turning off HDMI CEC Control in the BIOS? Sometimes this can cause havoc with older TVs.

LMitc3
Beginner
632 Views

Yes, the monitor works and the TV doesn't.

Turning off CEC control meant nothing worked.

The TVs are between five and 10 years old.

idata
Community Manager
632 Views

Hello, everyone!

Allow me to look into this further. Once I have an update I will let you know.

Antony S.
Ronny_G_Intel
Moderator
632 Views

Hi azuresong,

There is a newer BIOS release out there that you may want to try: version 44, https://downloadcenter.intel.com/download/28106/BIOS-Update-JYGLKCPX-86A-?product=126135 Download BIOS Update [JYGLKCPX.86A]

Now, BIOS 43 includes an updated CPU microcode (Security Advisory-00115), and I don't know whether this update may be generating the MCE errors. Is it possible for you to update your kernel from 4.17.11 to 4.18.5 (https://www.kernel.org/)? What Linux distribution are you using? Did you get this error right away after the BIOS was updated to BIOS 43?

On the other hand, when trying to downgrade the BIOS (which I don't recommend), did you try the [F7] process? Which process and which file did you use, and at what point do you get the error?

Regards,

Ronny G

XHe6
Beginner
632 Views

Hi rguevara,

I updated to version 44 and the MCE error remains the same on both 4.17.11 and 4.18.5. I still cannot revert to any previous version.

[~]$ dmesg | grep -i error

[ 0.053361] mce: [Hardware Error]: Machine check events logged

[ 0.053364] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: a600000000020408

[ 0.053372] mce: [Hardware Error]: TSC 0 ADDR fef4c9e0

[ 0.053378] mce: [Hardware Error]: PROCESSOR 0:706a1 TIME 1535503652 SOCKET 0 APIC 0 microcode 28

[ 0.575717] RAS: Correctable Errors collector initialized.

[ 2.460149] platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
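
In case it helps with comparing logs between BIOS versions, here is a rough sketch that pulls the fields out of the `mce: [Hardware Error]` lines above. It assumes the kernel prints the status, CPUID signature and microcode fields in hexadecimal and the TIME field as Unix seconds; `parse_mce_lines` and the regular expressions are illustrative names, not an existing tool.

```python
import re
from datetime import datetime, timezone

# Patterns written against the dmesg lines quoted in this thread (assumed format).
BANK_RE = re.compile(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-f]+)")
PROC_RE = re.compile(
    r"PROCESSOR (\d+):([0-9a-f]+) TIME (\d+) SOCKET (\d+) "
    r"APIC ([0-9a-f]+) microcode ([0-9a-f]+)"
)

SAMPLE = [
    "[ 0.053364] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: a600000000020408",
    "[ 0.053378] mce: [Hardware Error]: PROCESSOR 0:706a1 TIME 1535503652 SOCKET 0 APIC 0 microcode 28",
]


def parse_mce_lines(lines):
    """Collect bank, status, CPU signature, time and microcode into one dict."""
    record = {}
    for line in lines:
        bank = BANK_RE.search(line)
        if bank:
            record["cpu"] = int(bank.group(1))
            record["bank"] = int(bank.group(2))
            record["status"] = int(bank.group(3), 16)
        proc = PROC_RE.search(line)
        if proc:
            cpuid = int(proc.group(2), 16)
            family = (cpuid >> 8) & 0xF
            model = (cpuid >> 4) & 0xF
            if family in (6, 15):
                # Extended model bits apply to families 6 and 15.
                model |= ((cpuid >> 16) & 0xF) << 4
            if family == 15:
                # Extended family bits are added only for family 15.
                family += (cpuid >> 20) & 0xFF
            record["family"] = family
            record["model"] = model
            record["stepping"] = cpuid & 0xF
            record["time"] = datetime.fromtimestamp(int(proc.group(3)), tz=timezone.utc)
            record["microcode"] = int(proc.group(6), 16)
    return record


if __name__ == "__main__":
    for key, value in parse_mce_lines(SAMPLE).items():
        print(key, value)
```

Read this way, the `microcode 28` field is 0x28 and the `706a1` signature is family 6, model 0x7a, stepping 1.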

I am running Arch Linux. I get MCE errors on every version after 0027, which is why I try each new BIOS when it is released and revert to 0027 if I see MCE errors.

I used the [F7] process to upgrade and downgrade the BIOS. I loaded multiple old BIOS versions onto a USB drive and tried them one by one, but could not downgrade to any previous version.

The error message when downgrading the BIOS is "Incompatible BIOS version, Update Aborted."

JHo
Beginner
632 Views

Hi Rguevara,

You can only downgrade from JY0044 to JY0043, since the BIOS updated the uCode after JY0042.

JY0044 uCode 0x28

JY0043 uCode 0x28

JY0042 uCode 0x22
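
One way to read the list above is as a rule that a downgrade is refused when the target BIOS carries older microcode. The sketch below only illustrates that reading. The rule itself is an assumption drawn from this post, not Intel's actual check, and `downgrade_allowed` is a hypothetical name.

```python
# Microcode revisions quoted in this thread for each BIOS version.
UCODE_BY_BIOS = {"0044": 0x28, "0043": 0x28, "0042": 0x22}


def downgrade_allowed(current: str, target: str) -> bool:
    """Assumed rule: the target BIOS must not carry older microcode."""
    return UCODE_BY_BIOS[target] >= UCODE_BY_BIOS[current]


if __name__ == "__main__":
    for current, target in [("0044", "0043"), ("0044", "0042"), ("0043", "0042")]:
        verdict = "allowed" if downgrade_allowed(current, target) else "blocked"
        print(f"{current} -> {target}: {verdict}")
```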

BRs,

Josh

Ronny_G_Intel
Moderator
632 Views

Hi,

I just checked on this again. BIOS version 43 includes an updated CPU microcode (Security Advisory-00115); once the BIOS is updated to this version you cannot go back to previous versions, for security reasons.

That detail is missing from the documentation, and I have already requested that it be added. The BIOS cannot be downgraded through any regular method: BIOS update, [F7] or the BIOS recovery jumper.

I apologize for the inconvenience.

Regards,

Ronny G

Ronny_G_Intel
Moderator
632 Views

Hi azuresong,

I am looking into the MCE error that you reported.

See image below, this is what I am getting:

XHe6
Beginner
632 Views

Hi rguevara,

Yes, that's the exact MCE error I am getting, thanks for looking into this.

Ronny_G_Intel
Moderator
632 Views

I would need to read the logs, and for that I used to use mcelog, but it seems it is not part of kernel 4.15.0, which is the one I am running. I am reading that mcelog was removed from that kernel version.

I updated from 4.15.0 to 4.18.5 and there is still no mcelog. I tried to add it manually with no success.

Do you have a way to read the logs and perhaps post a screenshot?

XHe6
Beginner
632 Views

Hi rguevara,

The mcelog package is deprecated and has been replaced by rasdaemon. You could give it a shot.

I tried to install rasdaemon to read the MCE error information and failed. It seems I don't have several of the kernel options it needs enabled, such as the EDAC-related ones.

I hope rasdaemon helps you debug this.

Ronny_G_Intel
Moderator
632 Views

Hi azuresong,

Thanks for the information. I guess my Linux knowledge is a bit outdated. I will try rasdaemon; however, I have never used it before and I don't know if it is going to work out for me.

On the other hand, besides the MCE error message, is the system exhibiting any other issue or problem? The system I have is for testing only and I am not really running any tasks on it, so everything I see is normal.

Regards,

Ronny G

XHe6
Beginner
632 Views

Hi rguevara,

To be honest, I haven't experienced any obvious system issues or crashes yet, apart from the MCE error in the dmesg log.

I remember someone once told me that MCE errors can cause random system reboots or application instability. Is that true in this case? I would love to see a fix in a future BIOS update.

Thanks!

fyang11
Beginner
632 Views

I have the same problem! I replaced the memory and upgraded the BIOS, but that failed to solve it.

Ronny_G_Intel
Moderator
632 Views

Hi everybody,

I am really going to need your help to debug this issue.

I installed "rasdaemon": $ sudo apt install rasdaemon

It is installed, but when I run it I get an error message; see the attached screenshot. The message is "can't locate a mounted debugfs", although I believe I already mounted it; see the previous commands in the screenshot.
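
A quick way to double-check whether debugfs really is mounted is to look for it in /proc/mounts. The sketch below does just that. It is not how rasdaemon itself performs the check (that is an assumption on my part), and `debugfs_mount_points` is a made-up helper name.

```python
def debugfs_mount_points(mounts_text: str) -> list:
    """Return every mount point whose filesystem type is debugfs."""
    points = []
    for line in mounts_text.splitlines():
        parts = line.split()
        # /proc/mounts fields: device, mount point, filesystem type, options, ...
        if len(parts) >= 3 and parts[2] == "debugfs":
            points.append(parts[1])
    return points


if __name__ == "__main__":
    with open("/proc/mounts") as fh:
        found = debugfs_mount_points(fh.read())
    print(found or "debugfs is not mounted")
```

On most distributions the conventional mount point is /sys/kernel/debug, so that is the entry to look for in the output.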

If possible, I need your help reading the logs and interpreting the MCE message, so that I can continue the investigation of this report. I also need to understand whether this error message is connected to the hardware issues you have had.

Please keep in mind that we provide very limited support for Linux related issues so the more you can help me the better.

Regards,

Ronny G

XHe6
Beginner
632 Views

Hi rguevara,

Thanks for reporting back on your progress.

How did you enable the rasdaemon service? I assume Ubuntu uses systemd now, so maybe you should try enabling the rasdaemon service with the systemctl enable command. Please use this for reference: https://wiki.archlinux.org/index.php/Machine-check_exception

Also, I found the rasdaemon GitHub repo, and a paragraph in the README should be useful. Here is the link: https://github.com/sujithshankar/rasdaemon (GitHub - sujithshankar/rasdaemon: Cloning from http://git.infradead.org/users/mchehab/rasdaemon.git/). Basically, you need to rebuild the Ubuntu generic kernel with the following options enabled, as I doubt those options are enabled by default. Then the MCE log should be recorded in journald. Hope this helps.

A script is provided under /contrib, in order to test the daemon EDAC handler. While the daemon is running, just run:

# contrib/edac-fake-inject

The script requires a Kernel compiled with CONFIG_EDAC_DEBUG and a running EDAC driver.

MCE error handling can use the MCE inject:

http://git.kernel.org/cgit/utils/cpu/mce/mce-inject.git

For it to work, Kernel mce-inject module should be compiled and loaded.
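
To see whether a kernel already has the relevant options before rebuilding, something like the sketch below can scan a kernel config file. CONFIG_EDAC_DEBUG comes from the README excerpt above; CONFIG_EDAC and CONFIG_X86_MCE_INJECT are my assumption about which other options matter, and `option_states` is a made-up helper name. It expects a plain-text config such as /boot/config-<release> on Ubuntu; a compressed /proc/config.gz would need to be decompressed first.

```python
import sys

# Options to look for; only CONFIG_EDAC_DEBUG is named in the README excerpt,
# the other two are assumptions.
WANTED = ["CONFIG_EDAC", "CONFIG_EDAC_DEBUG", "CONFIG_X86_MCE_INJECT"]


def option_states(config_text: str, options) -> dict:
    """Map each option to its value (y, m, ...) or 'not set'."""
    states = {name: "not set" for name in options}
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: '# CONFIG_FOO is not set'.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name in states:
            states[name] = value
    return states


if __name__ == "__main__":
    # Usage: python check_config.py /path/to/kernel/config
    with open(sys.argv[1]) as fh:
        for name, state in option_states(fh.read(), WANTED).items():
            print(f"{name}: {state}")
```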

VMi
Beginner
632 Views

azuresong never reported any hardware issues, but the MCE logs on boot are concerning and annoying. I had a previous NUC that had to be returned for hardware errors, and the other MCE logs were very confusing to debug.

Here's a reddit thread of others who are experiencing the same MCE errors on boot on all BIOS >037:

https://www.reddit.com/r/intelnuc/comments/8ufu1m/nuc7cjyh_gemini_lake_celeron_j4005_owners_please/

You should be able to compile and install mcelog from source according to https://www.mcelog.org/installation.html and get it running. I haven't been able to see it decode any of the MCE errors I'm seeing into /var/log/mcelog yet, though.

VMi
Beginner
632 Views

Also it seems like mcelog is having trouble giving detailed outputs in this case (though I've seen it give proper outputs in the case of real hardware errors I had with a previous board) https://github.com/andikleen/mcelog/issues/70 NUC7PJYH (J5005) - mce: [Hardware Error] · Issue # 70 · andikleen/mcelog · GitHub

Though I can definitively say this issue was introduced in the BIOS update immediately following 037. rguevara, you mentioned that the BIOS cannot be downgraded in any regular method, does that mean there is a non-regular method that can be used to downgrade the BIOS in the meantime?

Ronny_G_Intel
Moderator
261 Views

Hello everyone,

There is a newer BIOS to be released very soon that addresses this issue. See screenshot attached.

Regarding not being able to downgrade the BIOS via the "regular" methods: I was referring to any method we have publicly available.

I hope this helps,

Ronny G
