Below are the kernel logs from the MCE event that occurred:
=============================================================================================
[ 2882.491953] mce: [Hardware Error]: CPU 4: Machine Check Exception: 5 Bank 19: be200000000c110a
[ 2882.595085] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8139831a> {intel_idle+0xda/0x160}
[ 2882.698427] mce: [Hardware Error]: TSC 5d6953ae81a ADDR fa000000 MISC a4fc389602402086
[ 2882.794587] mce: [Hardware Error]: PROCESSOR 0:50663 TIME 1559870123 SOCKET 0 APIC 1 microcode 7000005
[ 2882.906041] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 2882.987320] mce: [Hardware Error]: CPU 4: Machine Check Exception: 5 Bank 17: be200000000c110a
[ 2883.090448] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8139831a> {intel_idle+0xda/0x160}
[ 2883.193785] mce: [Hardware Error]: TSC 5d6953ae81a ADDR fa002000 MISC 4fc389603402086
[ 2883.288902] mce: [Hardware Error]: PROCESSOR 0:50663 TIME 1559870123 SOCKET 0 APIC 1 microcode 7000005
[ 2883.400356] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 2883.481635] mce: [Hardware Error]: Some CPUs didn't answer in synchronization
[ 2883.567072] mce: [Hardware Error]: Machine check: Invalid
[ 2883.631700] Kernel panic - not syncing: Fatal machine check on current CPU
=====================================================================================================
After decoding the MCE log with mcelog, we get the output below. It reports "Generic CACHE Level-2 Generic Error" and "Processor context corrupt" for both Bank 17 and Bank 19.
mcelog: Family 6 Model 56 CPU: only decoding architectural errors
Hardware event. This is not a software error.
CPU 4 BANK 17 TSC 5d6953ae81a
RIP !INEXACT! 10:ffffffff8139831a
MISC 4fc389603402086 ADDR fa002000
TIME 1559870123 Fri Jun 7 03:15:23 2019
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS be200000000c110a MCGSTATUS 5
CPUID Vendor Intel Family 6 Model 86
RIP: intel_idle+0xda/0x160}
SOCKET 0 APIC 1 microcode 7000005
mcelog: Family 6 Model 56 CPU: only decoding architectural errors
Hardware event. This is not a software error.
CPU 4 BANK 19 TSC 5d6953ae81a
RIP !INEXACT! 10:ffffffff8139831a
MISC a4fc389602402086 ADDR fa000000
TIME 1559870123 Fri Jun 7 03:15:23 2019
MCG status:RIPV MCIP
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS be200000000c110a MCGSTATUS 5
CPUID Vendor Intel Family 6 Model 86
RIP: intel_idle+0xda/0x160}
SOCKET 0 APIC 1 microcode 7000005
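For anyone who wants to cross-check the mcelog output by hand, here is a minimal sketch in Python that decodes the STATUS value be200000000c110a and MCGSTATUS 5 using the architectural bit layout described in the Intel SDM (Vol. 3B, machine-check architecture). The function and table names are made up for illustration and are not part of mcelog. Only the architectural fields are handled; the model-specific bits are returned raw.

```python
# Architectural flag bits of IA32_MCi_STATUS (Intel SDM Vol. 3B).
STATUS_FLAGS = {
    63: "VAL",    # register contents valid
    62: "OVER",   # error overflow
    61: "UC",     # uncorrected error
    60: "EN",     # error enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}

# Flag bits of IA32_MCG_STATUS.
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}

# Sub-fields of the compound "cache hierarchy" error code: 000F 0001 RRRR TTLL.
TT = {0: "Instruction", 1: "Data", 2: "Generic"}
LL = {0: "Level-0", 1: "Level-1", 2: "Level-2", 3: "Generic"}
RRRR = {
    0: "Generic Error", 1: "Generic Read", 2: "Generic Write",
    3: "Data Read", 4: "Data Write", 5: "Instruction Fetch",
    6: "Prefetch", 7: "Eviction", 8: "Snoop",
}


def decode_mci_status(status: int) -> dict:
    """Decode the architectural parts of an IA32_MCi_STATUS value."""
    mca_code = status & 0xFFFF
    result = {
        "flags": [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1],
        "mca_code": mca_code,
        "model_specific_code": (status >> 16) & 0xFFFF,  # left undecoded
        "filtering": bool(mca_code & (1 << 12)),          # F bit: corrected filtering
        "cache_error": None,
    }
    # Ignore the F bit (bit 12) and match the cache hierarchy pattern.
    if (mca_code & 0xEF00) == 0x0100:
        tt = TT.get((mca_code >> 2) & 0x3, "Reserved")
        ll = LL[mca_code & 0x3]
        rrrr = RRRR.get((mca_code >> 4) & 0xF, "Reserved")
        result["cache_error"] = f"{tt} CACHE {ll} {rrrr}"
    return result


def decode_mcg_status(value: int) -> list:
    """Return the names of the flags set in an IA32_MCG_STATUS value."""
    return [name for bit, name in MCG_FLAGS.items() if (value >> bit) & 1]


decoded = decode_mci_status(0xBE200000000C110A)
print(decoded["flags"], decoded["cache_error"], decode_mcg_status(5))
```

For the values in this log, the sketch arrives at the same flags and the same compound error string that mcelog printed. It only confirms the decoding; it says nothing about the root cause.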
Please clarify the following:
What does an MCE error (kernel panic) mean?
Is the mechanism we used to decode the MCE log correct?
Does the above MCE log decode to "Generic CACHE Level-2 Generic Error" and "Processor context corrupt" for Bank 17 and Bank 19?
What is the cause of the MCE according to the decoded log? Is it a hardware failure (internal to the CPU itself) or a software failure in the code handling some function?
What does "Generic CACHE Level-2" mean? Does it refer to cache memory internal to the CPU?
Based on the decoded MCE log, will this affect the health of the board in the future? The node seems to be working fine now.
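On the decoding question, one detail in the output can look inconsistent: mcelog prints "Family 6 Model 56" on one line and "Family 6 Model 86" on another. Assuming the 50663 in the kernel line "PROCESSOR 0:50663" is the CPUID signature in hexadecimal, both lines appear to describe the same model, once in hex and once in decimal. A minimal sketch of that check in Python (the function name is hypothetical):

```python
def decode_cpuid_signature(sig: int) -> dict:
    """Split a CPUID leaf 1 EAX signature into family, model and stepping."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base family 0x6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


# Assumption: the kernel log value 50663 is hexadecimal.
info = decode_cpuid_signature(0x50663)
print(info, hex(info["model"]))
```

Under that assumption the signature gives family 6 and model 86 decimal, which is 0x56, so the two mcelog lines agree with each other.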
Hello, @VV0001:
Thank you for contacting Intel Embedded Community.
Could you please clarify if this thread is related to the following forum?
We are waiting for your answer.
Best regards,
Yes, both threads are about the same issue.
We need to know whether this single occurrence will lead to any functional problems in the future, or whether these errors can be ignored.
Please confirm.
Hello, @VV0001:
Thanks for your reply.
Could you please tell us the results of our last suggestion (the message of August 2nd, 2019) in the cited forum?
We are waiting for your answer.
Best regards,
We have not replaced the CPU. That is a big decision and a lot of work. Please let us know whether there is any alternative solution.
Hello, @VV0001:
Thanks for your reply.
Reviewing the information provided in the cited forum, you mentioned that just one unit is affected. Could you please confirm this and let us know how many units have been manufactured?
Also, could you please verify that the affected design has been properly soldered (no cold joints, poor or non-wetting, overheating, lack of solder, floating leads, or excess solder)?
We are waiting for your reply.
Best regards,
Around 800 units have been manufactured and deployed in the field. The problem was found at the customer site after the unit was sold and had worked for more than 6 months.
The unit passed all in-house production tests.
Hence we do not anticipate any cold joints, poor or non-wetting, overheating, lack of solder, floating leads, or excess solder.
Hello, @VV0001:
Thanks for your reply.
We suggest you contact the place of purchase of the affected processor to apply the process stated in section 7.2.5, on pages 58 and 59 of the Intel Quality System Handbook that can be found at:
https://www.intel.com/content/dam/www/public/us/en/documents/reference-guides/quality-system-handbook.pdf
Best regards,