VV0001
Beginner
806 Views

We have seen the following error with an Intel Xeon microprocessor (GG8067402569400SR2DK) in our product. The error message reports "Generic CACHE Level-2 Generic Error" and "Processor context corrupt" for Bank 17 and Bank 19.

mcelog: Family 6 Model 56 CPU: only decoding architectural errors

Hardware event. This is not a software error.

CPU 4 BANK 17 TSC 5d6953ae81a

RIP !INEXACT! 10:ffffffff8139831a

MISC 4fc389603402086 ADDR fa002000

TIME 1559870123 Fri Jun 7 03:15:23 2019

MCG status:RIPV MCIP

MCi status:

Uncorrected error

Error enabled

MCi_MISC register valid

MCi_ADDR register valid

Processor context corrupt

MCA: corrected filtering (some unreported errors in same region)

Generic CACHE Level-2 Generic Error

STATUS be200000000c110a MCGSTATUS 5

CPUID Vendor Intel Family 6 Model 86

RIP: intel_idle+0xda/0x160}

SOCKET 0 APIC 1 microcode 7000005
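The STATUS and MCGSTATUS values in the log above can be cross-checked by hand. The following is a minimal sketch, assuming the architectural IA32_MCi_STATUS and IA32_MCG_STATUS bit layout and the compound "cache hierarchy" error-code pattern (000F 0001 RRRR TTLL) described in the Intel SDM. The function names and the label strings are hypothetical, chosen here to mirror the mcelog output; this is not mcelog's own code, and it ignores model-specific fields.

```python
# Sketch of an architectural machine-check status decoder.
# Assumption: bit positions follow the IA32_MCi_STATUS layout in the Intel SDM.
STATUS_FLAGS = {
    63: "VAL",    # register contains valid data
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}
MCG_FLAGS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}

# Labels for the cache hierarchy sub-fields (wording chosen to match mcelog).
CACHE_TYPE = {0: "Instruction", 1: "Data", 2: "Generic"}                     # TT
CACHE_LEVEL = {0: "Level-0", 1: "Level-1", 2: "Level-2", 3: "Generic"}       # LL
CACHE_REQUEST = {                                                            # RRRR
    0: "Generic Error", 1: "Generic Read", 2: "Generic Write",
    3: "Data Read", 4: "Data Write", 5: "Instruction Fetch",
    6: "Prefetch", 7: "Eviction", 8: "Snoop",
}


def decode_status(status):
    """Decode the architectural part of an MCi_STATUS value."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if (status >> bit) & 1]
    mca_code = status & 0xFFFF
    result = {
        "flags": flags,
        "mca_code": mca_code,
        "model_specific_code": (status >> 16) & 0xFFFF,
        # Bit 12 of the MCA error code is the corrected-filtering (F) bit.
        "filtering": bool(mca_code & 0x1000),
        "description": None,
    }
    base = mca_code & ~0x1000  # ignore the F bit when matching the pattern
    if (base >> 8) == 0x01:    # 0000 0001 RRRR TTLL: cache hierarchy error
        rrrr = (base >> 4) & 0xF
        tt = (base >> 2) & 0x3
        ll = base & 0x3
        result["description"] = "{} CACHE {} {}".format(
            CACHE_TYPE.get(tt, "Reserved"),
            CACHE_LEVEL[ll],
            CACHE_REQUEST.get(rrrr, "Reserved"),
        )
    return result


def decode_mcg_status(value):
    """Return the names of the flags set in an MCG_STATUS value."""
    return [name for bit, name in sorted(MCG_FLAGS.items())
            if (value >> bit) & 1]


if __name__ == "__main__":
    print(decode_status(0xBE200000000C110A))
    print(decode_mcg_status(0x5))
```

Under these assumptions, STATUS be200000000c110a decodes to the same flags that mcelog printed (uncorrected, enabled, MISC and ADDR valid, processor context corrupt) and to a generic Level-2 cache error with the filtering bit set, and MCGSTATUS 5 decodes to RIPV and MCIP. The "Model 56" and "Model 86" values in the log also agree if the first is read as hexadecimal (0x56 = 86); that reading is an assumption here.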

CarlosAM_INTEL
Moderator
317 Views

Hello, @VV0001​:

 

Thank you for contacting Intel Embedded Community.

 

Could you please let us know how many units of the affected design have been manufactured, how many of them are affected, and what the failure rate is?

 

Could you please provide topside markings pictures of the processors associated with this issue?

 

We are waiting for your answer.

 

Best regards,

@Mæcenas_INTEL​.

 

VV0001
Beginner
317 Views

Hi Team,

 

Thanks for your kind response. Please find our answers inline below.

 

How many are affected? One unit is affected.

 

Could you please provide topside markings pictures of the processors associated with this issue? We have requested them from the concerned team and will provide them at the earliest.

 

In addition to the above, we have a few more queries:

 

  1. What does an MCE error (kernel panic) mean?
  2. Is the MCE log decoding mechanism we used correct?
  3. Does the above MCE log decode to "Generic CACHE Level-2 Generic Error" and "Processor context corrupt" for Bank 17 and Bank 19?
  4. What is the cause of the MCE according to the decoded log? Is it a hardware failure (internal to the CPU) or a software failure in some function that was being handled?
  5. What does "Generic CACHE Level-2" mean? Does it refer to cache memory internal to the CPU?

Based on the decoded MCE log above, please let us know whether this will affect the health of the board in the future, as the node seems to be working fine now.

 

CarlosAM_INTEL
Moderator
317 Views

Hello, @VV0001​:

 

Thanks for your reply.

 

The following references may help answer your questions:

 

https://access.redhat.com/solutions/18723

https://bugzilla.redhat.com/show_bug.cgi?id=1085785

 

If you need more details about the reported situation, please direct your questions to the channels listed on the following websites:

 

https://bugzilla.redhat.com/page.cgi?id=redhat/contact.html

https://www.redhat.com/en/services/consulting?extIdCarryOver=true&sc_cid=701f2000001OH7JAAW#GatedFor...

 

Meanwhile, we are waiting for the requested pictures so that we can give you some hardware recommendations.

 

Best regards,

@Mæcenas_INTEL​.

VV0001
Beginner
317 Views

Hi Team,

 

As requested earlier, please find attached the topside picture of the processor.

 

Also, the links provided above do not contain any solutions; they only match our error prints.

Please provide us with a solution for the reported error.

 

CarlosAM_INTEL
Moderator
317 Views

Hello, @VV0001​:

 

Thanks for your update.

 

Based on your previous communications, could you please swap the affected processor with an unaffected one and let us know the results of this change?

 

We are waiting for your reply.

 

Best regards,

@Mæcenas_INTEL​.
