RS3DC080: Single-bit ECC error, critical threshold exceeded. Cable fault?

mpetr33
Novice
2,390 Views

Hello,

I need some advice on a problem with the RS3DC080. It returns:

LOCALIZED MESSAGE = Controller ID: 0  Single-bit ECC error; critical threshold exceeded: ECAR = 701625440, ELOG = 8396800, (Src: Data Bits lane bitmap=0080, bank bitmap=00, elog 802000)
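The raw numbers in this message can be cross-checked against each other. The following is a minimal sketch, not a vendor decoding tool: it converts the decimal ECAR and ELOG values to hexadecimal and lists which bits are set in the lane bitmap. The helper name `set_bits` is made up for illustration, and the thread does not say which physical memory lane a given bit corresponds to, so the code reports bit positions only.

```python
# Values copied from the controller message quoted above.
ECAR_DEC = 701625440
ELOG_DEC = 8396800
LANE_BITMAP_HEX = "0080"
ELOG_HEX_IN_MESSAGE = "802000"


def set_bits(value):
    """Return the positions of the bits set in value, lowest bit first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


# The decimal ELOG and the trailing "elog 802000" should be the same number.
elog_matches = ELOG_DEC == int(ELOG_HEX_IN_MESSAGE, 16)

# Bit positions only; mapping a bit to a physical lane is NOT given in the thread.
lanes = set_bits(int(LANE_BITMAP_HEX, 16))

print(f"ECAR = {ECAR_DEC:#x}")
print(f"ELOG = {ELOG_DEC:#x} (matches 'elog {ELOG_HEX_IN_MESSAGE}': {elog_matches})")
print(f"bits set in lane bitmap {LANE_BITMAP_HEX}: {lanes}")
```

The decimal ELOG value and the hexadecimal `elog 802000` at the end of the message are the same number, and the lane bitmap has a single bit set, so every field in the message points at one location rather than at scattered errors.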

 

The controller is connected to a Supermicro BPN-SAS-825TQ backplane (which is on the THOL list) with HGST 0F23021 drives (HUS726060ALE614, 6 TB).

The firmware on the RAID card is the most recent (Flash package = 24.21.0-0091).

 

Is it right to start looking for the problem in the connections (cables), or have you seen the drives, the backplane, or the controller cause such errors?

How likely is it that the controller itself is faulty?

The warranty will be handled through our local distributor, but I want to find out whether this is a warranty issue or the system integrator's fault.

Patrol read is OK for all drives.

There is an identical server, built at the same time with the same hardware, so this does not look like an incompatibility issue.

 

0 Kudos
1 Solution
mpetr33
Novice
2,050 Views

Just for information.

On site I changed the cables; the error came back after a few hours.

Then I replaced the RAID controller with a new one and imported the foreign configuration. It has been working fine for a few weeks.

0 Kudos
7 Replies
JoseH_Intel
Moderator
2,050 Views

Hello mpetr33,

 

Thank you for joining the community.

Could you tell us where you are seeing these errors? Is this an Intel server board or a Supermicro one?

I suggest running RAID Web Console 3 and/or the StorCLI tool to get a full, readable log that we can check.

As for whether the cables could be causing this issue: it is plausible, but uncommon.

 

Regards

 

Jose A.

Intel Customer Support Technician

A Contingent Worker at Intel

0 Kudos
mpetr33
Novice
2,050 Views

I am seeing them in RWC2. Good point about updating to RWC3, though; I will do that and get back here with the results.

0 Kudos
mpetr33
Novice
2,050 Views

This is an Intel server board, S1200SPSR, with the RS3DC080 in a PCIe x8 slot.

I installed RWC3 and am including a log from it, taken after a server restart. The entry that interests me is this one:

{
   "eventId" : 202,
   "sequenceNumber" : 8619,
   "time" : "2019-10-9T14:34:11",
   "description" : "Controller ID: 0 Single-bit ECC error; critical threshold exceeded: ECAR: 7.01625e+008 ELOG: 8.3968e+006 (Src: Data Bits lane bitmap=0080, bank bitmap=00, elog 802000)"
}
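To see how often such an event recurs, and whether it always reports the same lane bitmap, a log in this shape can be filtered with a few lines of Python. This is only a sketch under assumptions: the top-level JSON list, the `SAMPLE_LOG` string, and the helper names `ecc_events` and `summarize` are illustrative, and a real RWC3 export may wrap its events differently. Only the four keys shown in the entry above are taken from the thread.

```python
import json
import re
from collections import Counter
from datetime import datetime

# One event in the shape quoted above. Wrapping it in a top-level list is an
# assumption; adapt the loading step to the structure of the actual export.
SAMPLE_LOG = """
[
  {
    "eventId": 202,
    "sequenceNumber": 8619,
    "time": "2019-10-9T14:34:11",
    "description": "Controller ID: 0 Single-bit ECC error; critical threshold exceeded: ECAR: 7.01625e+008 ELOG: 8.3968e+006 (Src: Data Bits lane bitmap=0080, bank bitmap=00, elog 802000)"
  }
]
"""

LANE_RE = re.compile(r"lane bitmap=([0-9A-Fa-f]+)")


def ecc_events(events):
    """Keep only events whose description mentions an ECC error."""
    return [e for e in events if "ECC error" in e.get("description", "")]


def summarize(events):
    """Return (sorted timestamps, Counter of lane bitmaps) for ECC events."""
    times, lanes = [], Counter()
    for event in ecc_events(events):
        # strptime accepts the non-zero-padded day used in this log ("10-9").
        times.append(datetime.strptime(event["time"], "%Y-%m-%dT%H:%M:%S"))
        match = LANE_RE.search(event["description"])
        if match:
            lanes[match.group(1)] += 1
    return sorted(times), lanes


times, lanes = summarize(json.loads(SAMPLE_LOG))
print(f"{len(times)} ECC event(s); first at {times[0]}" if times else "no ECC events")
print(dict(lanes))
```

Note that this log prints ECAR and ELOG in floating-point notation (7.01625e+008, 8.3968e+006), so the exact integers are only available from the original controller message; the lane bitmap and the trailing `elog` field are unaffected.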

0 Kudos
JoseH_Intel
Moderator
2,050 Views

Hello mpetr33,

 

Thanks for the updates. These ECC errors repeat a couple of times in the log. They appear to originate in the RAID controller's own memory, which is used for cache. It is possible that the RAID controller will eventually fail because of this faulty memory. The cables you suspect are unlikely to be the cause of these errors, though.

 

I suggest waiting for the warranty replacement to arrive and then rerunning the diagnostics to confirm that these ECC errors are gone.

 

Regards

 

Jose A.

Intel Customer Support Technician

A Contingent Worker at Intel

 

JoseH_Intel
Moderator
2,050 Views

Hello mpetr33,

 

Do you have any further details, updates, questions or comments in regards to this issue?

This thread will be marked as resolved automatically in the next 72 hours if no activity is received.

 

Regards

 

Jose A.

Intel Customer Support Technician

A Contingent Worker at Intel

0 Kudos
JoseH_Intel
Moderator
2,050 Views

Hello mpetr33,

 

We will proceed to mark this thread as resolved. If you have further issues or questions just go ahead and create a new topic.

 

Regards

 

Jose A.

Intel Customer Support Technician

A Contingent Worker at Intel

0 Kudos