Boomerang
Beginner
1,362 Views

Intel Xeon Gold ADDDC Memory RAS Feature Clarification


I'm struggling to find any detailed information about how the ADDDC memory RAS technique actually works. I'm specifically interested in the memory integrity guarantees that it provides.

 

Would ADDDC be able to cope with a DIMM device failing completely, without throwing correctable errors first (e.g. if it's snapped off the memory card)? Alternatively, maybe ADDDC acts like device sparing, followed by SDDC? In that case it cannot deal with a sudden DIMM device failure.

 

I struggle to understand how ADDDC would be able to achieve the same result as DDDC without any performance impact unless it lowers the protection guarantees...

 

Thanks in advance for any clarifications!

IntelSupport
Community Manager
1,267 Views

Hello Boomerang


Thank you for waiting. Please see the information below:


Q1: Would it be correct to describe ADDDC as memory sparing followed by SDDC?


A1: When we transition from SDDC to ADDDC, a memory bank/rank gets mapped out, and the memory region that entered virtual lockstep uses the ADDDC ECC code.


Q2: Since the memory operates in Performance mode, the words are not split between two channels, so not all errors within a single DIMM chip can be corrected through ECC. However, once a certain threshold of errors is reached, the memory layout changes to Lockstep mode and becomes redundant. Is that true?

A2: Yes


Q3: I've read the [whitepaper], but I can interpret it in two different ways, so it'd be great to hear it explained differently... The line that's giving me trouble is "where the identified failing region of the DRAM device is mapped out of ECC". How can the failing region be mapped out of ECC if the memory is in Performance mode?

A3: ADDDC enables the platform to dynamically map out the failing DRAM device. After map out occurs, cache lines in the bank/rank are re-arranged from independent mode to virtual lockstep utilizing ADDDC ECC.


Hope this helps.


Regards,

Leonardo C.


Intel Customer Support Technician


SergioS_Intel
Moderator
1,349 Views

Hello Boomerang,


Thank you for contacting Intel Customer Support.

 

Regarding your question: with the advent of ADDDC, the memory subsystem is always configured to operate in performance mode. When the number of corrections on a DRAM device reaches the targeted threshold value, the identified failing DRAM region is adaptively placed in lockstep mode with help from the UEFI runtime code, and the identified failing region of the DRAM device is mapped out of ECC. Once in ADDDC, cache line ECC continues to cover single DRAM device (x4) error detection and applies a correction algorithm to the nibble.
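The threshold-driven transition described above can be pictured with a small toy model. This is only an illustrative sketch, not Intel's firmware logic: the `Region` class, the `on_corrected_error` handler, and the `CORRECTION_THRESHOLD` value are all hypothetical names and numbers chosen for the example, and the real threshold and region granularity are platform specific.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Mode(Enum):
    INDEPENDENT = "independent"            # normal performance mode
    VIRTUAL_LOCKSTEP = "virtual lockstep"  # region now covered by ADDDC ECC


@dataclass
class Region:
    """A bank or rank tracked by a hypothetical firmware error handler."""
    name: str
    mode: Mode = Mode.INDEPENDENT
    corrected: Dict[int, int] = field(default_factory=dict)  # device -> count
    mapped_out_device: Optional[int] = None


# Hypothetical value; the real threshold is set by the platform firmware.
CORRECTION_THRESHOLD = 3


def on_corrected_error(region: Region, device: int) -> Mode:
    """Count one corrected error; adapt the region once the threshold is hit."""
    if region.mode is Mode.VIRTUAL_LOCKSTEP:
        return region.mode  # already adapted; this toy model stops here
    region.corrected[device] = region.corrected.get(device, 0) + 1
    if region.corrected[device] >= CORRECTION_THRESHOLD:
        region.mapped_out_device = device    # failing device is mapped out
        region.mode = Mode.VIRTUAL_LOCKSTEP  # only this region changes mode
    return region.mode


rank0 = Region("rank0")
rank1 = Region("rank1")
for _ in range(CORRECTION_THRESHOLD):
    on_corrected_error(rank0, device=5)
on_corrected_error(rank1, device=2)
print(rank0.mode.value, rank0.mapped_out_device)  # virtual lockstep 5
print(rank1.mode.value, rank1.mapped_out_device)  # independent None
```

The point of the sketch is that only the region whose device crossed the threshold changes mode; every other region keeps running in independent (performance) mode.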


You will be able to find more detailed information here:


https://software.intel.com/content/www/us/en/develop/articles/new-reliability-availability-and-servi...



Best regards,

Sergio S.

Intel Customer Support Technician

For firmware updates and troubleshooting tips, visit: https://intel.com/support/serverbios


Boomerang
Beginner
1,345 Views

Hello Sergio,

 

Thank you for the reply.

In that case would it be correct to describe ADDDC as memory sparing followed by SDDC?

Since the memory operates in Performance mode, the words are not split between two channels, so not all errors within a single DIMM chip can be corrected through ECC. However, once a certain threshold of errors is reached, the memory layout changes to Lockstep mode and becomes redundant. Is that true?

I've read the document you linked, but I can interpret it in two different ways, so it'd be great to hear it explained differently... The line that's giving me trouble is "where the identified failing region of the DRAM device is mapped out of ECC". How can the failing region be mapped out of ECC if the memory is in Performance mode?

SergioS_Intel
Moderator
1,335 Views

Hello Boomerang,


Please allow us to check on your question and we will get back to you as soon as possible.


Best regards,

Sergio S.

Intel Customer Support Technician



Boomerang
Beginner
1,321 Views

Hello Sergio,

 

Thank you. Looking forward to your reply.

 

Kind regards,

Boomerang

IntelSupport
Community Manager
1,300 Views

Hello Boomerang


We are still investigating your question; thank you for waiting. In the meantime, I have sent you a private email to collect your contact information.


Regards,

Leonardo C.


Intel Customer Support Technician


IntelSupport
Community Manager
1,291 Views

Hello Boomerang


I am following up on this thread. Could you let me know whether you received the private email I sent to collect your contact details?


Regards,

Leonardo C.


Intel Customer Support Technician


Boomerang
Beginner
1,273 Views

Hello Leonardo,

 

Just sent all the details.

 

Kind regards,

Boomerang


Boomerang
Beginner
1,253 Views

Hello Leonardo,

 

Thank you for the response. I had mistakenly assumed that SDDC and ADDDC used different methods for error detection and correction (https://www.intel.com/content/dam/doc/application-note/e7500-chipset-mch-x4-single-device-data-corre...), and I see now that this document covers an old chipset.

 

Kind regards,

Boomerang
