Intel® Optane™ Persistent Memory

Two questions about Intel Optane App Direct Mode

Lab8010
Beginner
1,141 Views

I have two questions about Intel Optane Persistent memory modes.

 

1. How should we set data availability for data on the memory?

I guess if the Intel Optane persistent memory goes down/failed, the data will be lost, right?

But for enterprise infrastructure, we need redundancy. In this situation, how should redundancy be provided, and at which layer? (Does Optane have a native redundancy feature, or is a feature needed at the application layer or the OS layer?)

 

When I searched Intel's documentation, I could not find a native feature, so I guess users need to set up redundancy at the OS or application level.

 

2. When we move an Intel Optane memory module from its original slot to another slot on the same motherboard, is the data on the memory kept?

When troubleshooting x86 servers, we sometimes move memory modules to different physical slots, so I would like to confirm this situation.

1 Solution
SteveScargall
Employee
1,111 Views

Answer 1

My TL;DR response is that providing data redundancy is an application responsibility. There's no one configuration rule/option that fits all scenarios. RAS (Reliability, Availability, Serviceability) is a big topic that I'm more than happy to talk about in depth for hours - or days.

 

I guess if the Intel Optane persistent memory goes down/failed, the data will be lost, right?


Correct. If a single module in a PMem Region permanently fails, then all the data within the region is lost. This is why we take regular backups, right?

With the SMART Health data available with PMem, you should get a warning of a failure well before it happens. The endurance of PMem is orders of magnitude higher than NAND, so we expect the PMem to last as long as the system is in service (~5 years). However, hardware is hardware, and there is a small possibility (an annualized failure rate, AFR, of <=0.44) that a PMem module will fail. See Pg 7 of the Product Brief for more endurance info.
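To put the AFR figure in context for an interleaved region, here is a rough back-of-the-envelope sketch. It assumes the 0.44 figure is a percentage per module per year, that module failures are independent, and that the rate is constant over the service life. None of those assumptions come from the Product Brief, and because the quoted AFR is an upper bound, the results are upper bounds too.

```python
def region_failure_probability(afr_percent: float, modules: int, years: float) -> float:
    """Probability that at least one module in a region fails within `years`.

    Assumes independent failures at a constant annual rate. This is an
    illustrative simplification, not a vendor reliability model.
    """
    per_module_survival = (1.0 - afr_percent / 100.0) ** years
    return 1.0 - per_module_survival ** modules


if __name__ == "__main__":
    # 0.44 is read here as a percentage; the module counts are example values.
    for modules in (1, 2, 4, 6):
        p = region_failure_probability(0.44, modules, years=5)
        print(f"{modules} module(s), 5 years: {p:.1%}")
```

Under these assumptions, a six-module interleaved region comes out at roughly 12% over five years, which is why the "one failed module loses the whole region" behaviour is worth planning for.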

 


How should we set data availability for data on the memory?


It all depends on the application(s). I can think of at least eight different methods for how applications can consume PMem when the platform is in AppDirect mode, plus we can provision AppDirect in an Interleaved (default) or Non-Interleaved configuration depending on the needs of the application(s). Understanding how an application consumes PMem will determine the data redundancy solution.

There is no hardware mirroring or complex RAID support for PMem within the Integrated Memory Controller, so HW solutions are not an option - either within the socket or across sockets. Note that VROC (Virtual RAID on CPU) does not work with PMem; it's intended for NVMe and NAND SSDs only. Depending on the application requirements, using Software RAID (mdadm, LVM, etc.) is possible, but it introduces significant latencies, so I recommend avoiding this approach if possible. It's all too easy to end up closer to SSD performance when you misconfigure (or misunderstand) PMem. We wrote Storage Redundancy with Intel® Optane™ Persistent Memory Modules with this in mind, as it's a frequently asked question.

If the application has high availability features available, use those over SW RAID. 
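For applications with no built-in high availability, application-level redundancy can be as simple as writing every update to two memory-mapped files that live on different PMem regions. The following is only a minimal sketch of that idea, not a method from this thread: the helper names are hypothetical, the temporary files stand in for files on two separate DAX-mounted filesystems, and Python's `mmap.flush()` is used where a real PMem application would more likely rely on a library such as PMDK. It does not make the two copies crash-consistent with each other.

```python
import mmap
import os
import tempfile


def open_replica(path: str, size: int) -> mmap.mmap:
    """Create (or resize) a file to `size` bytes and memory-map it read/write."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def mirrored_write(replicas, offset: int, payload: bytes) -> None:
    """Write `payload` at `offset` in every replica and flush each mapping."""
    for region in replicas:
        region[offset:offset + len(payload)] = payload
        region.flush()  # flushes the whole mapping back to the file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-ins for files on two different PMem regions.
        paths = [os.path.join(tmp, "replica0"), os.path.join(tmp, "replica1")]
        replicas = [open_replica(p, 4096) for p in paths]
        mirrored_write(replicas, 0, b"hello")
        print([bytes(r[0:5]) for r in replicas])
        for r in replicas:
            r.close()
```

Each write costs two stores and two flushes, so the latency trade-off mentioned above applies here as well; the difference is that the application decides which data is worth mirroring.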

If we then introduce Virtualization (Virtual Machines or Containers) to our discussion, we add additional vectors that need to be thought about with regard to data redundancy. There are answers and solutions to almost all of the possible combinations, but there's no one answer/solution to them all.

Apologies for a semi-vague response, but I would need to know specifics before I could provide more clarity. 

 


8 Replies
AdrianM_Intel
Moderator
1,125 Views

Hello Lab8010,


Thank you for posting on the Intel® communities.  


Please allow us some time to look into your questions and get back to you with further details.


Regards,


Adrian M.

Intel Customer Support Technician


SteveScargall
Employee
1,117 Views

Hi Lab8010,

 

Answer #2)

Unlike DRAM, PMem has a wealth of Health data that can report which module(s) have issues and what the issue is. You should have no trouble identifying a failed/failing PMem module using the `ipmctl` utility:

  • ipmctl show -dimm
  • ipmctl show -a -dimm <DIMM_ID>
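If you want to script this check, a small wrapper can run the first command and flag any module that does not report as healthy. This is a hedged sketch: it assumes `ipmctl show -dimm` prints a pipe-delimited table with a `HealthState` column and the value `Healthy` for good modules, and the sample text below is made up for illustration, not captured from a real system. Check the column names and state strings against your own ipmctl version.

```python
import subprocess


def parse_table(text: str) -> list[dict[str, str]]:
    """Parse pipe-delimited table text into a list of row dictionaries."""
    rows = []
    header = None
    for line in text.splitlines():
        if "|" not in line:
            continue  # skips blank lines and the ===== separator line
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells  # first pipe-delimited line is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def unhealthy_modules(rows, column: str = "HealthState", healthy: str = "Healthy"):
    """Return the rows whose health column differs from the healthy value."""
    return [row for row in rows if row.get(column) != healthy]


def show_dimms() -> str:
    """Run `ipmctl show -dimm`; needs root privileges and real PMem hardware."""
    result = subprocess.run(
        ["ipmctl", "show", "-dimm"], check=True, capture_output=True, text=True
    )
    return result.stdout


# Illustrative text only: layout, IDs, and values are assumptions.
SAMPLE = """\
 DimmID | Capacity  | HealthState | FWVersion
==============================================
 0x0001 | 126.4 GiB | Healthy     | 0.0.0.0
 0x0011 | 126.4 GiB | Critical    | 0.0.0.0
"""

if __name__ == "__main__":
    for row in unhealthy_modules(parse_table(SAMPLE)):
        print(row["DimmID"], row["HealthState"])
```

On a real host you would pass `show_dimms()` to `parse_table` instead of `SAMPLE`, then follow up with `ipmctl show -a -dimm <DIMM_ID>` for any module the check flags.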

 

I upstreamed a `pmem` module to the SOSReport utility some time ago, so if you run a Linux Distro that SOSReport works with, generate a report and send this to your support vendor to help them. 

As for moving PMem modules between slots to troubleshoot, you should check with your server vendor, as the answer will differ between OEMs and perhaps between BIOS versions from the same OEM.

In almost all situations, you want to maintain the same population before and after the move. Servers have population rules you need to follow regarding the correct order of population (which slots to populate first). You won't destroy data by moving PMem to different slots, but you may encounter BIOS/POST errors that prevent the host from booting if the PMem population rules are not met. 

When we provision PMem in Memory Mode or AppDirect, a small amount of metadata (configuration information) is written to the Platform Configuration Attributes Table (PCAT) on each PMem module. The UEFI will read the PCAT and program the integrated Memory Controller at boot time. A BIOS may adhere strictly or loosely to the population rules when deciding whether the original layout must be maintained.

 

ta94kun
Beginner
1,100 Views

I have the same question as Lab8010.

For example, what about the following case?
I am using four Optane DC Persistent Memory modules, interleaved, in App Direct Mode.
The motherboard needs to be replaced.
Are there any precautions to take to retain the data after the replacement?

I understand that the Optane DC Persistent Memory uses the manufacturer's designated slots for operation. What about the following?

(1) All Optane DC Persistent Memory must be returned to the same slot. If the Slots are different, the data in the Region will be corrupted. (Y / n)

(2) When replacing the motherboard, it is necessary to reconfigure the default settings of the motherboard. Can Optane DC Persistent Memory lose data when setting up the BIOS? (Y / n)

SteveScargall
Employee
1,076 Views

@ta94kun wrote:

I have the same question as Lab8010.

For example, what about the following case?
I am using four Optane DC Persistent Memory modules, interleaved, in App Direct Mode.
The motherboard needs to be replaced.
Are there any precautions to take to retain the data after the replacement?

Make sure you have a backup of your data prior to any HW replacement. After replacing the MB, reinstall the DRAM and PMem into their original slots. Refer to the documentation for your server as many OEMs have procedures you need to follow. 
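One low-tech way to make sure everything goes back where it came from is to record which module sits in which slot before the replacement and compare afterwards. The sketch below only illustrates that bookkeeping: the slot names and serial numbers are hypothetical, and how you collect the snapshots (labels, photos, or tool output) is up to you and your OEM's procedure.

```python
def layout_differences(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Describe every slot whose occupant changed between two snapshots."""
    problems = []
    for slot in sorted(set(before) | set(after)):
        old, new = before.get(slot), after.get(slot)
        if old != new:
            problems.append(
                f"{slot}: expected {old or 'empty'}, found {new or 'empty'}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical slot names and module serial numbers.
    before = {"CPU1_A2": "SN-0001", "CPU1_B2": "SN-0002"}
    after = {"CPU1_A2": "SN-0002", "CPU1_B2": "SN-0001"}
    for line in layout_differences(before, after):
        print(line)
```

An empty result means DRAM and PMem are back in their original slots; anything else is worth fixing before the first boot on the new motherboard.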

 

(1) All Optane DC Persistent Memory must be returned to the same slot. If the Slots are different, the data in the Region will be corrupted. (Y / n)


I can only think of this scenario occurring if you're moving PMem from one physical system to another that has a different layout. In that case, follow the population rules for the new host. Whether this will work depends on the new host's BIOS. There's nothing on the PMem that pins it to a specific host or hostid. Note that you can only move PMem between systems of the same CPU generation, i.e., you cannot install Optane Series 100 into an Ice Lake host, or Optane Series 200 into a Cascade Lake host.

You can see the ACPI tables using `ipmctl show -system (NFIT|PCAT|PMTT)`, as described at https://docs.pmem.io/ipmctl-user-guide/debug/show-acpi-tables. To learn more about how ipmctl works with the hardware, see the Intel® Optane™ Persistent Memory OS Provisioning Specification, which describes all the firmware interface commands used for this operation. There's a wealth of detail in the UEFI and ACPI specification documents, which can be found at https://uefi.org/specifications

 


(2) When replacing the motherboard, it is necessary to reconfigure the default settings of the motherboard. Can Optane DC Persistent Memory lose data when setting up the BIOS? (Y / n)


You will need to manually restore any previously changed BIOS values. Check the default value of the PMem operating Mode (App Direct or Memory Mode) and do not re-create the goal. If you create a new goal, the data will be lost. 

 

 

ta94kun
Beginner
1,054 Views

Hi SteveScargall

Thank you very much. Your answer has solved many questions.

AdrianM_Intel
Moderator
1,045 Views

Hello Lab8010,


Were you able to check the previous posts? 


Let us know if you need more assistance. 


Regards,


Adrian M.

Intel Customer Support Technician


AdrianM_Intel
Moderator
996 Views

Hello Lab8010,


We have not heard back from you, so we will close this inquiry. If you need further assistance, please post a new question. 


Regards,


Adrian M.

Intel Customer Support Technician

