Solved: Re:Is it possible that accessing pmem(app direct m...

hamj · 07-23-2020

Hello,

I have a question about Optane DC Persistent Memory(DCPM).

If I understand correctly, the only way to access DCPM from user space is through a PMDK library such as libpmem.

However, in this case PMDK requires mounting the DCPM with a file system, so it works like a block device. Of course, I understand that this is for general use by all Linux applications. But I still want to use DCPM without formatting it as a disk.

I have already tried using DCPM directly, without disk formatting, by using the starting virtual address of the DCPM in the kernel, and it works fine. I wonder whether there is any other way to use DCPM in user space, such as mmap() on /dev/pmem0, without working in the kernel and without disk formatting.

Thank you

Emeth_O_Intel · 07-25-2020

Hello hamj,

Thank you for contacting the Intel DCPMM Community.

The operating system provides direct access to persistent memory, and PMDK builds on that. If you call mmap() directly, you don't need to use PMDK. PMDK's job is to make persistent memory programming easier, but it is not a requirement to use it. PMDK makes the programming *much* easier, though, and I see lots of incorrect code related to people doing things themselves. The most common errors I see on Linux are forgetting to use the MAP_SYNC flag to mmap(), ignoring the Dirty Shutdown Count, and failing to check for persistent CPU caches to avoid cache flushes. PMDK does these things for you, but again if you're willing to take on these responsibilities yourself, PMDK is not required.
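
To illustrate the MAP_SYNC point, here is a minimal C sketch, not a definitive implementation. The function name map_pmem_file and the fallback policy are assumptions made for illustration. The idea is to request MAP_SHARED_VALIDATE | MAP_SYNC first, so the kernel rejects the flag when it cannot honor it, and to record whether the synchronous mapping was actually granted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Older libc headers may lack these flags; the values below are the
 * generic Linux UAPI values (used on x86-64). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/* Hypothetical helper: map `len` bytes of `path` read/write.
 * *got_sync is set to 1 if the kernel granted MAP_SYNC, and to 0 if the
 * mapping fell back to a plain MAP_SHARED mapping (no DAX guarantee).
 * Returns NULL on failure. */
void *map_pmem_file(const char *path, size_t len, int *got_sync)
{
    assert(path != NULL && got_sync != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    *got_sync = 1;
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (addr == MAP_FAILED && (errno == EOPNOTSUPP || errno == EINVAL)) {
        /* The file is not on a DAX mount, or the kernel is too old to
         * know the flag. Fall back, but remember that flushing CPU
         * caches is then NOT enough to make stores persistent. */
        *got_sync = 0;
        addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }

    close(fd); /* the mapping stays valid after the descriptor is closed */
    return addr == MAP_FAILED ? NULL : addr;
}
```

A caller would check got_sync before relying on user-space cache flushes for persistence; when it is 0, msync() is still needed. Handling of the Dirty Shutdown Count is not covered by this sketch.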

On Linux, there are two ways to get direct access (DAX). The most common way is to use a pmem-aware file system like ext4 or XFS, mount it with the "-o dax" option, and then mmap a file. If you don't want a file system for some reason, you can configure the namespace for devdax instead of fsdax. See the man page for ndctl for details. Using devdax provides a much more "raw" interface. You have to be root to open the device (or chown the device). Things like stat() cannot be used to determine the size of the namespace. Calling msync() will not flush anything. My point is the raw access is there, but it is tricky to use correctly. PMDK works on top of both fsdax and devdax and abstracts away the differences. Beware that fsdax will zero all allocated blocks for you, preventing an application from seeing old data from other applications. devdax won't, so you take on the security responsibility as well.
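
Since stat() does not report the size of a devdax namespace, one option is to read it from sysfs. The sketch below assumes the size is exposed at /sys/dev/char/MAJOR:MINOR/size, which is the path older PMDK releases read; treat that path, and the helper names read_u64_from_file and devdax_size, as assumptions to verify on your own system.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Parse one unsigned integer (decimal or 0x-prefixed hex) from a small
 * text file such as a sysfs attribute. Returns 0 on success, -1 on error. */
int read_u64_from_file(const char *path, uint64_t *out)
{
    assert(path != NULL && out != NULL);

    char buf[64];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    if (line == NULL)
        return -1;

    errno = 0;
    char *end;
    unsigned long long v = strtoull(buf, &end, 0);
    if (errno != 0 || end == buf)
        return -1;
    *out = (uint64_t)v;
    return 0;
}

/* Size in bytes of a device-dax character device such as /dev/dax0.0.
 * Assumed sysfs layout: /sys/dev/char/<major>:<minor>/size.
 * Returns 0 on success, -1 if the path is not a character device or the
 * attribute cannot be read. */
int devdax_size(const char *dev_path, uint64_t *size)
{
    struct stat st;
    if (stat(dev_path, &st) != 0 || !S_ISCHR(st.st_mode))
        return -1;

    char sysfs[128];
    snprintf(sysfs, sizeof sysfs, "/sys/dev/char/%u:%u/size",
             major(st.st_rdev), minor(st.st_rdev));
    return read_u64_from_file(sysfs, size);
}
```

With a namespace configured as devdax, devdax_size("/dev/dax0.0", &size) would be the intended call; the device name depends on your configuration.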

A common mistake is for people to access /dev/pmemX directly and think they are getting DAX. You need a DAX-capable file system for that to work. If you access the device directly like that, you are using the page cache and not getting direct access. Only by changing to devdax mode and using device names like /dev/daxX can you map the device and get DAX.
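
One way to guard against this mistake in code is to check the device type before mapping it. The sketch below relies on two assumptions: /dev/pmemX nodes are block devices while /dev/daxX.Y nodes are character devices, and the character device's sysfs "subsystem" link resolves to a directory named "dax". The function name is_devdax is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Returns 1 if `path` looks like a device-dax node, 0 otherwise. */
int is_devdax(const char *path)
{
    assert(path != NULL);

    struct stat st;
    if (stat(path, &st) != 0 || !S_ISCHR(st.st_mode))
        return 0; /* a block device such as /dev/pmem0 ends up here */

    /* Assumed sysfs layout: the "subsystem" symlink of a device-dax
     * node points at .../class/dax or .../bus/dax depending on the
     * kernel version, so only the last path component is compared. */
    char link[128];
    char resolved[PATH_MAX];
    snprintf(link, sizeof link, "/sys/dev/char/%u:%u/subsystem",
             major(st.st_rdev), minor(st.st_rdev));
    if (realpath(link, resolved) == NULL)
        return 0;

    const char *base = strrchr(resolved, '/');
    return base != NULL && strcmp(base + 1, "dax") == 0;
}
```

An application could refuse to continue when is_devdax() returns 0 for the path it was given, rather than silently mapping /dev/pmemX through the page cache.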

Additionally, have you tried libmemkind?

Under the current persistent memory programming model, the operating system exposes NVDIMMs as devices on which the user is expected to create a file system. Applications therefore need a way to consume memory that is exposed through files. Libmemkind fills this gap: it runs jemalloc on top of temporary files created in an fsdax file system on the NVDIMMs and acts as a memory allocator for applications. Libmemkind provides various memory pools called “kinds” for memories with different characteristics: DRAM, persistent memory and High-Bandwidth Memory.

This allows an application's heap to be partitioned between these kinds. On a system equipped with DRAM and NVDIMMs, an application can be modified to keep frequently accessed objects that need fast access in DRAM, while larger, less frequently accessed objects are stored on persistent memory.

Please check the following links for more details:

https://pmem.io/2020/01/20/libmemkind.html
http://memkind.github.io/memkind/#blog

Please let me know if the information provided helps you to clarify your concerns.

Regards,

Emeth O.
Intel Server Specialist.

hamj · 07-26-2020

Hello, Emeth!

Following your comment, I tried using DCPM in devdax mode, configured with ndctl.

Since my question was "Can I use DCPM without a file system?", I appreciate your reply.

So I think this is my last question: if I configure my DCPM as devdax with ndctl, what is the proper way to access it? After switching the DCPM to devdax, I can see /dev/dax0.0 for it.

Should I open it and then use mmap(), as when accessing /dev/mem, or is there a more appropriate way? PMDK does not look like the right approach, since it requires a mount point.

Thank you

Emeth_O_Intel · 07-28-2020

Hello hamj,

Thank you for your reply.

If you use the PMEM in devdax mode, you can memory-map the device; it is byte-addressable and does not require a file system.

The primary benefit of having a file system is that it provides secure, multi-tenant access to the persistent memory.

Devdax mode security is simplistic (based on access to the /dev/dax* device) and doesn’t provide any support for multi-tenancy (in theory you could build your own, but then you’d end up with something that looks a lot like a dax-capable file system).

You may be able to modify the allocation library (libvmmalloc) to work with the devdax device (in addition to the fsdax access it normally supports), but I have not seen any need to modify PMDK beyond that.

Note that unlike fsdax mode, where the file system plays a role in determining the page sizes used (4KB or 2MB), devdax mode uses the alignment characteristic of the PMEM itself (4KB, 2MB, or 1GB). Despite the paucity of 1GB TLB entries, 1GB alignment for devdax PMEM yields the best performance across a range of workloads (never slower, sometimes as much as 68% better depending upon the workload). Of course, for multi-tenancy environments 1GB may not make sense (which would explain why the dax file systems don’t support it).

In conclusion, you can just mmap it, but be mindful of the limitations listed above.

Most PMDK libraries can simply use devdax for the pool. libpmem (and libpmem2 once released) also works with devdax.
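
For readers who want to try the raw approach, the following C sketch shows one possible shape of it. It is an illustration under stated assumptions, not the way PMDK does it: the helper names (round_down, map_devdax, persist) are hypothetical, the mapping length is rounded down to the namespace alignment because a devdax mapping is expected to be a multiple of it, and the flush routine assumes an x86-64 CPU with 64-byte cache lines and uses CLFLUSH for simplicity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>
#if defined(__x86_64__)
#include <emmintrin.h>
#endif

/* Round len down to a multiple of align (align must be a power of two). */
size_t round_down(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return len & ~(align - 1);
}

/* Map up to `len` bytes from the start of `path` (for example
 * /dev/dax0.0), rounded down to `align` (4 KiB, 2 MiB or 1 GiB,
 * matching the namespace). Stores the mapped length in *mapped_len.
 * Returns NULL on failure. */
void *map_devdax(const char *path, size_t len, size_t align,
                 size_t *mapped_len)
{
    size_t usable = round_down(len, align);
    if (usable == 0)
        return NULL;

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    void *addr = mmap(NULL, usable, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);
    if (addr == MAP_FAILED)
        return NULL;

    *mapped_len = usable;
    return addr;
}

/* Flush the cache lines covering [addr, addr + len) and fence, since
 * msync() does not flush anything on devdax. Returns 0 on success, or
 * -1 on architectures this sketch does not cover. */
int persist(const void *addr, size_t len)
{
#if defined(__x86_64__)
    uintptr_t end = (uintptr_t)addr + len;
    for (uintptr_t p = (uintptr_t)addr & ~(uintptr_t)63; p < end; p += 64)
        _mm_clflush((const void *)p); /* assumes 64-byte cache lines */
    _mm_sfence();
    return 0;
#else
    (void)addr;
    (void)len;
    return -1;
#endif
}
```

A typical sequence would be: map the device, store data through the returned pointer, call persist() on the modified range, and munmap() when done. Since devdax does not zero old contents, the application should also initialize whatever region it intends to trust.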

Have a wonderful day.

Regards,

Emeth O.

Intel Server Specialist.

hamj · 07-28-2020

Hello Emeth!

Thank you for your answer.

Thanks to your advice, I found the answer to my question.

I configured the namespace as devdax with ndctl and mapped the device using open() and mmap().

Again, thank you for your kindness.

Have a nice day!

Emeth_O_Intel · 07-29-2020

Hello hamj,

Excellent, I am very glad the information provided helped you find the correct configuration.

If you have more questions, feel free to contact us again and we will be more than happy to assist you.

Have a wonderful day!

Regards,

Emeth O.

Intel Server Specialist.

Is it possible that accessing pmem(app direct mode) in user space without formatting disk?