Intel® Optane™ Persistent Memory

How can I use non-temporal (streaming) store instructions to store/load a self-defined struct?

huangwentao
New Contributor I

    I have just started using non-temporal store instructions to write data to memory (either DRAM or NVM). In the Intel Intrinsics Guide I found store intrinsics such as _mm_stream_si32, _mm_stream_si128, and _mm_stream_si256, but these seem to apply only to integer types. Suppose I define my own struct, perhaps 1 KB or 2 KB in size. How can I use non-temporal (streaming) stores to write such a struct to memory, or load it back from memory? So far the only approach I can think of is to cast the struct to a sequence of integers and issue a non-temporal store or load for each one in turn. That seems inefficient. Is there a more efficient way to do this?

    Also, if I want to store a large number of these structs, do I need to issue an sfence after every non-temporal store? Or can I drop the sfence entirely, or issue a single sfence after all the non-temporal stores have been performed?

    Finally, the set of non-temporal (streaming) load intrinsics seems very limited: I found only _mm_stream_load_si128. Are there any others?

    Many thanks for the help.

IntelSupport
Community Manager

Hello huangwentao,


Thank you for posting your question on this Intel® Community.


To better assist you, we would like to have additional information about your environment. Please provide the following details:


  • The model, or serial number, of the Intel® Optane™ Persistent Memory.
  • The system model where this component is installed.
  • Are you developing an application?
  • What is the firmware running on this component?


Wanner G.

Intel Customer Support Technician


huangwentao
New Contributor I

Hi Wanner, 

 

Thank you for your reply.

Unfortunately, the Optane DCPMM module is installed in a remote server, so I may not be able to find out the model or serial number.

The system runs Ubuntu 20.04 with Linux kernel version 5.8.0-43-generic.

I am actually doing some research-oriented experiments with C/C++ programming on Optane DCPMM, so I think it has nothing to do with the firmware.

Many thanks for the help and looking forward to your reply.

 

Wentao

IntelSupport
Community Manager

Hello huangwentao,


We appreciate your response.


Could you please let us know if the Intel® Optane™ Persistent Memory Module is installed in an Intel® Server?


Wanner G.

Intel Customer Support Technician


huangwentao
New Contributor I

I don't think it is an Intel server, but it is equipped with an Intel(R) Xeon(R) Gold 5222 CPU @ 3.80GHz.

IntelSupport
Community Manager

Hello huangwentao,


Thank you for your response.


Please allow me to review the information provided. I will update this thread soon.


Wanner G.

Intel Customer Support Technician


AdrianM_Intel
Moderator

Hello huangwentao,


Thank you for your patience.


After investigating, we recommend against rolling your own low-level mechanism for this. Instead, you can use the non-temporal memcpy implemented in libpmem/libpmem2.

 

Here's an example that shows its usage: https://github.com/pmem/pmdk/blob/master/src/examples/libpmem2/ringbuf/ringbuf.c#L289

 

In this way, you don't have to deal with all the details listed.

 

Alternatively, you can find a lot of useful information in sections 9.6 and 15.16 of the x86 Architectures Optimization Manual: https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-optim...


Regards,


Adrian M.

Intel Customer Support Technician




huangwentao
New Contributor I

Thanks Adrian.

May I ask whether I should issue a `fence` instruction every time I perform such a non-temporal `memcpy` (or call a non-temporal intrinsic such as _mm_stream_si32 or _mm_stream_load_si128)?

AdrianM_Intel
Moderator

Hello huangwentao,

 

Thank you for your response.

 

The implementation in libpmem/libpmem2 performs all the operations needed to make this work correctly. If you would like more detail on this, you can open a question at the following link: Issues · pmem/pmdk · GitHub.

We highly recommend getting familiar with the relevant parts of the manual linked above if you want to use the intrinsics manually.

Note that you will need to use appropriate fencing operations with non-temporal stores. This is explained in more detail in section 9.4.1.1. (Please check the document attached.)

"9.4.1.1 - Fencing

Because streaming stores are weakly ordered, a fencing operation is required to ensure that the stored data is flushed from the processor to memory. Failure to use an appropriate fence may result in data being “trapped” within the processor and will prevent visibility of this data by other processors or system agents. WC stores require software to ensure coherence of data by performing the fencing operation. See Section 9.4.5, “FENCE Instructions.”"

 

Regards,

 

Adrian M.

Intel Customer Support Technician

 

AdrianM_Intel
Moderator

Hello huangwentao,


Were you able to check the previous post? 


Let me know if you need more assistance. 


Regards,


Adrian M.

Intel Customer Support Technician



