I have just started using non-temporal store instructions to write data to memory (which could be DRAM or NVM). In the Intel Intrinsics Guide I found store functions such as _mm_stream_si32, _mm_stream_si128, _mm256_stream_si256, etc. It seems that these functions only accept certain integer types. My question is: if I define my own struct, whose size may be 1 KB, 2 KB, and so on, how can I perform non-temporal (streaming) stores to write such a struct to memory (or, vice versa, load it from memory)? So far I can think of only one way: cast the struct to a sequence of integers and apply a non-temporal/streaming store/load to each of them, one by one. This seems somewhat inefficient to me. Is there a more efficient way to code this?
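To make the approach concrete, here is a minimal sketch of what I mean by casting the struct to a sequence of integers. The names `record_t` and `stream_copy` are hypothetical and made up for illustration; the sketch assumes the destination is 16-byte aligned and the struct size is a multiple of 16 bytes.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_stream_si128, _mm_loadu_si128 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 1 KB record, used only for illustration. */
typedef struct {
    uint64_t id;
    uint8_t  payload[1016];
} record_t;

/* Copy `bytes` from src to dst using non-temporal 16-byte stores.
   Assumption: dst is 16-byte aligned and bytes is a multiple of 16.
   No fence is issued here; ordering is left to the caller. */
static void stream_copy(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)dst % 16) == 0 && bytes % 16 == 0);
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < bytes / 16; i++) {
        __m128i v = _mm_loadu_si128(s + i);  /* ordinary (cached) load */
        _mm_stream_si128(d + i, v);          /* non-temporal store */
    }
}
```

A call would look like `stream_copy(&dst_record, &src_record, sizeof(record_t));`, which issues 64 separate streaming stores for a 1 KB struct.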
Also, if I want to store a large number of such structs, is it necessary to issue an sfence after every non-temporal store? I am not sure about that, and I wonder whether I could remove the sfence instruction entirely or issue just one sfence instruction after performing all the non-temporal stores.
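For reference, this is the single-fence variant I have in mind, as a minimal sketch. The function name `stream_chunks_then_fence` is hypothetical, and the code assumes the destination is 16-byte aligned. Whether one fence at the end is enough is exactly what I am unsure about.

```c
#include <emmintrin.h>  /* SSE2: _mm_stream_si128, _mm_loadu_si128; also provides _mm_sfence */
#include <stddef.h>

/* Hypothetical helper: stream n 16-byte chunks from src to dst,
   then issue one fence for the whole batch.
   Assumption: dst is 16-byte aligned. */
static void stream_chunks_then_fence(__m128i *dst, const __m128i *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* non-temporal store of one chunk, with no fence in between */
        _mm_stream_si128(&dst[i], _mm_loadu_si128(&src[i]));
    }
    _mm_sfence();  /* a single fence after all stores, not one per store */
}
```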
Moreover, I found that the number of non-temporal (streaming) load functions is very limited. I only found one, _mm_stream_load_si128. Are there any other functions for loading?
Many thanks for the help