Using streaming stores

Intel_C_Intel · 07-18-2007

Hello all,

I read in "The Software Vectorization Handbook" that a "streaming" approach can beat a "vectorization" approach (p. 175). Unfortunately, I am not quite sure how to implement that technique. The compiler help mentions the /Qopt-streaming-stores option, but it does not seem to do much for me. Is this done by default, or do I need to give the compiler more information (a pragma, maybe)?

Also, how do I know that streaming stores are being used? Does the compiler output indicate that?

Thanks in advance.

Alex

TimP · 07-18-2007

Did you consult the compiler documentation on this option? I don't have that book handy, but I don't see how the streaming-store option can be distinguished from vectorization. According to the documentation, you must complete the command-line option with a mode, e.g. /Qopt-streaming-stores:auto, in which case the compiler presumably uses non-temporal stores only for vectorizable code where there is no immediately visible reuse of the data.
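For reference, a non-temporal (streaming) store at the instruction level is a store such as MOVNTDQ, MOVNTPS or MOVNTPD, which hints the CPU not to allocate the written line in cache. Below is a minimal hand-written sketch of the same idea using SSE2 intrinsics. It assumes an x86 target with SSE2 and a compiler that provides <emmintrin.h>; fill_streaming is a hypothetical helper for illustration, not something the compiler option generates by name. Looking for movnt* instructions in an assembly listing of a loop is one way to check whether the compiler actually used streaming stores.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_set1_epi32, _mm_stream_si128 */
#include <xmmintrin.h>  /* _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill dst[0..n) with value, using non-temporal
   (streaming) stores for the 16-byte-aligned middle of the array. */
void fill_streaming(int *dst, size_t n, int value)
{
    size_t i = 0;

    /* Scalar head: ordinary stores until dst + i is 16-byte aligned,
       because _mm_stream_si128 requires an aligned address. */
    while (i < n && ((uintptr_t)(dst + i) & 15u) != 0)
        dst[i++] = value;

    /* Vector body: each MOVNTDQ writes 4 ints while hinting the CPU
       not to allocate the destination line in cache. */
    __m128i v = _mm_set1_epi32(value);
    for (; i + 4 <= n; i += 4)
        _mm_stream_si128((__m128i *)(dst + i), v);

    /* Scalar tail for the remaining elements. */
    for (; i < n; ++i)
        dst[i] = value;

    /* Streaming stores are weakly ordered; SFENCE orders them
       before any subsequent store. */
    _mm_sfence();
}
```

The scalar head and tail loops are there because _mm_stream_si128 faults on an address that is not 16-byte aligned, and the array length need not be a multiple of four.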
My experience is with the pragma version:
#pragma vector nontemporal
which is most useful for a loop that does nothing but set values in one or more large arrays (large compared with the size of the L2 cache). It avoids evicting the useful contents of L2 only to replace them with the tail end of the newly initialized array.
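A minimal sketch of the kind of loop described above, assuming the Intel compiler, which recognizes #pragma vector nontemporal; other compilers treat it as an unknown pragma (possibly with a warning) and emit ordinary stores. init_arrays is a hypothetical name, and the arrays are meant to be much larger than L2 for the pragma to pay off.

```c
#include <stddef.h>

/* Hypothetical helper: a loop that does nothing but write two large arrays.
   With the Intel compiler, the pragma requests non-temporal stores for the
   vectorized loop; elsewhere it is ignored and the results are identical. */
void init_arrays(double *a, double *b, size_t n)
{
#pragma vector nontemporal
    for (size_t i = 0; i < n; ++i) {
        a[i] = 0.0;        /* pure store, nothing read back */
        b[i] = (double)i;  /* pure store, nothing read back */
    }
}
```

The loop body contains only stores because that is the case where no written element is reused soon, so keeping the new lines in cache would only push out data that is still useful.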