Hello all,
I read in the "Software vectorization handbook" that a "streaming" approach can beat a "vectorization" approach (p.175). Unfortunately, I am not quite sure how to implement that technique. The compiler help mentions the /Qopt-streaming-stores option, but it does not seem to do much for me. Is this done by default, or do I need to give the compiler more information (a pragma, maybe)?
Also, how do I know that streaming stores are being used? Does the compiler output indicate that?
Thanks in advance.
Alex
1 Reply
Did you consult the compiler documentation on this option? I don't have that book handy, but I don't see how the streaming-store option can be distinguished from vectorization. According to the documentation, you must complete the command-line option with a mode, e.g. /Qopt-streaming-stores:auto, in which case the compiler presumably would use non-temporal stores only for vectorizable code where there is no immediately visible re-use of the data.
My experience is with the pragma version
#pragma vector nontemporal
which is most useful for a loop which does nothing but set values in one or more large arrays (large compared with the size of L2 cache). This avoids evicting the contents of L2 and replacing them with the last part of the newly initialized array.
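For illustration, here is a minimal sketch of the kind of loop I mean. The function name fill_array and its parameters are made up for this example; only the pragma itself is the one quoted above. It assumes the array is much larger than L2 and is not read again soon after the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions for this sketch):
 * sets every element of a large array to one value and does nothing else. */
void fill_array(float *dst, size_t n, float value)
{
    assert(dst != NULL || n == 0);

    /* Intel-compiler pragma: requests non-temporal (streaming) stores for
     * the vectorized loop that follows. Compilers that do not recognize it
     * ignore it, possibly with a warning, and the loop still runs correctly. */
#pragma vector nontemporal
    for (size_t i = 0; i < n; ++i) {
        dst[i] = value;
    }
}
```

Calling fill_array(buf, n, 0.0f) on a buffer of many megabytes is the intended use. One way to check whether streaming stores were actually generated is to look at the assembly listing for non-temporal store instructions such as movntps; I would not expect a benefit on small arrays that are read back right away.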