I want to perform a data copy such as B[1 : n : 1] = A[1 : stride*n : stride]. Because array A has neither temporal reuse nor spatial reuse, I don't want it to pollute the cache while it is being loaded. Is there any way to achieve this?
I see that SSE2 provides non-polluting (non-temporal) versions of store operations, but no counterpart for load operations. Why? Is there an implementation obstacle or some other consideration?
Thanks for your help.
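For reference, the copy described in the question can be sketched as below. This is only an illustration, assuming 32-bit integer elements and 0-based indexing; the helper name `strided_copy_nt` is hypothetical. It uses the SSE2 non-temporal store intrinsic `_mm_stream_si32` for B, while the loads from A remain ordinary cached loads, which is exactly the gap the question is about.

```c
#include <emmintrin.h>  /* SSE2: _mm_stream_si32; also provides _mm_sfence */
#include <stddef.h>

/* Hypothetical helper: dst[i] = src[i * stride] for i in [0, n). */
static void strided_copy_nt(int *dst, const int *src, size_t n, size_t stride)
{
    for (size_t i = 0; i < n; ++i) {
        /* Ordinary load: the cache line holding src[i * stride]
           is still brought into the cache hierarchy. */
        int v = src[i * stride];

        /* Non-temporal store (MOVNTI): written through write-combining
           buffers with a hint not to keep the line in cache. */
        _mm_stream_si32(&dst[i], v);
    }
    /* Make the non-temporal stores globally visible before returning. */
    _mm_sfence();
}
```

A call such as `strided_copy_nt(B, A, n, stride)` keeps B out of the cache, but every element read from A still pulls in a full cache line.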
You are welcome to experiment with hardware prefetch settings to avoid speculative fetching of cache lines. Platforms where this is likely to make a difference often have such options in BIOS setup.
I'm sure you are aware that large stride arrays simply aren't efficient on cache architectures, and architectures which provide a cache bypass for them have not been successful on the market. It's probably less expensive to buy the largest available cache than to implement cache bypass measures.
You will be happy to hear that SSE4.1, available in H2 of this year with the Penryn CPU, has the MOVNTDQA instruction, which is essentially a streaming load. It bypasses the cache and uses streaming buffers to load data.
Unfortunately, the initial implementation of MOVNTDQA will act only on USWC (uncacheable, write-combining) memory, because it is meant to improve MMIO operations. Future implementations may also work on the WB (writeback) memory type, enabling you to do exactly what you want here.
As of now it is simply impossible, but with some patience...
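As a rough sketch of how MOVNTDQA is reached from C, the SSE4.1 intrinsic `_mm_stream_load_si128` can be used as shown here. The helper name `stream_load_copy` is hypothetical, both pointers are assumed to be 16-byte aligned, and the `target` attribute is a GCC/Clang extension used so the file builds without a global SSE4.1 flag. As noted above, on ordinary WB memory the streaming hint is expected to be ignored by the initial implementation, so this behaves like a normal aligned load there.

```c
#include <immintrin.h>  /* SSE4.1: _mm_stream_load_si128; SSE2: _mm_stream_si128 */
#include <stddef.h>

/* Hypothetical helper: copy n_vec 16-byte vectors from src to dst.
   Assumption: src and dst are both 16-byte aligned. */
__attribute__((target("sse4.1")))
static void stream_load_copy(__m128i *dst, const __m128i *src, size_t n_vec)
{
    for (size_t i = 0; i < n_vec; ++i) {
        /* MOVNTDQA: streaming load hint. It only bypasses the cache on
           USWC memory; on WB memory it acts as a regular load. */
        __m128i v = _mm_stream_load_si128((__m128i *)&src[i]);

        /* MOVNTDQ: non-temporal 16-byte store. */
        _mm_stream_si128(&dst[i], v);
    }
    /* Order the non-temporal stores before subsequent memory operations. */
    _mm_sfence();
}
```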
