Can I make some load ops not pollute the cache?

yxyymagic · ‎04-24-2007

I want to do a strided copy such as B[1 : n : 1] = A[1 : stride*n : stride]. Because array A has neither temporal reuse nor spatial reuse, I don't want array A to pollute the cache while it is being loaded. Is there any way to achieve this?

I see that store operations have a non-polluting (non-temporal) version in SSE2, but there is no counterpart for load operations. Why? Is there an implementation obstacle or some other consideration?
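For reference, here is a minimal sketch of what the store side of such a copy might look like with the SSE2 non-temporal store intrinsic `_mm_stream_si32` (MOVNTI). The function name `strided_copy_nt` and the `int32_t` element type are assumptions made for illustration only. The loads from A still go through the cache in the normal way; only the stores to B are marked non-temporal, and a plain store is used when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32; also pulls in _mm_sfence */
#endif

/* Hypothetical helper: B[i] = A[i * stride] for i in [0, n).
   Only the stores to B bypass the cache; the loads from A are ordinary loads. */
void strided_copy_nt(int32_t *B, const int32_t *A, size_t n, size_t stride)
{
    assert(stride >= 1);
    for (size_t i = 0; i < n; ++i) {
#if defined(__SSE2__)
        _mm_stream_si32((int *)&B[i], A[i * stride]);  /* MOVNTI */
#else
        B[i] = A[i * stride];  /* portable fallback without SSE2 */
#endif
    }
#if defined(__SSE2__)
    _mm_sfence();  /* order the streaming stores before any later stores */
#endif
}
```

Calling `strided_copy_nt(B, A, n, stride)` gives the same result as the plain loop; whether it actually helps depends on the sizes involved and should be measured.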

Thanks for help.

TimP · ‎04-24-2007

You are welcome to experiment with hardware prefetch settings to avoid speculative fetching of cache lines. Platforms where this is likely to make a difference often have such options in BIOS setup.

I'm sure you are aware that large stride arrays simply aren't efficient on cache architectures, and architectures which provide a cache bypass for them have not been successful on the market. It's probably less expensive to buy the largest available cache than to implement cache bypass measures.

levicki · ‎05-10-2007

You will be happy to hear that SSE4.1, available in H2 this year with the Penryn CPU, adds the MOVNTDQA instruction, which is essentially a streaming load operation. It bypasses the cache and uses streaming buffers to load data.

Unfortunately, the initial implementation of MOVNTDQA will act only upon USWC (uncacheable speculative write-combining) memory, because it is meant to improve MMIO operations. Future implementations may also work on the WB (write-back) memory type, enabling you to do exactly what you want here.
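A rough sketch of how the streaming load could be used from C through the `_mm_stream_load_si128` intrinsic declared in `<smmintrin.h>`. The function name `copy_streaming_load` is made up for illustration, and the sketch assumes both pointers are 16-byte aligned and the length is a multiple of 16. It needs SSE4.1 enabled at compile time (for example `-msse4.1` with GCC) and falls back to `memcpy` otherwise. On ordinary WB memory the non-temporal hint may simply be ignored, so this should compile and run correctly but is not expected to avoid cache pollution there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE4_1__)
#include <smmintrin.h>  /* _mm_stream_load_si128 (MOVNTDQA) */
#endif

/* Hypothetical helper: copy n_bytes from src to dst with 16-byte streaming loads.
   Assumes dst and src are 16-byte aligned and n_bytes is a multiple of 16. */
void copy_streaming_load(void *dst, const void *src, size_t n_bytes)
{
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);
    assert(n_bytes % 16 == 0);
#if defined(__SSE4_1__)
    __m128i *d = (__m128i *)dst;
    __m128i *s = (__m128i *)src;  /* some headers declare a non-const parameter */
    for (size_t i = 0; i < n_bytes / 16; ++i) {
        __m128i v = _mm_stream_load_si128(&s[i]);  /* MOVNTDQA */
        _mm_store_si128(&d[i], v);                 /* ordinary aligned store */
    }
#else
    memcpy(dst, src, n_bytes);  /* fallback when SSE4.1 is not enabled */
#endif
}
```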

As of now it is simply impossible, but with some patience...