Cache eviction policy of newer Intel CPUs

le_g_1 · ‎06-16-2013

Hi everybody,

From Intel's optimization manual, I know that in Sandy Bridge the L1 and L2 caches are local to each core, while the L3 cache is shared by all the cores. But what is the eviction policy? For example, can data remain in an L2 cache if it is evicted from the L3 by another core? What about this policy in other architectures, e.g. in Core 2 parts like the Q8200?

SergeyKostrov · ‎06-17-2013

Let's say the sizes of the L3 and L2 caches are as follows:
...
Size of L3 Cache = 8MB ( shared between all cores for data & instructions )
Size of L2 Cache = 1MB ( 256KB per core / shared for data & instructions )
...
My understanding is that if an application loads a new 8MB data set into memory, then all old data in the L2 caches will be evicted.

Bernard · ‎06-17-2013

IIRC, for the higher-level caches in the hierarchy the eviction policy is based on a least-frequently-used or least-recently-used algorithm. It seems logical that the L3 cache eviction policy will dictate a full cache reload when a new set of data is loaded into memory.

SergeyKostrov · ‎06-17-2013

>>...can data remain in a L2 cache if it's evicted for L3 by another core?..

If a software prefetch is used with the _mm_prefetch intrinsic function, then different hints need to be taken into account:

[ xmmintrin.h ]
...
/* constants for use with _mm_prefetch */
#define _MM_HINT_T0 1
#define _MM_HINT_T1 2
#define _MM_HINT_T2 3
#define _MM_HINT_NTA 0
...

Take a look at the Intel Software Developer Manual for a complete description. Thanks.
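As a rough illustration of how such a hint is passed, here is a minimal sketch in C that sums an array while prefetching a few elements ahead with _MM_HINT_T0. The helper name sum_with_prefetch and the PREFETCH_DISTANCE value are assumptions made for this example, not recommendations from Intel's manuals, and the macro falls back to a no-op on non-x86 targets. A prefetch is only a hint: it does not pin data in any cache level and does not change the eviction policy discussed in this thread.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>  /* _mm_prefetch and the _MM_HINT_* constants */
#define PREFETCH_T0(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#define PREFETCH_T0(p) ((void)0)  /* no-op on targets without this intrinsic */
#endif

/* Hypothetical tuning value: how many elements ahead to request. */
#define PREFETCH_DISTANCE 64

/* Sum an array, hinting that upcoming elements should be brought
 * into the cache hierarchy before they are needed. */
static long sum_with_prefetch(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH_T0(&data[i + PREFETCH_DISTANCE]);
        total += data[i];
    }
    return total;
}
```

The result is the same with or without the hint; only the timing may differ, and for a simple sequential walk like this one the hardware prefetchers often make the software hint unnecessary.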

McCalpinJohn · ‎06-24-2013

The general approach used by the Intel processors is for the L3 cache to be inclusive of all of the L1 and L2 caches on the chip. If a line is selected to be evicted from the L3, then any copies of that line in the L1 or L2 caches on that chip must also be evicted.

Notes:

The inclusive L3 approach makes cache coherence much easier to implement -- you only need to check the L3 cache tags to know if data is on a chip. If it is on a chip, then the L3 tags will tell precisely which L1 and/or L2 caches hold the data.
The inclusive L3 approach also makes data sharing between cores on the same chip much more efficient, since the L3 keeps track of which core (on the same chip) might have a modified copy of the cache line. This allows it to be retrieved more quickly, since you do not need to query the other chip(s) and wait for their responses.
The downside of an inclusive L3 is that lines that are in active use in the L1 caches can be evicted because, from the L3's perspective, they have not recently been loaded. Intel has some "magic" to reduce the occurrence of this undesirable cache line eviction, but as far as I can tell it is not well documented. For the Westmere-based processors, this feature was controlled by the BIOS "data reuse optimization" option.
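The downside described in the last note can be sketched with a toy model in C. Everything here is an assumption made for illustration: two cores, a 2-line private L1 per core, a 4-line shared inclusive L3, full associativity, and true LRU replacement. Real Intel parts use different sizes and replacement schemes, plus the undocumented mitigation mentioned above, so this shows only the mechanism, not actual hardware behaviour. The key point is that L1 hits never refresh the L3's LRU state, so a line that is hot in one core's L1 looks old to the L3 and is back-invalidated once another core streams enough new lines.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_CORES 2
#define L1_LINES  2     /* toy size, not a real L1 */
#define L3_LINES  4     /* toy size, not a real L3 */
#define EMPTY     (-1L)

typedef struct { long tag; unsigned long last_use; } Line;

typedef struct {
    Line l1[NUM_CORES][L1_LINES];  /* private per-core caches */
    Line l3[L3_LINES];             /* shared, inclusive of every L1 */
    unsigned long clock;
    unsigned back_invalidations;
} Model;

static void model_init(Model *m)
{
    for (int c = 0; c < NUM_CORES; c++)
        for (int i = 0; i < L1_LINES; i++)
            m->l1[c][i] = (Line){ EMPTY, 0 };
    for (int i = 0; i < L3_LINES; i++)
        m->l3[i] = (Line){ EMPTY, 0 };
    m->clock = 0;
    m->back_invalidations = 0;
}

static Line *find(Line *set, int n, long tag)
{
    for (int i = 0; i < n; i++)
        if (set[i].tag == tag) return &set[i];
    return NULL;
}

/* Pick an empty slot if there is one, otherwise the least recently used. */
static Line *victim(Line *set, int n)
{
    Line *v = &set[0];
    for (int i = 0; i < n; i++) {
        if (set[i].tag == EMPTY) return &set[i];
        if (set[i].last_use < v->last_use) v = &set[i];
    }
    return v;
}

/* Access one line from one core. Returns true on an L1 hit. */
static bool access_line(Model *m, int core, long tag)
{
    m->clock++;
    Line *l1 = find(m->l1[core], L1_LINES, tag);
    if (l1) {                      /* L1 hit: the L3 never sees this access */
        l1->last_use = m->clock;
        return true;
    }
    Line *l3 = find(m->l3, L3_LINES, tag);
    if (!l3) {                     /* L3 miss: allocate, possibly evicting */
        l3 = victim(m->l3, L3_LINES);
        if (l3->tag != EMPTY) {
            /* Inclusion: the evicted line must leave every L1 as well. */
            for (int c = 0; c < NUM_CORES; c++) {
                Line *copy = find(m->l1[c], L1_LINES, l3->tag);
                if (copy) { copy->tag = EMPTY; m->back_invalidations++; }
            }
        }
        l3->tag = tag;
    }
    l3->last_use = m->clock;
    Line *slot = victim(m->l1[core], L1_LINES);  /* L1 evictions are silent */
    slot->tag = tag;
    slot->last_use = m->clock;
    return false;
}

/* Core 0 keeps hitting one hot line while core 1 streams new lines.
 * Returns true if the hot line is still in core 0's L1 afterwards. */
static bool hot_line_survives(int streamed_lines)
{
    Model m;
    model_init(&m);
    const long hot = 0;
    access_line(&m, 0, hot);              /* first touch fills L3 and L1 */
    for (int i = 0; i < streamed_lines; i++) {
        access_line(&m, 0, hot);          /* L1 hit, L3 age not refreshed */
        access_line(&m, 1, 100 + i);      /* core 1 brings a new line into L3 */
    }
    return access_line(&m, 0, hot);
}
```

In this model hot_line_survives(3) returns true, because the 4-line L3 still has room for the hot line, while hot_line_survives(4) returns false: the fourth streamed line evicts the hot line from the L3, which forces it out of core 0's L1 even though core 0 touched it on every iteration.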

le_g_1 · ‎06-26-2013

Thank you, Mr. McCalpin.

I have a presumptuous request: could you tell me where I can find more documents about Intel CPUs' cache behaviour? I have in hand only the Software Developer’s Manual and the optimization guidelines. Are there any documents that are more specific to caches?

McCalpinJohn · ‎06-27-2013

The Intel Architecture Software Optimization Guide probably has the most information, but it is often necessary to cross-reference between the SW Optimization Guide and the descriptions of the performance monitoring events in Chapter 19 of Volume 3 of the Intel Architecture Software Developer's Manual. I don't know how much of this I would have been able to understand if I had not worked in processor design at SGI, IBM, and AMD, along with getting good technical support from Intel (both at SGI while designing a system for the Itanium2 processor, and now as a customer).

Bernard · ‎06-28-2013

You can check chapter 7 of "The Intel Architecture Software Optimization Guide" for cache-related information. I would also try searching the web for various cache-aware programming techniques.

Check this link: http://stackoverflow.com/questions/1922249/c-cache-aware-programming

Ahmad_S_Intel · ‎07-10-2016

John McCalpin wrote:

The downside of an inclusive L3 is that lines that are in active use in the L1 caches can be evicted because, from the L3's perspective, they have not recently been loaded. Intel has some "magic" to reduce the occurrence of this undesirable cache line eviction, but as far as I can tell it is not well documented. For the Westmere-based processors, this feature was controlled by the BIOS "data reuse optimization" option.

This paper discusses options for reducing the undesirable back-invalidation of hot blocks in the upper-level caches by conveying "hotness" information to the lower levels.

http://dl.acm.org/citation.cfm?id=1935019