Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Cache eviction policy of newer Intel CPUs

le_g_1
New Contributor I

Hi everybody,

      From Intel's processor optimization manual, I know that in Sandy Bridge the L1 and L2 caches are private to each core, while the L3 cache is shared by all the cores. But what is the eviction policy? E.g., can data remain in an L2 cache if it is evicted from the L3 by another core? What about the policy in other architectures, e.g., Core 2 parts like the Q8200?

8 Replies
SergeyKostrov
Valued Contributor II
Let's say the cache sizes are as follows:
...
Size of L3 cache = 8MB ( shared between all cores, for data & instructions )
Size of L2 cache = 1MB in total ( 256KB per core, shared for data & instructions )
...
My understanding is that if an application loads a new 8MB data set into memory, then all of the old data in the L2 caches will be evicted.
Bernard
Valued Contributor I

 IIRC, for the higher-level caches in the cache hierarchy the eviction policy is based on a least-frequently-used or least-recently-used algorithm. It seems logical that the L3 cache eviction policy will dictate a full cache reload when a new set of data is loaded into memory.
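
To make the "least recently used" idea concrete, here is a minimal textbook sketch of LRU replacement in a single cache set. It is not a description of what any Intel cache actually does (the real L3 policy is not documented as strict LRU). The names lru_set, lru_set_init and lru_set_access, and the 4-way associativity, are assumptions made up for this illustration.

```c
#include <stdbool.h>

#define WAYS      4      /* assumed associativity of the toy set */
#define EMPTY_TAG (-1L)  /* marks a way that holds no line */

/* One cache set with strict LRU replacement (illustrative only). */
typedef struct {
    long          tag[WAYS];        /* which line each way holds */
    unsigned long last_used[WAYS];  /* logical time of the last access */
    unsigned long clock;            /* counts accesses to this set */
} lru_set;

void lru_set_init(lru_set *s)
{
    s->clock = 0;
    for (int w = 0; w < WAYS; w++) {
        s->tag[w] = EMPTY_TAG;
        s->last_used[w] = 0;
    }
}

/* Returns true on a hit. On a miss the line is installed and *evicted
 * receives the displaced tag, or EMPTY_TAG if a free way was used. */
bool lru_set_access(lru_set *s, long tag, long *evicted)
{
    unsigned long now = ++s->clock;
    int victim = 0;

    *evicted = EMPTY_TAG;

    /* Hit: refresh the age of the line and return. */
    for (int w = 0; w < WAYS; w++) {
        if (s->tag[w] == tag) {
            s->last_used[w] = now;
            return true;
        }
    }

    /* Miss: take a free way if there is one, else the oldest way. */
    for (int w = 0; w < WAYS; w++) {
        if (s->tag[w] == EMPTY_TAG) {
            victim = w;
            break;
        }
        if (s->last_used[w] < s->last_used[victim]) {
            victim = w;
        }
    }

    *evicted = s->tag[victim];
    s->tag[victim] = tag;
    s->last_used[victim] = now;
    return false;
}
```

In this model, touching tags 1, 2, 3, 4, then 1 again, then 5 evicts tag 2, because the second access to tag 1 made it more recent than tag 2.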

SergeyKostrov
Valued Contributor II
>>...can data remain in a L2 cache if it's evicted for L3 by another core?..

If a software prefetch is used with the _mm_prefetch intrinsic function, then the different hints need to be taken into account:

[ xmmintrin.h ]
...
/* constants for use with _mm_prefetch */
#define _MM_HINT_T0 1
#define _MM_HINT_T1 2
#define _MM_HINT_T2 3
#define _MM_HINT_NTA 0
...

Take a look at the Intel Software Developer's Manual for a complete description. Thanks.
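
As a rough illustration of how the hints are used, here is a small sketch that prefetches a fixed distance ahead while summing an array. The function name sum_with_prefetch and the distance PREFETCH_AHEAD are assumptions invented for this example, and the distance is not a tuned value. The hint only requests which cache levels the line should be brought into (T0 asks for all levels, NTA asks to minimize cache pollution); the hardware is free to treat it differently. It is safer to use the symbolic names than the numbers, since the numeric values are not the same in every compiler's header.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch and the _MM_HINT_* constants */

/* Hypothetical tuning knob: how many elements ahead to prefetch. */
#define PREFETCH_AHEAD 64

/* Sums n doubles, requesting each future element ahead of its use. */
double sum_with_prefetch(const double *a, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++) {
        /* Stay inside the array when forming the prefetch address. */
        if (i + PREFETCH_AHEAD < n) {
            _mm_prefetch((const char *)&a[i + PREFETCH_AHEAD], _MM_HINT_T0);
        }
        total += a[i];
    }
    return total;
}
```

Swapping _MM_HINT_T0 for _MM_HINT_NTA in the call is the usual experiment for data that is read only once, but whether it helps has to be measured on the target processor.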
McCalpinJohn
Honored Contributor III

The general approach used by the Intel processors is for the L3 cache to be inclusive of all of the L1 and L2 caches on the chip. If a line is selected to be evicted from the L3, then any copies of that line in the L1 or L2 caches on that chip must also be evicted.

Notes:

  1. The inclusive L3 approach makes cache coherence much easier to implement -- you only need to check the L3 cache tags to know if data is on a chip.  If it is on a chip, then the L3 tags will tell precisely which L1 and/or L2 caches hold the data.  
  2. The inclusive L3 approach also makes data sharing between cores on the same chip much more efficient, since the L3 keeps track of which core (on the same chip) might have a modified copy of the cache line.  This allows it to be retrieved more quickly, since you do not need to query the other chip(s) and wait for their responses.
  3. The downside of an inclusive L3 is that lines that are in active use in the L1 caches can be evicted because, from the L3's perspective, they have not recently been loaded.  Intel has some "magic" to reduce the occurrence of this undesirable cache line eviction, but as far as I can tell it is not well documented.  For the Westmere-based processors, this feature was controlled by the BIOS "data reuse optimization" option.
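
The downside described in note 3 can be illustrated with a toy model. The sketch below is not how Intel's hardware is implemented. It assumes a tiny, fully associative, strictly LRU L3 and one private cache per core, with made-up sizes (NUM_CORES, PRIV_LINES, L3_LINES) and hypothetical names (toy_cache, toy_hierarchy, hier_access). It only shows the mechanism: hits in a private cache never refresh the line's age in the L3, so a line that is hot in one core can become the L3's LRU victim and be back-invalidated.

```c
#include <stdbool.h>

#define NUM_CORES  2      /* toy values, not real cache sizes */
#define PRIV_LINES 2      /* lines per private (L1/L2) cache */
#define L3_LINES   4      /* lines in the shared, inclusive L3 */
#define NO_LINE    (-1L)  /* marks an empty way; real lines are >= 0 */

typedef struct {
    long          line[L3_LINES];   /* sized for the larger cache */
    unsigned long stamp[L3_LINES];  /* time of last access seen here */
    int           ways;
} toy_cache;

typedef struct {
    toy_cache     priv[NUM_CORES];
    toy_cache     l3;
    unsigned long clock;
} toy_hierarchy;

static void cache_init(toy_cache *c, int ways)
{
    c->ways = ways;
    for (int i = 0; i < ways; i++) {
        c->line[i] = NO_LINE;
        c->stamp[i] = 0;
    }
}

static int cache_find(const toy_cache *c, long line)
{
    for (int i = 0; i < c->ways; i++) {
        if (c->line[i] == line) return i;
    }
    return -1;
}

/* Returns an empty way if there is one, else the LRU way. */
static int cache_victim(const toy_cache *c)
{
    int v = 0;
    for (int i = 0; i < c->ways; i++) {
        if (c->line[i] == NO_LINE) return i;
        if (c->stamp[i] < c->stamp[v]) v = i;
    }
    return v;
}

void hier_init(toy_hierarchy *h)
{
    h->clock = 0;
    cache_init(&h->l3, L3_LINES);
    for (int c = 0; c < NUM_CORES; c++) cache_init(&h->priv[c], PRIV_LINES);
}

bool hier_in_private(const toy_hierarchy *h, int core, long line)
{
    return cache_find(&h->priv[core], line) >= 0;
}

void hier_access(toy_hierarchy *h, int core, long line)
{
    unsigned long now = ++h->clock;
    toy_cache *p = &h->priv[core];
    int i = cache_find(p, line);

    /* Private hit: the L3 never sees it, so its L3 age is not updated. */
    if (i >= 0) {
        p->stamp[i] = now;
        return;
    }

    int j = cache_find(&h->l3, line);
    if (j < 0) {
        /* L3 miss: evict the L3's LRU line and, to stay inclusive,
         * back-invalidate that line in every private cache. */
        j = cache_victim(&h->l3);
        long evicted = h->l3.line[j];
        if (evicted != NO_LINE) {
            for (int c = 0; c < NUM_CORES; c++) {
                int k = cache_find(&h->priv[c], evicted);
                if (k >= 0) h->priv[c].line[k] = NO_LINE;
            }
        }
        h->l3.line[j] = line;
    }
    h->l3.stamp[j] = now;

    /* Fill the requesting core's private cache. */
    i = cache_victim(p);
    p->line[i] = line;
    p->stamp[i] = now;
}
```

For example, if core 0 loads line 100 and keeps hitting it while core 1 streams through lines 1, 2, 3 and 4, then in this model the fourth streamed line evicts line 100 from the L3 and therefore from core 0's private cache as well, even though core 0 used it on every other access.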
le_g_1
New Contributor I

Thank you, Mr. McCalpin.

I have a presumptuous request: could you tell me where I can find more documents about Intel CPU cache behaviour? All I have at hand is the Software Developer's Manual and the optimization guidelines. Are there any documents that are more specific to caches?

McCalpinJohn
Honored Contributor III

The Intel Architecture Software Optimization Guide probably has the most information, but it is often necessary to cross-reference between the SW Optimization Guide and the descriptions of the performance monitoring events in Chapter 19 of Volume 3 of the Intel Architecture Software Developer's Manual.  I don't know how much of this I would have been able to understand if I had not worked in processor design at SGI, IBM, and AMD, along with getting good technical support from Intel (both at SGI while designing a system for the Itanium2 processor, and now as a customer).

Bernard
Valued Contributor I

 You can check chapter 7 of "The Intel Architecture Software Optimization Guide" for cache-related information. I would also try searching the web for various cache-aware programming techniques.

Check this link: http://stackoverflow.com/questions/1922249/c-cache-aware-programming

Ahmad_S_Intel
Employee

John McCalpin wrote:

The downside of an inclusive L3 is that lines that are in active use in the L1 caches can be evicted because, from the L3's perspective, they have not recently been loaded.  Intel has some "magic" to reduce the occurrence of this undesirable cache line eviction, but as far as I can tell it is not well documented.  For the Westmere-based processors, this feature was controlled by the BIOS "data reuse optimization" option.

This paper discusses ways to reduce the undesirable back-invalidation of hot blocks in the upper-level caches by conveying the "hotness" information to the lower levels.

http://dl.acm.org/citation.cfm?id=1935019
