Skylake cache latencies slower than Haswell?

T_C
Beginner

Am I missing something here?

http://www.intel.co.uk/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

Skylake (page 2-5): 

L1: 4 cycles

L2: 12 cycles

L3: 44 cycles

Haswell (page 2-10):

L1: 4 cycles

L2: 11 cycles

L3: 34 cycles

I thought Skylake was supposed to be better than Haswell?

McCalpinJohn
Honored Contributor III

Skylake has a number of improvements that should provide increased cache throughput. There is very commonly a tradeoff between unloaded latency and throughput under load. Spreading cache lines across the four slices of the L3 cache will increase the latency, but it will also considerably increase the worst-case throughput by reducing conflicts for L3 access. Nothing new here.
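One rough way to see why unloaded latency and throughput are separate quantities is Little's Law: sustained bandwidth is bounded by the amount of data in flight divided by the latency. The sketch below is only an illustration of that relationship. The 34 and 44 cycle figures are the L3 latencies quoted in the question, while the function name and the lines-in-flight values are assumptions made up for the example, not numbers from the manual.

```python
# Little's Law sketch: sustained bandwidth <= data in flight / latency.
# The concurrency values below are illustrative assumptions, not measurements.

CACHE_LINE_BYTES = 64  # cache line size on these Intel cores


def bytes_per_cycle(latency_cycles, lines_in_flight):
    """Upper bound on sustained bandwidth for a given latency and concurrency."""
    return lines_in_flight * CACHE_LINE_BYTES / latency_cycles


if __name__ == "__main__":
    # 34 and 44 cycles are the L3 latencies quoted in the question;
    # the lines-in-flight values are hypothetical.
    for latency in (34, 44):
        for in_flight in (1, 10, 16):
            bw = bytes_per_cycle(latency, in_flight)
            print(f"latency={latency} cycles, in flight={in_flight}: "
                  f"{bw:.2f} bytes/cycle")
```

With these made-up concurrency values, the 44-cycle case with 16 lines in flight comes out ahead of the 34-cycle case with 10 lines in flight. That is the sense in which a design with a higher unloaded latency can still deliver more throughput, provided it can keep more accesses in flight without conflicts.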

Skylake also has significantly higher frequencies than Haswell in the lower power bins (e.g., 35W), and it is possible that latency tradeoffs were made to enable some of these power reductions. One common case where latency and power conflict is speculative access to the next level of cache. If you don't care about power, you can start L3 accesses in parallel with the L2 access and thereby reduce the average latency in the event of an L2 miss. The lowest latency comes with the highest power consumption, and vice versa.
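The speculative-access tradeoff can be put into a toy model. This is a minimal sketch under stated assumptions, not a description of how either processor actually works: the function names, the 90% L2 hit rate, and the split into a 12-cycle L2 lookup plus a 32-cycle L3 access are all hypothetical values chosen for illustration. L3 lookups per access are used as a crude stand-in for power.

```python
# Toy model of serial vs. speculative (parallel) L3 access.
# All numbers used with these functions are hypothetical.


def average_latency(l2_hit_rate, l2_cycles, l3_cycles, speculative):
    """Average load latency in cycles for loads that reach the L2."""
    if speculative:
        # L3 access starts alongside the L2 lookup, so a miss costs
        # whichever of the two finishes last.
        miss_cycles = max(l2_cycles, l3_cycles)
    else:
        # L3 access starts only once the L2 miss is known.
        miss_cycles = l2_cycles + l3_cycles
    return l2_hit_rate * l2_cycles + (1.0 - l2_hit_rate) * miss_cycles


def l3_lookups_per_access(l2_hit_rate, speculative):
    """L3 lookups issued per L2 access (a crude proxy for power)."""
    return 1.0 if speculative else 1.0 - l2_hit_rate


if __name__ == "__main__":
    for speculative in (False, True):
        latency = average_latency(0.9, 12, 32, speculative)
        lookups = l3_lookups_per_access(0.9, speculative)
        print(f"speculative={speculative}: {latency:.1f} cycles average, "
              f"{lookups:.1f} L3 lookups per access")
```

With these assumed inputs the speculative design averages roughly 14.0 cycles against roughly 15.2 for the serial design, but it issues an L3 lookup on every access rather than on one access in ten. Lower latency is bought with extra work in the L3, which is the power side of the tradeoff.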
