Skylake cache latencies slower than Haswell?

T_C · 11-12-2015

Am I missing something here?

http://www.intel.co.uk/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

Skylake (page 2-5):

L1: 4 cycles

L2: 12 cycles

L3: 44 cycles

Haswell (page 2-10):

L1: 4 cycles

L2: 11 cycles

L3: 34 cycles

I thought Skylake was supposed to be better than Haswell?

McCalpinJohn · 11-14-2015

Skylake has a number of improvements that should provide increased cache throughput. A tradeoff between unloaded latency and throughput under load is very common. Spreading cache lines across the four slices of the L3 cache will increase the latency, but it will also increase the worst-case throughput considerably by decreasing conflicts for L3 access. Nothing new here.
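To make the slice argument concrete, here is a toy sketch in Python. It is not a description of Intel's hardware: the real address-to-slice hash is undocumented, so `slice_by_region` and `slice_by_line` are hypothetical mappings, and "one request per slice per cycle" is an assumed service rate chosen only to show how spreading lines reduces worst-case conflicts.

```python
from collections import Counter

LINE_BYTES = 64   # cache line size in bytes
NUM_SLICES = 4    # slice count mentioned in the post

def slice_by_region(addr, region_bytes=2 * 1024 * 1024):
    """Hypothetical mapping: each contiguous region lives in one slice."""
    return (addr // region_bytes) % NUM_SLICES

def slice_by_line(addr):
    """Hypothetical mapping: consecutive lines rotate across the slices."""
    return (addr // LINE_BYTES) % NUM_SLICES

def drain_cycles(addresses, slice_of):
    """Cycles to serve a burst, assuming one request per slice per cycle.

    The busiest slice determines how long the whole burst takes.
    """
    load = Counter(slice_of(a) for a in addresses)
    return max(load.values())

# A burst of 1024 consecutive cache lines (64 KiB).
burst = [i * LINE_BYTES for i in range(1024)]

print(drain_cycles(burst, slice_by_region))  # 1024: every line hits one slice
print(drain_cycles(burst, slice_by_line))    # 256: load is split four ways
```

In this model the spread mapping drains the same burst four times faster, even if each individual access has to travel further to reach its slice.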

Skylake also has significantly higher frequencies than Haswell in the lower power bins (e.g., 35 W), and it is possible that latency tradeoffs were made to enable some of these power reductions. One common case where latency and power conflict is in speculative access to the next level of cache. If you don't care about power, you can start L3 accesses in parallel with the L2 access and thereby reduce the average latency in the event of an L2 miss. The lowest latency comes with the highest power consumption, and vice versa.
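The latency versus power tradeoff of speculative access can be sketched with a small model. Everything here is an assumption for illustration: the function `l2_l3_cost` is hypothetical, the cycle counts (10 for the L2 lookup, 30 for the L3 lookup) are round numbers that are not taken from the manual, and the number of L3 lookups started is used as a crude stand-in for power.

```python
def l2_l3_cost(l2_hits, l2_misses, l2_cycles, l3_cycles, speculative):
    """Return (average latency in cycles, number of L3 lookups started)."""
    if speculative:
        # The L3 lookup starts alongside the L2 lookup, so a miss costs only
        # the slower of the two, but every access pays for an L3 lookup.
        miss_cycles = max(l2_cycles, l3_cycles)
        l3_lookups = l2_hits + l2_misses
    else:
        # The L3 lookup starts only after the L2 miss is known.
        miss_cycles = l2_cycles + l3_cycles
        l3_lookups = l2_misses
    total_cycles = l2_hits * l2_cycles + l2_misses * miss_cycles
    return total_cycles / (l2_hits + l2_misses), l3_lookups

# 100 accesses reaching the L2, 80 of which hit (illustrative numbers only).
serial = l2_l3_cost(80, 20, l2_cycles=10, l3_cycles=30, speculative=False)
parallel = l2_l3_cost(80, 20, l2_cycles=10, l3_cycles=30, speculative=True)

print(serial)    # (16.0, 20)
print(parallel)  # (14.0, 100)
```

With these made-up numbers the speculative design saves 2 cycles on average but starts five times as many L3 lookups, which is the shape of the tradeoff described above.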