Maybe you can explain to me why every instruction that references a memory location, even deep inside a loop over an array of consecutive memory locations, causes a retired load miss in the L3 cache. Sorry about labelling the event wrong; it should read MEM_LOAD_RETIRED.L3_MISS. To my knowledge, when a load misses in the L3 cache, the line is brought in from physical memory and fills the L3 cache (cache line fill), as well as the L2 or L1 cache. After that, subsequent reads of the same line should hit in the cache rather than miss (locality). However, I have a for loop that reads 2MB of consecutive memory, and each pass of the loop causes 6 L3 cache misses. Three of these come from loading the index value from memory, reading the end condition from memory, and storing the incremented index value to memory. One comes from reading the destination address. Two come from reading a storage variable and writing the incremented storage value. So each memory read/write causes an L3 miss.
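The for loop is described below:
pt = 0x2300000;
for (l = 0; l < read_2MB; l++) // this causes 3 misses (load l, load read_2MB, store l)
    temp += pt; // this causes the other 3 misses (load pt, load temp, store temp)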
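
To put a number on what locality would predict, here is a minimal sketch in C. It rests on several assumptions that are not stated in the post: a 64-byte cache line, a line-aligned buffer, a loop body that actually dereferences the pointer (pt[l] rather than the pt shown above), and read_2MB being a byte count. The names lines_touched, sum_bytes and ASSUMED_LINE_BYTES are hypothetical and exist only for this illustration. The volatile qualifiers force the index, the limit and the accumulator to be loaded and stored on every pass, which mimics the six memory accesses per iteration counted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; 64 bytes is typical on x86, but this is an
   assumption, not a value taken from the post. */
#define ASSUMED_LINE_BYTES 64u

/* Number of distinct cache lines covered by a sequential read of `bytes`
   bytes that starts on a line boundary. */
size_t lines_touched(size_t bytes, size_t line_bytes)
{
    assert(line_bytes != 0);
    return (bytes + line_bytes - 1) / line_bytes;
}

/* Hypothetical version of the loop from the post. The volatile qualifiers
   keep l, limit and temp in memory, so every pass performs:
   load l, load limit, store l, load pt[l], load temp, store temp. */
uint64_t sum_bytes(const volatile uint8_t *pt, size_t limit_in)
{
    volatile size_t limit = limit_in;
    volatile size_t l;
    volatile uint64_t temp = 0;

    for (l = 0; l < limit; l++)
        temp += pt[l];

    return temp;
}
```

Under these assumptions, lines_touched(2 * 1024 * 1024, ASSUMED_LINE_BYTES) is 32768, so a 2MB sequential read should need on the order of 32768 line fills for the buffer plus a handful for l, limit and temp, which stay in the same few lines for the whole loop. Six misses on every pass is far above that figure, which is why the counts look wrong to me.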