UNC_L3_MISS.ANY (09_03h) counts all L3 misses in the uncore.
MEM_LOAD_RETIRED.LLC_MISS (cb_10h) counts 'Retired loads that miss the LLC cache'.
If the prefetchers bring the data in from memory early enough that the load no longer misses the LLC, MEM_LOAD_RETIRED.LLC_MISS will not increment.
You can test this by disabling the prefetchers in your BIOS (if your BIOS supports it).
With the prefetchers disabled, you should see that the count of MEM_LOAD_RETIRED.LLC_MISS is very close to UNC_L3_MISS.ANY.
I can't test LLC_MISSES (2e_41h) on my system, but you should easily be able to see whether the prefetchers affect event 2e_41h.
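In case it helps when programming the counters by hand: the xx_yyh notation above is read here as event select xx and unit mask yy (that reading is an assumption, though it matches the values quoted). A minimal sketch in C of how a core event such as cb_10h could be packed into the raw code that tools like Linux perf accept (for example `perf stat -e r10cb`). The helper name raw_event_code is made up for illustration, and uncore events such as UNC_L3_MISS.ANY are programmed through separate uncore registers, so this packing does not apply to them.

```c
#include <stdint.h>

/* Hypothetical helper: pack an event select and a unit mask into the
 * low 16 bits of a core event-select value
 * (bits 0-7 = event select, bits 8-15 = unit mask). */
static inline uint32_t raw_event_code(uint8_t event_select, uint8_t umask)
{
    return ((uint32_t)umask << 8) | event_select;
}
```

Under that reading, raw_event_code(0xCB, 0x10) gives the code for MEM_LOAD_RETIRED.LLC_MISS and raw_event_code(0x2E, 0x41) the code for LLC_MISSES.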
Hope this helps,
Maybe you can explain to me why every instruction that references a memory location, even in a deep loop over an array of consecutive memory locations, causes a retired load miss in the L3 cache. Sorry about labelling the event wrong; it should read MEM_LOAD_RETIRED.L3_MISS.
To my knowledge, when a miss occurs in the L3 cache it brings in a page from physical memory and fills the L3 cache (cache line fill), as well as the L2 or L1 cache. After that, subsequent memory reads should hit in the L3 cache rather than miss (locality).
However, I have a for loop that reads 2MB of consecutive memory, and each pass of the loop causes 6 L3 cache misses. Three of these come from loading the index value from memory, reading the end condition from memory, and storing the incremented index value to memory. One comes from reading the destination address. Two come from reading a storage variable and writing the incremented storage value. So each memory read/write causes an L3 miss.
The for loop is shown below:
pt = 0x2300000;
for (l = 0; l < read_2MB; l++) // this causes 3 misses
    temp += pt;
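For comparison, a self-contained sketch of the same kind of loop. It makes several assumptions that are not in the snippet above: pt is treated as a byte pointer that is dereferenced on every pass, the buffer is supplied by the caller instead of sitting at the fixed address 0x2300000, and the cache line is taken to be 64 bytes. The names sum_buffer and expected_line_fills are invented for illustration. With the index and accumulator held in locals, an optimizing compiler can keep them in registers, so only the array read touches memory, and a cold sequential pass over 2MB would be expected to touch roughly 2MB / 64B = 32768 distinct lines rather than several per iteration.

```c
#include <stddef.h>
#include <stdint.h>

#define READ_2MB   (2u * 1024u * 1024u)  /* bytes read by the loop */
#define LINE_BYTES 64u                   /* assumed cache-line size */

/* Sum every byte of buf. The index and the accumulator are locals, so an
 * optimizing compiler is free to keep them in registers. */
uint64_t sum_buffer(const volatile uint8_t *buf, size_t len)
{
    uint64_t temp = 0;
    for (size_t l = 0; l < len; l++)
        temp += buf[l];   /* volatile read: one load per element */
    return temp;
}

/* Number of distinct cache lines covered by len bytes,
 * assuming the buffer starts on a line boundary. */
size_t expected_line_fills(size_t len)
{
    return (len + LINE_BYTES - 1) / LINE_BYTES;
}
```

Calling sum_buffer(buf, READ_2MB) on a 2MB buffer while counting MEM_LOAD_RETIRED.L3_MISS gives a number that can be compared against expected_line_fills(READ_2MB).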