Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Event differences

heinerj
Beginner
How do the UNC_L3_MISS.ANY (09_03H), LLC_MISSES (2E_41H), and MEM_LOAD_RETIRED.L3_MISS (CB_10H) events differ in what they count as an L3 cache miss on the quad-core i7 processor (Family_Model 06_1EH)?
Patrick_F_Intel1
Employee
Hello heinerj,
UNC_L3_MISS.ANY (09_03h) counts all L3 misses in the uncore.
MEM_LOAD_RETIRED.LLC_MISS (cb_10h) counts 'Retired loads that miss the LLC cache'.
If the prefetchers bring the data in from memory early enough that the load hits in the LLC, MEM_LOAD_RETIRED.LLC_MISS will not increment.

You can test this by disabling the prefetchers in your BIOS (if the BIOS supports disabling the prefetchers).

With the prefetchers disabled, you should see that the count of MEM_LOAD_RETIRED.LLC_MISS is very close to that of UNC_L3_MISS.ANY.

I can't test LLC_MISSES (2e_41h) on my system, but you should easily be able to see whether the prefetchers affect event 2e_41h.
Hope this helps,
Pat
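For anyone wanting to program the two core events named above, here is a minimal sketch of how the event-select and unit-mask bytes are packed. It assumes the EVENT_UMASK notation used in this thread (for example CB_10H is event select 0xCB, unit mask 0x10) and the standard IA32_PERFEVTSELx layout, with the event select in bits 0-7 and the unit mask in bits 8-15. The function names are hypothetical, and the Linux perf raw-event string is an assumption about tooling, since the thread does not say which tool is in use. UNC_L3_MISS.ANY is left out because uncore events are programmed through a separate uncore PMU rather than the core counters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Pack an event-select byte and a unit-mask byte into the low 16 bits
 * of a core IA32_PERFEVTSELx value:
 * bits 0-7 = event select, bits 8-15 = unit mask.
 * (Hypothetical helper name, for illustration only.) */
static uint32_t core_event_code(uint8_t event_select, uint8_t umask)
{
    return ((uint32_t)umask << 8) | event_select;
}

/* Format the same code as a Linux perf raw event name ("r" followed by
 * the unit mask and event select in hex). Assumes perf is the tool in use.
 * Returns the number of characters written, excluding the terminator. */
static int perf_raw_name(char *buf, size_t len,
                         uint8_t event_select, uint8_t umask)
{
    assert(buf != NULL && len >= 6); /* 'r' + 4 hex digits + NUL */
    return snprintf(buf, len, "r%02x%02x",
                    (unsigned)umask, (unsigned)event_select);
}
```

Under these assumptions, core_event_code(0xCB, 0x10) gives 0x10CB for MEM_LOAD_RETIRED.LLC_MISS and core_event_code(0x2E, 0x41) gives 0x412E for LLC_MISSES. Counting both in the same run makes it easier to see whether the prefetchers affect one and not the other.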
heinerj
Beginner
If my BIOS does not support disabling the prefetchers, is there another way that I can disable them?
Patrick_F_Intel1
Employee
Hello heinerj,
There is no publicly disclosed method of disabling the prefetchers on Nehalem, Sandy Bridge, and similar chips.
Pat
heinerj
Beginner
Maybe you can explain why every instruction that references a memory location, even in deep loops over an array of consecutive memory locations, causes a retired load miss in the L3 cache. Sorry about mislabelling the event; it should read MEM_LOAD_RETIRED.L3_MISS. To my knowledge, when a miss occurs in the L3 cache, the processor brings in a page from physical memory and fills the L3 cache (cache line fill), as well as the L2 or L1 cache. Subsequent memory reads should then hit in the L3 cache rather than miss (locality).

However, I have a for loop that reads 2MB of consecutive memory, and each pass of the loop causes 6 L3 cache misses. Three of these come from loading the index value from memory, reading the end condition from memory, and storing the incremented index value to memory. One comes from reading the destination address, and two come from reading a storage variable and writing back the incremented storage value. So each memory read/write causes an L3 miss.
The for loop is shown below:
pt = 0x2300000;
for (l = 0; l < read_2MB; l++) // this causes 3 misses
{
    temp += pt; // this causes 3 misses
}
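For comparison, here is a minimal, self-contained sketch of a sequential pass over a 2MB buffer. It is not the code from the post: the names sum_words, lines_touched, BUF_BYTES and LINE_BYTES are hypothetical, the buffer is expected to come from an ordinary allocation rather than the fixed address 0x2300000, and a 64-byte cache line is assumed. With the index and the accumulator free to stay in registers, the only loads that must reach memory are the array reads themselves, so a pass like this touches one new cache line per 64 bytes of data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BUF_BYTES  (2u * 1024u * 1024u) /* 2MB, as in the post */
#define LINE_BYTES 64u                  /* assumed cache-line size */

/* Sum every 32-bit word of buf in address order.
 * volatile forces one real load per element, so the compiler cannot
 * drop the reads; the index and the running total are ordinary locals
 * that an optimizing compiler may keep in registers. */
static uint64_t sum_words(const volatile uint32_t *buf, size_t count)
{
    assert(buf != NULL);
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += buf[i];
    }
    return total;
}

/* Number of distinct cache lines covered by a sequential pass over
 * `bytes` bytes, assuming the buffer starts on a line boundary. */
static size_t lines_touched(size_t bytes)
{
    return (bytes + LINE_BYTES - 1) / LINE_BYTES;
}
```

Under the 64-byte assumption, lines_touched(BUF_BYTES) is 32768, which is far fewer than one line per loop iteration. Comparing the measured miss count for a loop like this against that figure is one way to tell whether the misses track the array data or the loop's own variables.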