I measure the L1 (data), L2 (data), L3 (combined) and TLB misses with PAPIon an Intel Xeon X5650. As expected, the L1 misses are 1 per element with a stride of 8 (equals 64 bytes which is the cachline size). However, with further increasing stride sizes the misses go up to 2 per element at a stride of 32KB. The L2 and L3 misses reach 2 at 128KB.
I am not sure why the misses go up to 2. My assumption is that it has to do with the TLB misses and that the additional data cache misses are induced by accesses to the paging structures. Is there a possibility to confirm this assumption? And why do the L2/L3 misses reach 2 misses/element at 128KB and the L1 misses at 32KB already?
Hello schwald, Yes, the TLB misses show up in the L1 misses. The L1 misses show upas L2 and or L3 misses. I'm still working through the L2 & L3 side of the story... I think that my tester might be getting some hits in L2 or L3 so I have a little more work to do. Pat