How much detail can I drill into with the TLB performance counters? In particular, I'd like to see how effective the 2 MB/1 GB TLB entries are. Are there any counters that let me see TLB hits/misses broken down by page size?
It's more of a general question than a chipset-specific one, as I'm interested in a variety of platforms - anything Nehalem and later.
But my immediate need is for Sandy Bridge Xeon (E5-2650). I'm also interested in Ivy Bridge and the server version of Ivy Bridge that I hear is coming out soon.
I did read the SDM chapter on performance counting. DTLB_LOAD_MISSES.* is kind of there: it has 4K and 2M/4M page walks but not 1G page walks. The page-size-specific events also appear only in Table 19-2 (non-architectural performance events, 4th generation Intel Core processors); there is nothing in the SB or SB Xeon tables about this. There is also DTLB_LOAD_MISSES.WALK_DURATION, which counts the cycles spent doing a walk; I want to be able to filter _that_ by page size if possible.
In any case, being able to count the 1G page walks would be really, really helpful.
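To make the question concrete, here is a minimal sketch of how a raw event code for `perf stat` could be built from an event select and a umask. The helper name `raw_perf_event` is hypothetical, and the values 0x08 (event select) and 0x04 (umask) for DTLB_LOAD_MISSES.WALK_DURATION are assumptions that should be checked against the SDM table for the target CPU. It only covers the un-split counter; a per-page-size variant would need a different umask, if one exists.

```python
def raw_perf_event(event_select: int, umask: int) -> str:
    """Build a perf raw event string of the form r<umask><event>, in hex.

    Hypothetical helper: it only packs the two 8-bit fields and does not
    check that the resulting event exists on any particular CPU.
    """
    if not 0 <= event_select <= 0xFF:
        raise ValueError("event_select must fit in 8 bits")
    if not 0 <= umask <= 0xFF:
        raise ValueError("umask must fit in 8 bits")
    return f"r{umask:02x}{event_select:02x}"


# Assumed encoding for DTLB_LOAD_MISSES.WALK_DURATION; verify against the
# SDM event table for the processor you are measuring.
WALK_DURATION = raw_perf_event(0x08, 0x04)

if __name__ == "__main__":
    # ./my_workload is a placeholder for the program being measured.
    print(f"perf stat -e {WALK_DURATION} -- ./my_workload")
```

The resulting string can be passed to `perf stat -e`; the workload path is a placeholder.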