A processor may or may not implement any of the paging-structure caches. Software should rely on neither their presence nor their absence. The processor may invalidate entries in these caches at any time. Because the processor may create the cache entries at the time of translation and not update them following subsequent modifications to the paging structures in memory, software should take care to invalidate the cache entries appropriately when causing such modifications. The invalidation of TLBs and the paging-structure caches is described in Section 4.10.4.
It looks like the only way to figure this out for Intel processors is very carefully designed microbenchmark testing. The discussion in section 4.10.3 of Volume 3 of the SWDM provides a reasonably clear explanation of the way that these caches (if they exist) are used in the page translation process.
It will not be easy to build a microbenchmark for testing these structures, but it should be possible. Hardware performance counters might also be useful for determining whether there is a jump in additional memory references as one increases the number of distinct entries accessed at each level of the hierarchical translation (indicating overflow of the cache at that level).
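To make the "distinct entries at each level" idea concrete, here is a minimal sketch of the address arithmetic such a microbenchmark would need. It assumes IA-32e paging with 4 KiB pages and 48-bit linear addresses (four levels, 9 index bits each). The names `paging_indices` and `probe_addresses` are made up for illustration, and a real test would do the timed accesses in C or assembly on memory it has actually mapped at these addresses.

```python
# Sketch only: address arithmetic for a paging-structure-cache probe.
# Assumes IA-32e paging, 4 KiB pages, 48-bit linear addresses.

PAGE_SHIFT = 12                           # 4 KiB page offset
INDEX_BITS = 9                            # 512 entries per paging structure
LEVELS = ("PT", "PD", "PDPT", "PML4")     # lowest level first


def level_shift(level):
    """Bit position where the index for `level` starts (12, 21, 30, 39)."""
    return PAGE_SHIFT + INDEX_BITS * LEVELS.index(level)


def paging_indices(vaddr):
    """Split a linear address into its index at each paging level."""
    return {lvl: (vaddr >> level_shift(lvl)) & 0x1FF for lvl in LEVELS}


def probe_addresses(base, level, count):
    """Return `count` addresses, each selecting a different entry at `level`.

    The stride is the span covered by one entry at that level (4 KiB for a
    PTE, 2 MiB for a PDE, 1 GiB for a PDPTE, 512 GiB for a PML4E). If the
    starting index plus `count` exceeds 512, the sequence spills into the
    next entry one level up, so keep `count` small or align `base`.
    """
    stride = 1 << level_shift(level)
    return [base + i * stride for i in range(count)]


if __name__ == "__main__":
    for addr in probe_addresses(0, "PD", 4):
        print(hex(addr), paging_indices(addr))
```

Sweeping `count` upward for one level while timing a loop over the resulting addresses (or watching page-walk counters) is where one would look for the jump described above.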
Following all of the cases through the documentation is difficult, but my current interpretation is that for my Xeon E5-2580 processors (Sandy Bridge EP) running in 64-bit mode ("IA-32e paging"), PCIDs are enabled, which means that the higher-level entries in the hierarchical page table structure are read with the PAT type from index 0 of the PAT MSR (0x277), which is "UC-" on this system. This is probably necessary because the paging-structure caches described in section 4.10.3 are augmented with 12-bit PCID values, and there is no place in the data caches to hold this extra information. If this interpretation is correct, then overflowing any level of the paging-structure cache will generate an uncached load from memory. This should be a lot easier to find via either timing or performance counters than a TLB walk that finds the paging-structure entries in the regular cache hierarchy.
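For anyone who wants to check the PAT entry on their own machine, a small decoder for the 64-bit value read from MSR 0x277 might look like the sketch below. The function name `pat_entry` is hypothetical. The raw value has to come from the system itself (for example via the `rdmsr` utility from msr-tools on Linux), and the constant used here is the architectural power-on default, included only to exercise the decoder; it is not the value from the system described above.

```python
# Sketch only: decode one entry of the IA32_PAT MSR (address 0x277).
# Each of the 8 entries occupies one byte; the low 3 bits hold the type.

PAT_TYPES = {
    0x00: "UC",    # uncacheable
    0x01: "WC",    # write combining
    0x04: "WT",    # write through
    0x05: "WP",    # write protected
    0x06: "WB",    # write back
    0x07: "UC-",   # uncached, overridable by MTRRs
}


def pat_entry(pat_msr, index):
    """Return the memory-type name stored in PAT entry `index` (0..7)."""
    if not 0 <= index <= 7:
        raise ValueError("PAT index must be in 0..7")
    encoding = (pat_msr >> (8 * index)) & 0x07
    return PAT_TYPES.get(encoding, "reserved")


if __name__ == "__main__":
    # Power-on default value; an OS may reprogram the MSR.
    example = 0x0007040600070406
    for i in range(8):
        print(f"PA{i}: {pat_entry(example, i)}")
```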
Hi John, thanks for the reply.
A follow-up question is whether, upon repeated L2 TLB misses, the page table data structures also "pollute" (enter) the cores' cache hierarchies ... I suspect that Intel may have some mechanism to throttle this, or the h/w page walks may simply leave the cache memories unaffected.