I am referring to Xeons (Nehalem, Westmere, Sandy Bridge) operating in the Intel 64 "IA-32e" mode with paging enabled (full 64-bit support; see http://download.intel.com/products/processor/manual/325384.pdf, Vol. 3A). Data in the memory-management data structures (p. 2-8, Vol. 3A) used in the linear-to-physical address translation mechanisms (p. 4-28, Vol. 3A) can be cached by the hardware. Section 4.10, "CACHING TRANSLATION INFORMATION", states: "A processor may cache information from the paging-structure entries in TLBs and paging-structure caches".
Hardware caching of page-table entries in TLBs, discussed in subsection 4.10.2.2, is well known, and the documentation elsewhere clearly describes the TLB structures for each microarchitecture.
For the paging-structure caches of Section 4.10.3, the manual says that a processor may support any or all of the following paging-structure caches: PML4, PDPTE, and PDE.
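As a rough illustration of what each of these caches would be keyed on, the following minimal sketch splits a 48-bit linear address the way IA-32e 4-level paging with 4-KiB pages does, and derives the address bits that Section 4.10.3 associates with each cache (47:39 for PML4, 47:30 for PDPTE, 47:21 for PDE). The function names are hypothetical, and the sketch says nothing about whether a given processor implements any of the caches.

```python
# Sketch only: field layout for IA-32e 4-level paging with 4-KiB pages.
# Function names are made up for illustration.

LINEAR_ADDRESS_MASK = (1 << 48) - 1  # only bits 47:0 take part in translation


def split_linear_address(la):
    """Return the four 9-bit table indices and the 12-bit page offset."""
    la &= LINEAR_ADDRESS_MASK
    return {
        "pml4": (la >> 39) & 0x1FF,  # bits 47:39 select the PML4 entry
        "pdpt": (la >> 30) & 0x1FF,  # bits 38:30 select the PDPT entry
        "pd": (la >> 21) & 0x1FF,    # bits 29:21 select the page-directory entry
        "pt": (la >> 12) & 0x1FF,    # bits 20:12 select the page-table entry
        "offset": la & 0xFFF,        # bits 11:0 are the offset within the page
    }


def paging_structure_cache_tags(la):
    """Return the linear-address bits each paging-structure cache would be looked up with."""
    la &= LINEAR_ADDRESS_MASK
    return {
        "pml4_cache": la >> 39,   # bits 47:39
        "pdpte_cache": la >> 30,  # bits 47:30
        "pde_cache": la >> 21,    # bits 47:21
    }
```

Two addresses that share a PDE-cache tag differ only in bits 20:0, so a hit in that cache would leave a single page-table read to complete the walk.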
Do any of these Xeons (Nehalem, Westmere, Sandy Bridge) implement any paging-structure cache hardware? I have NOT been able to find any reference to such hardware existing on any of these processors.
Should I assume that this is a feature that is permitted by the ISA spec but has NOT been implemented by any of these processors?
Otherwise, can I find any more specific information about this H/W per processor?
I would appreciate any information or pointer to it ....
thanks
Michael
Section 4.10.3 which you referenced above later includes the following paragraph:
A processor may or may not implement any of the paging-structure caches. Software should rely on neither their presence nor their absence. The processor may invalidate entries in these caches at any time. Because the processor may create the cache entries at the time of translation and not update them following subsequent modifications to the paging structures in memory, software should take care to invalidate the cache entries appropriately when causing such modifications. The invalidation of TLBs and the paging-structure caches is described in Section 4.10.4.
The SDM is intended for software developers, so it is phrased to aid in writing safe and portable code, and to avoid assumptions about hardware that may not hold for other processor designs in the same family or a different one.
What is the problem you are trying to resolve with this information? Perhaps I can find out or refer you to more useful information or sources.
-Hussam
Hello, I am also curious about this. Can someone please confirm?
It looks like the only way to figure this out for Intel processors is very carefully designed microbenchmark testing. The discussion in section 4.10.3 of Volume 3 of the SDM provides a reasonably clear explanation of the way that these caches (if they exist) are used in the page translation process.
It will not be easy to build a microbenchmark for testing these structures, but it should be possible. Hardware performance counters might also be useful for determining whether there is a jump in additional memory references as one increases the number of distinct entries accessed at each level of the hierarchical translation (indicating overflow of the cache at that level).
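One possible starting point for such a microbenchmark is the access pattern itself. The sketch below only generates linear-address offsets and counts how many distinct entries they touch at each level of the hierarchy; it does not time anything or read any counters, and a real test would need the buffer to be mapped with 4-KiB pages and the access loop written in a compiled language. The names `STRIDES`, `probe_addresses`, and `distinct_entries` are hypothetical.

```python
# Sketch only: address patterns that vary the number of distinct
# paging-structure entries touched at one level of a 4-level walk.

STRIDES = {
    "pte": 1 << 12,    # each PTE maps 4 KiB
    "pde": 1 << 21,    # each PDE covers 2 MiB (one page table)
    "pdpte": 1 << 30,  # each PDPTE covers 1 GiB (one page directory)
    "pml4e": 1 << 39,  # each PML4E covers 512 GiB (one PDPT)
}


def probe_addresses(level, count, base=0):
    """Return `count` addresses, each falling under a different entry at `level`."""
    stride = STRIDES[level]
    return [base + i * stride for i in range(count)]


def distinct_entries(addresses):
    """Count the distinct entries touched at every level by a set of addresses."""
    return {
        "pml4e": len({a >> 39 for a in addresses}),
        "pdpte": len({a >> 30 for a in addresses}),
        "pde": len({a >> 21 for a in addresses}),
        "pte": len({a >> 12 for a in addresses}),
    }
```

For example, `probe_addresses("pde", 64)` stays within one 1-GiB region while touching 64 different page directory entries, so sweeping `count` upward and watching for a step in latency or in memory references is one way to look for the capacity of a PDE cache, if one exists.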
Following all of the cases through the documentation is difficult, but my current interpretation is that for my Xeon E5-2580 processors (Sandy Bridge EP) running in 64-bit mode ("IA-32e paging"), PCIDs are enabled, which means that the higher-level entries in the hierarchical page table structure are read with the PAT type from index 0 of the PAT MSR (0x277), which is "UC-" on this system. This is probably necessary because the paging-structure caches described in section 4.10.3 are augmented with 12-bit PCID values, and there is no place in the data caches to hold this extra information. If this interpretation is correct, then overflowing any level of the paging-structure cache will generate an uncached load from memory. This should be a lot easier to find via either timing or performance counters than a TLB walk that finds the paging-structure entries in the regular cache hierarchy.
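For anyone who wants to check the PAT entries on their own machine, here is a minimal sketch of decoding a raw IA32_PAT (0x277) value into its eight memory types. The function name `decode_pat` is hypothetical, and the constant decoded in the example is the documented power-on default, not the value from the system described above.

```python
# Sketch only: decode the eight entries of the IA32_PAT MSR (0x277).
# Each entry occupies one byte; the low 3 bits encode the memory type.

PAT_TYPES = {
    0x00: "UC",
    0x01: "WC",
    0x04: "WT",
    0x05: "WP",
    0x06: "WB",
    0x07: "UC-",
}

# Documented power-on default of IA32_PAT; an OS may reprogram it.
PAT_POWER_ON_DEFAULT = 0x0007040600070406


def decode_pat(msr_value):
    """Return the memory type for PAT entries PA0..PA7, in index order."""
    entries = []
    for index in range(8):
        encoding = (msr_value >> (8 * index)) & 0x07
        entries.append(PAT_TYPES.get(encoding, "reserved"))
    return entries


if __name__ == "__main__":
    for index, memory_type in enumerate(decode_pat(PAT_POWER_ON_DEFAULT)):
        print(f"PA{index}: {memory_type}")
```

On Linux, the raw value can usually be read as 8 bytes at offset 0x277 from `/dev/cpu/0/msr`, assuming the `msr` module is loaded and the reader has root privileges.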
Hi John, thanks for the reply.
A follow-up question is whether, on repeated L2 TLB misses, the page-table data structures also "pollute" (enter) the cores' cache hierarchies. I suspect that Intel may have some mechanism to throttle this, or the hardware page walks may simply leave the caches unaffected.
Mike