Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

TLB events in Intel(R) Xeon(R) CPU E5-2695 v2

LY
Beginner

Hi,

I have a question about how to check TLB events. I went through /sys/devices/system/cpu/*/ and found no information about the TLB size. As I understand it, the TLB is a separate cache from L1/L2/L3; is that right? There are also two ways to handle TLB misses: in hardware or in the OS kernel. Does the Intel(R) Xeon(R) CPU E5-2695 v2 handle TLB misses in hardware, or are they handled by the OS kernel? In addition, I could not find a relevant section in the "Intel® Xeon® Processor E5 v2 and E7 v2 Product Families Uncore Performance Monitoring Reference Manual".

I have another question. Some material I read says that the TLB is integrated into the CPU, and that a Front-Side Bus (for transferring physical addresses) connects it directly to the memory controller without going through L1, L2, and L3. Is this correct?

Thanks.

4 Replies
McCalpinJohn
Honored Contributor III

The TLB characteristics of recent Intel processors are discussed in Chapter 2 of the "Intel 64 and IA-32 Architectures Optimization Reference Manual" (Intel document 248966, revision 031, September 2015).  For the Xeon E5-2695 v2, the relevant information is in Section 2.3.5, and particularly in Table 2-20.

Most memory accesses (including most memory accesses related to TLB reloads) go through the cache hierarchy.
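
To make the TLB sizes concrete, here is a minimal sketch of a "TLB reach" calculation: the amount of address space a TLB can map without missing, which is simply entries times page size. The entry counts below are assumptions for illustration, not values taken from this thread, and should be checked against Table 2-20 of the Optimization Reference Manual. The names `ASSUMED_TLBS`, `tlb_reach`, and `reach_table` are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

# (TLB description, page size in bytes, number of entries).
# ASSUMED values for illustration only; verify against Table 2-20.
ASSUMED_TLBS = [
    ("L1 DTLB, 4 KiB pages", 4 * KIB, 64),
    ("L1 DTLB, 2 MiB pages", 2 * MIB, 32),
    ("L2 STLB, 4 KiB pages", 4 * KIB, 512),
]


def tlb_reach(page_size, entries):
    """Bytes of address space covered when every entry maps a distinct page."""
    return page_size * entries


def reach_table(tlbs):
    """Map each TLB description to its reach in bytes."""
    return {name: tlb_reach(size, entries) for name, size, entries in tlbs}


if __name__ == "__main__":
    for name, reach in reach_table(ASSUMED_TLBS).items():
        print(f"{name}: {reach // KIB} KiB")
```

A working set larger than the reach of the last-level TLB will trigger page table walks, and those walks are the memory accesses that go through the cache hierarchy.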

LY
Beginner

Hi Dr. McCalpin,

I went through Chapter 2 of the manual you pointed out, especially Section 2.3.5, but I did not find an explanation of how the iTLB, dTLB, and sTLB fit into the cache hierarchy. I asked my Comp. Arch. professor the same questions, and he told me these TLBs are generally placed in L2 and L3. Could you tell me how the Sandy Bridge microarchitecture places the iTLB, dTLB, and sTLB in the cache hierarchy?

Thanks. : )

McCalpinJohn
Honored Contributor III

The interaction of the TLBs with the standard cache hierarchy is very complex: some processors support different modes of operation, and all processors support OS-controllable caching of the upper levels of the page translation hierarchy. This is discussed in many sections of Volume 3 of the Intel 64 and IA-32 Architectures Software Developer's Manual (Intel document 325384, revision 056, September 2015). Section 4.5 is the most relevant for most current processors, but note that it describes two different modes of operation depending on the setting of the CR4.PCIDE configuration bit.
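
As an illustration of what the "levels of the page translation hierarchy" are, here is a minimal sketch assuming 4-level IA-32e paging with 4 KiB pages. In that mode a 48-bit linear address splits into four 9-bit table indices plus a 12-bit page offset, and a full page walk reads one entry per level. The function name `split_virtual_address` is hypothetical; large pages (2 MiB, 1 GiB) end the walk earlier and are not handled here.

```python
def split_virtual_address(va):
    """Split a 48-bit linear address into 4-level paging fields.

    Assumes 4 KiB pages: four 9-bit indices (PML4, PDPT, PD, PT)
    followed by a 12-bit offset within the page.
    """
    return {
        "pml4": (va >> 39) & 0x1FF,  # index into the top-level table
        "pdpt": (va >> 30) & 0x1FF,  # page-directory-pointer table index
        "pd": (va >> 21) & 0x1FF,    # page directory index
        "pt": (va >> 12) & 0x1FF,    # page table index
        "offset": va & 0xFFF,        # byte offset inside the 4 KiB page
    }


if __name__ == "__main__":
    fields = split_virtual_address(0x00007F1234567ABC)
    for name, value in fields.items():
        print(f"{name}: {value:#x}")
```

Each of the four indices selects an 8-byte entry in an ordinary memory-resident table, which is why the entries read during a walk can be found in the L1, L2, or L3 caches or in memory.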

The most interesting information often comes indirectly, and I found this to be the case on Intel's Xeon E5-26xx v3 processors. This family of processors includes a performance monitoring event, PAGE_WALKER_LOADS (event code 0xBC), that can be programmed to increment when the page walker finds the page table entry it needs in various levels of the cache hierarchy. In the tests that I ran (discussed in forum threads at https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/593830 and https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/332852), the page table walker found entries in every level of the memory hierarchy (L1, L2, L3, and memory).
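
For anyone who wants to try this with Linux `perf`, here is a minimal sketch of how the raw event strings can be built. The event code 0xBC comes from the text above. The unit mask values are assumptions that must be checked against the performance event tables for your exact processor, and `./my_benchmark` is a hypothetical program name. The encoding itself, with the unit mask in bits 15:8 and the event select in bits 7:0, follows the layout of the IA32_PERFEVTSELx registers, and `perf` accepts that value in hex after an `r` prefix.

```python
EVENT_PAGE_WALKER_LOADS = 0xBC  # event code quoted in the post above

# ASSUMED unit masks (not from this thread); verify for your processor.
ASSUMED_UMASKS = {
    "DTLB_L1": 0x11,
    "DTLB_L2": 0x12,
    "DTLB_L3": 0x14,
    "DTLB_MEMORY": 0x18,
}


def raw_config(event, umask):
    """Pack the unit mask (bits 15:8) and event select (bits 7:0)."""
    return ((umask & 0xFF) << 8) | (event & 0xFF)


def perf_raw_event(event, umask):
    """Format the packed value as a perf raw event string such as 'r11bc'."""
    return f"r{raw_config(event, umask):04x}"


def perf_stat_command(program):
    """Build a 'perf stat' argument list counting all assumed sub-events."""
    events = ",".join(
        perf_raw_event(EVENT_PAGE_WALKER_LOADS, umask)
        for umask in ASSUMED_UMASKS.values()
    )
    return ["perf", "stat", "-e", events, "--"] + list(program)


if __name__ == "__main__":
    # './my_benchmark' is a placeholder for the program under test.
    print(" ".join(perf_stat_command(["./my_benchmark"])))
```

Comparing the four counts shows where the page walker's loads were satisfied, which is the indirect evidence described above.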

LY
Beginner

Thanks! Your answer really helps me!
