LY
Beginner
117 Views

TLB events in Intel(R) Xeon(R) CPU E5-2695 v2

Hi,

I have a question about how to check TLB events. I went through /sys/devices/system/cpu/*/ and found no information about TLB size. As I understand it, the TLB is a separate cache from L1/L2/L3; is that right? There are also two ways to handle TLB misses: in hardware or in the OS kernel. Does the Intel(R) Xeon(R) CPU E5-2695 v2 handle TLB misses in hardware, or are they handled by the OS kernel? In addition, I didn't find a related section in the "Intel® Xeon® Processor E5 v2 and E7 v2 Product Families Uncore Performance Monitoring Reference Manual".

I have another question. I read some material saying that the TLB is integrated into the CPU, and that a Front-Side Bus (for transferring physical addresses) connects it directly to the memory controller without going through L1, L2, and L3. Is this correct?

Thanks.

4 Replies
McCalpinJohn
Black Belt
117 Views

The TLB characteristics of recent Intel processors are discussed in Chapter 2 of the "Intel 64 and IA-32 Architectures Optimization Reference Manual" (Intel document 248966, revision 031, September 2015).  For the Xeon E5-2695 v2, the relevant information is in Section 2.3.5, and particularly in Table 2-20.

Most memory accesses (including most memory accesses related to TLB reloads) go through the cache hierarchy.

LY
Beginner
117 Views

Hi Dr. McCalpin,

I went through Chapter 2 of the manual you pointed to, especially Section 2.3.5, but I didn't find that it clarifies how the iTLB, dTLB and sTLB fit into the cache hierarchy. I asked my Comp. Arch. professor the same question and he told me these TLBs are generally placed in L2 and L3. Could you tell me how the Sandy Bridge microarchitecture places the iTLB, dTLB and sTLB within the cache hierarchy?

Thanks. : )

McCalpinJohn
Black Belt
118 Views

The interaction of the TLBs with the standard cache hierarchy is very complex, with some processors supporting different modes of operation and all processors supporting OS-controllable caching of the upper levels of the page translation hierarchy. This is discussed in many sections of Volume 3 of the Intel 64 and IA-32 Architectures Software Developer's Manual (Intel document 325384, revision 056, September 2015). Section 4.5 is the most relevant to most current processors, but note that it describes two different modes of operation depending on the setting of the CR4.PCIDE configuration bit.
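
To make the "page translation hierarchy" concrete, here is a minimal Python sketch of how a virtual address is split into the table indices that a page walk uses. It assumes 4-level paging with 48-bit virtual addresses and 4 KiB pages; the function name and the dictionary keys are only illustrative, and large pages (2 MiB, 1 GiB) are not handled. Each index selects one entry in one level of the tables, so a full walk after a TLB miss needs up to four loads, and those loads are ordinary memory accesses.

```python
def split_virtual_address(vaddr: int) -> dict:
    """Split a 48-bit virtual address into 4-level paging indices.

    Assumes 4 KiB pages: four 9-bit table indices plus a 12-bit page offset.
    """
    if not 0 <= vaddr < (1 << 48):
        raise ValueError("expected the low 48 bits of a virtual address")
    return {
        "pml4": (vaddr >> 39) & 0x1FF,  # bits 47:39, top-level table index
        "pdpt": (vaddr >> 30) & 0x1FF,  # bits 38:30
        "pd": (vaddr >> 21) & 0x1FF,    # bits 29:21
        "pt": (vaddr >> 12) & 0x1FF,    # bits 20:12, selects the final entry
        "offset": vaddr & 0xFFF,        # bits 11:0, byte within the 4 KiB page
    }


if __name__ == "__main__":
    # Build an address from known indices and confirm it splits back.
    example = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
    print(split_virtual_address(example))
```

The upper-level entries (the `pml4`, `pdpt` and `pd` steps in this sketch) are the ones the paging-structure caches can hold, which is why a walk often needs fewer than four loads.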

It is often the case that the most interesting information comes indirectly, and I found this to be the case on Intel's Xeon E5-26xx v3 processors. This family of processors includes a performance monitoring event PAGE_WALKER_LOADS (event code 0xBC) that can be programmed to increment when the page walker finds the page translation entry it is loading in various levels of the cache hierarchy. In the tests that I ran (discussed in forum threads at https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring... and https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring...), I found that the page table walker found entries in every level of the memory hierarchy (the L1, L2, L3, and memory).
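
For anyone who wants to try this with Linux `perf`, the following is a minimal Python sketch of how the raw event strings could be built. The event code 0xBC comes from the post above; the umask values in `ASSUMED_UMASKS` are an assumption on my part and should be checked against the performance-event tables for your processor before use. The helper names are hypothetical. The sketch only constructs a `perf stat` command line; it does not run it.

```python
PAGE_WALKER_LOADS = 0xBC  # event code quoted in the post above

# ASSUMPTION: umask values for the data-side sub-events on Xeon E5 v3.
# Verify against Intel's event tables for your CPU before relying on them.
ASSUMED_UMASKS = {
    "DTLB_L1": 0x11,
    "DTLB_L2": 0x12,
    "DTLB_L3": 0x14,
    "DTLB_MEMORY": 0x18,
}


def perf_raw_event(event_code: int, umask: int) -> str:
    """Encode an event code and umask as a perf raw event (r + hex umask/event)."""
    if not (0 <= event_code <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event code and umask must each fit in 8 bits")
    return f"r{(umask << 8) | event_code:04x}"


def perf_stat_command(workload: list) -> list:
    """Build a `perf stat` argument list that counts the assumed sub-events."""
    events = ",".join(
        perf_raw_event(PAGE_WALKER_LOADS, umask)
        for umask in ASSUMED_UMASKS.values()
    )
    return ["perf", "stat", "-e", events, "--", *workload]


if __name__ == "__main__":
    # "./my_benchmark" is a placeholder for the program being measured.
    print(" ".join(perf_stat_command(["./my_benchmark"])))
```

Comparing the four counts gives a rough picture of where the page walker's loads are being satisfied, which is the kind of indirect evidence described above.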

LY
Beginner
117 Views

Thanks! Your answer really helps me!
