<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic The interaction of the TLBs in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/TLB-events-in-Intel-R-Xeon-R-CPU-E5-2695-v2/m-p/1062318#M5178</link>
    <description>&lt;P&gt;The interaction of the TLBs with the standard cache hierarchy is very complex, with some processors supporting different modes of operation and all processors supporting OS-controllable caching of the upper levels of the page translation hierarchy.&amp;nbsp;&amp;nbsp; This is discussed in many sections of Volume 3 of the Intel Architectures Software Developer's Manual (Intel document 325384, revision 056, September 2015).&amp;nbsp;&amp;nbsp; Section 4.5 is the most relevant to most current processors, but note that it describes two different modes of operation depending on the setting of the CR4.PCIDE configuration bit.&lt;/P&gt;

&lt;P&gt;It is often the case that the most interesting information comes indirectly, and I found this to be the case on Intel's Xeon E5-26xx v3 processors.&amp;nbsp; This family of processors includes a performance monitoring event PAGE_WALKER_LOADS (event code 0xBC) that can be programmed to increment when the page walker finds the TLB entry in various levels of the cache hierarchy.&amp;nbsp;&amp;nbsp; In the tests that I ran (discussed in forum threads at &lt;A href="https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/593830" target="_blank"&gt;https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/593830&lt;/A&gt; and &lt;A href="https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/332852" target="_blank"&gt;https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/332852&lt;/A&gt;), I found that the page table walker found entries in every level of the memory hierarchy (the L1, L2, L3, and memory).&lt;/P&gt;</description>
    <pubDate>Thu, 12 Nov 2015 17:01:06 GMT</pubDate>
    <dc:creator>McCalpinJohn</dc:creator>
    <dc:date>2015-11-12T17:01:06Z</dc:date>
    <item>
      <title>TLB events in Intel(R) Xeon(R) CPU E5-2695 v2</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/TLB-events-in-Intel-R-Xeon-R-CPU-E5-2695-v2/m-p/1062315#M5175</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I have a question about how to check TLB events. I went through /sys/devices/system/cpu/*/ and found no information about TLB size. Is the TLB a different cache from L1/L2/L3? Also, there are two ways to handle TLB misses: in hardware or in the OS kernel. Does the Intel(R) Xeon(R) CPU E5-2695 v2 handle TLB misses in hardware, or are they handled by the OS kernel? In addition, I didn't find a related section in the "Intel® Xeon® Processor E5 v2 and E7 v2 Product Families Uncore Performance Monitoring Reference Manual".&lt;/P&gt;

&lt;P&gt;I also have another question. I read some material saying that the TLB is integrated into the CPU, and that a Front-Side Bus (for transferring physical addresses) connects it directly to the memory controller without going through L1, L2, and L3. Is this correct?&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Nov 2015 16:09:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/TLB-events-in-Intel-R-Xeon-R-CPU-E5-2695-v2/m-p/1062315#M5175</guid>
      <dc:creator>LY</dc:creator>
      <dc:date>2015-11-11T16:09:21Z</dc:date>
    </item>
    <item>
      <title>The TLB characteristics of</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/TLB-events-in-Intel-R-Xeon-R-CPU-E5-2695-v2/m-p/1062316#M5176</link>
      <description>&lt;P&gt;The TLB characteristics of recent Intel processors are discussed in Chapter 2 of the "Intel 64 and IA-32 Architectures Optimization Reference Manual" (Intel document 248966, revision 031, September 2015).&amp;nbsp; For the Xeon E5-2695 v2, the relevant information is in Section 2.3.5, and particularly in Table 2-20.&lt;/P&gt;

&lt;P&gt;Most memory accesses (including most memory accesses related to TLB reloads) go through the cache hierarchy.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Nov 2015 22:27:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/TLB-events-in-Intel-R-Xeon-R-CPU-E5-2695-v2/m-p/1062316#M5176</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2015-11-11T22:27:22Z</dc:date>
    </item>
    <item>
      <title>Hi Dr. McCalpin,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/TLB-events-in-Intel-R-Xeon-R-CPU-E5-2695-v2/m-p/1062317#M5177</link>
      <description>&lt;P&gt;Hi Dr. McCalpin,&lt;/P&gt;

&lt;P&gt;I went through Chapter 2 of the manual you pointed out, especially Section 2.3.5, but I did not find that it clarifies how the iTLB, dTLB, and sTLB fit into the cache hierarchy. I asked my Comp. Arch. professor the same questions, and he told me these TLBs are generally placed in L2 and L3. Could you tell me how the Sandy Bridge microarchitecture places the iTLB, dTLB, and sTLB in the different levels of the cache hierarchy?&lt;/P&gt;

&lt;P&gt;Thanks. : )&lt;/P&gt;

&lt;BLOCKQUOTE&gt;John D. McCalpin wrote:&lt;BR /&gt;

&lt;P&gt;The TLB characteristics of recent Intel processors are discussed in Chapter 2 of the "Intel 64 and IA-32 Architectures Optimization Reference Manual" (Intel document 248966, revision 031, September 2015).&amp;nbsp; For the Xeon E5-2695 v2, the relevant information is in Section 2.3.5, and particularly in Table 2-20.&lt;/P&gt;

&lt;P&gt;Most memory accesses (including most memory accesses related to TLB reloads) go through the cache hierarchy.&lt;/P&gt;

&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Thu, 12 Nov 2015 03:22:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/TLB-events-in-Intel-R-Xeon-R-CPU-E5-2695-v2/m-p/1062317#M5177</guid>
      <dc:creator>LY</dc:creator>
      <dc:date>2015-11-12T03:22:35Z</dc:date>
    </item>
    <item>
      <title>The interaction of the TLBs</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/TLB-events-in-Intel-R-Xeon-R-CPU-E5-2695-v2/m-p/1062318#M5178</link>
      <description>&lt;P&gt;The interaction of the TLBs with the standard cache hierarchy is very complex, with some processors supporting different modes of operation and all processors supporting OS-controllable caching of the upper levels of the page translation hierarchy.&amp;nbsp;&amp;nbsp; This is discussed in many sections of Volume 3 of the Intel Architectures Software Developer's Manual (Intel document 325384, revision 056, September 2015).&amp;nbsp;&amp;nbsp; Section 4.5 is the most relevant to most current processors, but note that it describes two different modes of operation depending on the setting of the CR4.PCIDE configuration bit.&lt;/P&gt;

&lt;P&gt;It is often the case that the most interesting information comes indirectly, and I found this to be the case on Intel's Xeon E5-26xx v3 processors.&amp;nbsp; This family of processors includes a performance monitoring event PAGE_WALKER_LOADS (event code 0xBC) that can be programmed to increment when the page walker finds the TLB entry in various levels of the cache hierarchy.&amp;nbsp;&amp;nbsp; In the tests that I ran (discussed in forum threads at &lt;A href="https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/593830" target="_blank"&gt;https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/593830&lt;/A&gt; and &lt;A href="https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/332852" target="_blank"&gt;https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/332852&lt;/A&gt;), I found that the page table walker found entries in every level of the memory hierarchy (the L1, L2, L3, and memory).&lt;/P&gt;</description>
      <pubDate>Thu, 12 Nov 2015 17:01:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/TLB-events-in-Intel-R-Xeon-R-CPU-E5-2695-v2/m-p/1062318#M5178</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2015-11-12T17:01:06Z</dc:date>
    </item>
    <item>
      <title>Thanks! Your answer really</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/TLB-events-in-Intel-R-Xeon-R-CPU-E5-2695-v2/m-p/1062319#M5179</link>
      <description>&lt;P&gt;Thanks! Your answer really helps me!&lt;/P&gt;

&lt;BLOCKQUOTE&gt;John D. McCalpin wrote:&lt;BR /&gt;

&lt;P&gt;The interaction of the TLBs with the standard cache hierarchy is very complex, with some processors supporting different modes of operation and all processors supporting OS-controllable caching of the upper levels of the page translation hierarchy.&amp;nbsp;&amp;nbsp; This is discussed in many sections of Volume 3 of the Intel Architectures Software Developer's Manual (Intel document 325384, revision 056, September 2015).&amp;nbsp;&amp;nbsp; Section 4.5 is the most relevant to most current processors, but note that it describes two different modes of operation depending on the setting of the CR4.PCIDE configuration bit.&lt;/P&gt;

&lt;P&gt;It is often the case that the most interesting information comes indirectly, and I found this to be the case on Intel's Xeon E5-26xx v3 processors.&amp;nbsp; This family of processors includes a performance monitoring event PAGE_WALKER_LOADS (event code 0xBC) that can be programmed to increment when the page walker finds the TLB entry in various levels of the cache hierarchy.&amp;nbsp;&amp;nbsp; In the tests that I ran (discussed in forum threads at &lt;A href="https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/593830"&gt;https://software.intel.com/en-us/forums/software-tuning-performance-opti...&lt;/A&gt; and &lt;A href="https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/332852"&gt;https://software.intel.com/en-us/forums/software-tuning-performance-opti...&lt;/A&gt;), I found that the page table walker found entries in every level of the memory hierarchy (the L1, L2, L3, and memory).&lt;/P&gt;

&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Fri, 13 Nov 2015 04:38:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/TLB-events-in-Intel-R-Xeon-R-CPU-E5-2695-v2/m-p/1062319#M5179</guid>
      <dc:creator>LY</dc:creator>
      <dc:date>2015-11-13T04:38:53Z</dc:date>
    </item>
  </channel>
</rss>

