<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Retired DTLB misses in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Retired-DTLB-misses/m-p/1129006#M6370</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;

&lt;P&gt;I am profiling DTLB misses on an Intel Skylake CPU.&lt;/P&gt;

&lt;P&gt;However, the DTLB miss performance counters do not appear to be accurate in my benchmarks. I suspect that speculative prefetches are counted by the DTLB miss performance counters, such as the following one:&lt;/P&gt;

&lt;P&gt;DTLB_LOAD_MISSES.STLB_HIT: Loads that miss the DTLB and hit the STLB.&lt;BR /&gt;
	cpu/umask=0x20,event=0x08,name=DTLB_LOAD_MISSES.STLB_HIT/&lt;/P&gt;

&lt;P&gt;The counts appear to be several orders of magnitude higher than the expected number. They are accurate only when I run a pointer-chasing benchmark (where each load depends on the previous one). Can anyone explain the meaning of the performance counter above?&lt;/P&gt;

&lt;P&gt;Also, is there any way to count only retired DTLB misses, so that misses from speculation are excluded?&lt;/P&gt;

&lt;P&gt;I do see a retired STLB miss performance counter:&lt;/P&gt;

&lt;P&gt;cpu/umask=0x11,event=0xD0,name=MEM_INST_RETIRED.STLB_MISS_LOADS/&lt;/P&gt;

&lt;P&gt;Are there similar performance counters for the DTLB?&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Wed, 23 May 2018 20:18:34 GMT</pubDate>
    <dc:creator>Zhu__Weixi</dc:creator>
    <dc:date>2018-05-23T20:18:34Z</dc:date>
    <item>
      <title>Retired DTLB misses</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Retired-DTLB-misses/m-p/1129006#M6370</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;

&lt;P&gt;I am profiling DTLB misses on an Intel Skylake CPU.&lt;/P&gt;

&lt;P&gt;However, the DTLB miss performance counters do not appear to be accurate in my benchmarks. I suspect that speculative prefetches are counted by the DTLB miss performance counters, such as the following one:&lt;/P&gt;

&lt;P&gt;DTLB_LOAD_MISSES.STLB_HIT: Loads that miss the DTLB and hit the STLB.&lt;BR /&gt;
	cpu/umask=0x20,event=0x08,name=DTLB_LOAD_MISSES.STLB_HIT/&lt;/P&gt;
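
&lt;P&gt;For reference, a minimal sketch of how the event and umask values above map onto an event string for Linux perf. The helper name perf_event and the ./benchmark binary are hypothetical, and the sketch assumes the usual core PMU encoding in which the raw code is the umask byte followed by the event byte:&lt;/P&gt;

```python
# Hypothetical helper (not part of any library): format an event/umask pair
# both in the cpu/.../ syntax used in this post and as a raw rUUEE perf code.
def perf_event(event, umask, name):
    # Named form, e.g. cpu/umask=0x20,event=0x08,name=.../
    spec = f"cpu/umask={umask:#04x},event={event:#04x},name={name}/"
    # Raw form: two hex digits of umask followed by two hex digits of event.
    raw = f"r{umask:02x}{event:02x}"
    return spec, raw


spec, raw = perf_event(0x08, 0x20, "DTLB_LOAD_MISSES.STLB_HIT")
print(spec)
print(raw)
# ./benchmark is a placeholder for the program being profiled.
print("perf stat -e " + spec + " ./benchmark")
```

&lt;P&gt;Either form should select the same counter; the named form just makes the perf stat output easier to read.&lt;/P&gt;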

&lt;P&gt;The counts appear to be several orders of magnitude higher than the expected number. They are accurate only when I run a pointer-chasing benchmark (where each load depends on the previous one). Can anyone explain the meaning of the performance counter above?&lt;/P&gt;

&lt;P&gt;Also, is there any way to count only retired DTLB misses, so that misses from speculation are excluded?&lt;/P&gt;

&lt;P&gt;I do see a retired STLB miss performance counter:&lt;/P&gt;

&lt;P&gt;cpu/umask=0x11,event=0xD0,name=MEM_INST_RETIRED.STLB_MISS_LOADS/&lt;/P&gt;

&lt;P&gt;Are there similar performance counters for the DTLB?&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 23 May 2018 20:18:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Retired-DTLB-misses/m-p/1129006#M6370</guid>
      <dc:creator>Zhu__Weixi</dc:creator>
      <dc:date>2018-05-23T20:18:34Z</dc:date>
    </item>
    <item>
      <title>I have not had a chance to</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Retired-DTLB-misses/m-p/1129007#M6371</link>
      <description>&lt;P&gt;I have not had a chance to look at the DTLB miss events on Skylake, but the discussion in the forum thread at &lt;A href="https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/593830" target="_blank"&gt;https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/593830&lt;/A&gt; for Haswell (Xeon E5 v3) systems may be relevant, especially comment #9 (https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/593830#comment-1840629), where I finally realized that the "next page prefetcher" eliminates the overwhelming majority of TLB misses for contiguous access patterns.&lt;/P&gt;

&lt;P&gt;Understanding the Haswell results required comparing the DTLB_LOAD_MISSES (Event 0x08) and PAGE_WALKER_LOADS (Event 0xBC) results. The PAGE_WALKER_LOADS event is not listed among the events for Skylake, but no new event is listed for 0xBC either, so it is possible that the event still exists.&lt;/P&gt;</description>
      <pubDate>Mon, 04 Jun 2018 16:11:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Retired-DTLB-misses/m-p/1129007#M6371</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2018-06-04T16:11:51Z</dc:date>
    </item>
  </channel>
</rss>

