<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Perf counters for measuring TLB miss rate in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Perf-counters-for-measuring-TLB-miss-rate/m-p/1256403#M7803</link>
    <description>&lt;P&gt;&lt;SPAN&gt;I want to measure the following for an application:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI aria-level="1"&gt;&lt;SPAN&gt;TLB miss rate&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-level="1"&gt;&lt;SPAN&gt;Number of cycles spent in Page Walks&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-level="1"&gt;&lt;SPAN&gt;Runtime in number of cycles&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;SPAN&gt;I have an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz system.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;To calculate these, I am using the following perf counters:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Total number of memory references&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;( X ) = mem_inst_retired.all_loads:u + mem_inst_retired.all_stores:u&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Total number of memory references that missed in TLB&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;( Y ) = mem_inst_retired.stlb_miss_loads:u + mem_inst_retired.stlb_miss_stores:u&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;TLB miss rate&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;= Y/X&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Number of cycles spent in Page Walks =&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;dtlb_store_misses.walk_pending:u + dtlb_load_misses.walk_pending:u&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Runtime in number of cycles =&amp;nbsp;&lt;/STRONG&gt;cycles&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;SPAN&gt;I am unsure which of these three event sums to use to count the total number of references that missed the TLB:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI aria-level="1"&gt;&lt;SPAN&gt;dtlb_load_misses.miss_causes_a_walk + dtlb_store_misses.miss_causes_a_walk&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-level="1"&gt;&lt;SPAN&gt;dtlb_load_misses.walk_completed + dtlb_store_misses.walk_completed&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-level="1"&gt;&lt;SPAN&gt;mem_inst_retired.stlb_miss_loads + mem_inst_retired.stlb_miss_stores&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;SPAN&gt;When I ran a sequential write over a 64 MB array (arr[i] = i;), I got the following values for the above counters&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;(with THP disabled)&lt;/STRONG&gt;:&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;dtlb_store_misses.miss_causes_a_walk = 154771&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;dtlb_store_misses.walk_completed = 116499&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;mem_inst_retired.stlb_miss_stores = 15566&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;When I double the array size to 128 MB and then to 256 MB, these counters also approximately double. Since a 64 MB array spans 16K pages of 4 KB each,&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;mem_inst_retired.stlb_miss_stores&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;gives the value closest to one miss per page.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Also, I didn’t see any effect of the next-page prefetcher (NPP) described in this post (&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://community.intel.com/t5/Software-Tuning-Performance/Inconsistency-in-TLB-miss-counters/td-p/1032560H" target="_blank" rel="noopener"&gt;&lt;SPAN&gt;https://community.intel.com/t5/Software-Tuning-Performance/Inconsistency-in-TLB-miss-counters/td-p/1...&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;), so I suppose that my machine, which has the Skylake architecture, doesn’t have an NPP.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Could you please let me know if I have chosen the right counters for my measurements?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Thanks in advance!&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Akshay&lt;/P&gt;</description>
    <pubDate>Tue, 16 Feb 2021 13:59:34 GMT</pubDate>
    <dc:creator>AkshayBaviskar</dc:creator>
    <dc:date>2021-02-16T13:59:34Z</dc:date>
    <item>
      <title>Perf counters for measuring TLB miss rate</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Perf-counters-for-measuring-TLB-miss-rate/m-p/1256403#M7803</link>
      <description>&lt;P&gt;&lt;SPAN&gt;I want to measure the following for an application:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI aria-level="1"&gt;&lt;SPAN&gt;TLB miss rate&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-level="1"&gt;&lt;SPAN&gt;Number of cycles spent in Page Walks&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-level="1"&gt;&lt;SPAN&gt;Runtime in number of cycles&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;SPAN&gt;I have an Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz system.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;To calculate these, I am using the following perf counters:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Total number of memory references&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;( X ) = mem_inst_retired.all_loads:u + mem_inst_retired.all_stores:u&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Total number of memory references that missed in TLB&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;( Y ) = mem_inst_retired.stlb_miss_loads:u + mem_inst_retired.stlb_miss_stores:u&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;TLB miss rate&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;= Y/X&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Number of cycles spent in Page Walks =&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;dtlb_store_misses.walk_pending:u + dtlb_load_misses.walk_pending:u&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Runtime in number of cycles =&amp;nbsp;&lt;/STRONG&gt;cycles&lt;/LI&gt;
&lt;/OL&gt;
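&lt;P&gt;As a rough illustration of how the definitions in the list above combine, here is a minimal Python sketch. The function name tlb_metrics and the sample counts are hypothetical, made up only to show the arithmetic, and the sketch takes the list's use of the walk_pending events as a page-walk cycle count at face value.&lt;/P&gt;

```python
def tlb_metrics(all_loads, all_stores, stlb_miss_loads, stlb_miss_stores,
                load_walk_pending, store_walk_pending, cycles):
    """Combine raw perf counts into the ratios defined in the list above."""
    references = all_loads + all_stores            # X
    misses = stlb_miss_loads + stlb_miss_stores    # Y
    walk_cycles = load_walk_pending + store_walk_pending
    return {
        "tlb_miss_rate": misses / references if references else 0.0,
        "walk_cycles": walk_cycles,
        "walk_cycle_fraction": walk_cycles / cycles if cycles else 0.0,
    }


# Placeholder counts, invented for illustration (not measured data).
example = tlb_metrics(
    all_loads=600, all_stores=400, stlb_miss_loads=3, stlb_miss_stores=7,
    load_walk_pending=120, store_walk_pending=80, cycles=10000)
print(example)
```

&lt;P&gt;The walk_cycle_fraction entry is an extra convenience that relates the page-walk count to the runtime in cycles; it is not one of the quantities listed above.&lt;/P&gt;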
&lt;P&gt;&lt;SPAN&gt;I am unsure which of these three event sums to use to count the total number of references that missed the TLB:&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI aria-level="1"&gt;&lt;SPAN&gt;dtlb_load_misses.miss_causes_a_walk + dtlb_store_misses.miss_causes_a_walk&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-level="1"&gt;&lt;SPAN&gt;dtlb_load_misses.walk_completed + dtlb_store_misses.walk_completed&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI aria-level="1"&gt;&lt;SPAN&gt;mem_inst_retired.stlb_miss_loads + mem_inst_retired.stlb_miss_stores&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
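&lt;P&gt;One way to compare the three candidates on the same run is to request all of them in a single perf stat invocation. The sketch below only builds the command line. The helper name perf_stat_argv and the binary ./app are placeholders, and it assumes the installed perf recognizes these symbolic event names on this CPU. With this many events, perf may time-multiplex the counters and report scaled counts.&lt;/P&gt;

```python
import shlex

# The six events that make up the three candidate sums listed above.
CANDIDATE_EVENTS = [
    "dtlb_load_misses.miss_causes_a_walk",
    "dtlb_store_misses.miss_causes_a_walk",
    "dtlb_load_misses.walk_completed",
    "dtlb_store_misses.walk_completed",
    "mem_inst_retired.stlb_miss_loads",
    "mem_inst_retired.stlb_miss_stores",
]


def perf_stat_argv(command, events=CANDIDATE_EVENTS, user_only=True):
    """Build a perf stat argument list that counts all events in one run."""
    suffix = ":u" if user_only else ""   # :u restricts counting to user mode
    event_list = ",".join(event + suffix for event in events)
    return ["perf", "stat", "-e", event_list, "--"] + list(command)


argv = perf_stat_argv(["./app"])   # "./app" is a placeholder binary name
print(shlex.join(argv))
```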
&lt;P&gt;&lt;SPAN&gt;When I ran a sequential write over a 64 MB array (arr[i] = i;), I got the following values for the above counters&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;(with THP disabled)&lt;/STRONG&gt;:&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;dtlb_store_misses.miss_causes_a_walk = 154771&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;dtlb_store_misses.walk_completed = 116499&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;mem_inst_retired.stlb_miss_stores = 15566&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;When I double the array size to 128 MB and then to 256 MB, these counters also approximately double. Since a 64 MB array spans 16K pages of 4 KB each,&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;mem_inst_retired.stlb_miss_stores&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN&gt;gives the value closest to one miss per page.&lt;/SPAN&gt;&lt;/P&gt;
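&lt;P&gt;The page-count comparison above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes 4 KB base pages (consistent with THP being disabled and with the 16K figure) and uses the three counter values reported above for the 64 MB run.&lt;/P&gt;

```python
PAGE_SIZE = 4 * 1024            # bytes; assumed 4 KB base pages (THP disabled)
ARRAY_SIZE = 64 * 1024 * 1024   # bytes; the 64 MB array from the post

expected_pages = ARRAY_SIZE // PAGE_SIZE

# Counter values reported in the post for the 64 MB run.
observed = {
    "dtlb_store_misses.miss_causes_a_walk": 154771,
    "dtlb_store_misses.walk_completed": 116499,
    "mem_inst_retired.stlb_miss_stores": 15566,
}

# Events per touched page; a value near 1.0 means about one event per page.
per_page = {name: count / expected_pages for name, count in observed.items()}
closest = min(per_page, key=lambda name: abs(per_page[name] - 1.0))

print("expected pages:", expected_pages)
for name, ratio in per_page.items():
    print(f"{name}: {ratio:.2f} events per page")
print("closest to one per page:", closest)
```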
&lt;P&gt;&lt;SPAN&gt;Also, I didn’t see any effect of the next-page prefetcher (NPP) described in this post (&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://community.intel.com/t5/Software-Tuning-Performance/Inconsistency-in-TLB-miss-counters/td-p/1032560H" target="_blank" rel="noopener"&gt;&lt;SPAN&gt;https://community.intel.com/t5/Software-Tuning-Performance/Inconsistency-in-TLB-miss-counters/td-p/1...&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;), so I suppose that my machine, which has the Skylake architecture, doesn’t have an NPP.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Could you please let me know if I have chosen the right counters for my measurements?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Thanks in advance!&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Akshay&lt;/P&gt;</description>
      <pubDate>Tue, 16 Feb 2021 13:59:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Perf-counters-for-measuring-TLB-miss-rate/m-p/1256403#M7803</guid>
      <dc:creator>AkshayBaviskar</dc:creator>
      <dc:date>2021-02-16T13:59:34Z</dc:date>
    </item>
    <item>
      <title>Re: Perf counters for measuring TLB miss rate</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Perf-counters-for-measuring-TLB-miss-rate/m-p/1260093#M7825</link>
      <description>&lt;P&gt;In your Y/X ratio, the denominator counts only load and store requests from retired instructions (these events are documented as being counted at retirement). So it makes more sense to me to use the sum &lt;STRONG&gt;mem_inst_retired.stlb_miss_loads&lt;/STRONG&gt; + &lt;STRONG&gt;mem_inst_retired.stlb_miss_stores&lt;/STRONG&gt;, which is also counted at retirement, for what you've described as "Total number of memory references that missed in TLB."&lt;/P&gt;
&lt;P&gt;These events are counted together. For example, if a load retires and it missed in the STLB, the event counts of &lt;STRONG&gt;mem_inst_retired.all_loads&lt;/STRONG&gt; and &lt;STRONG&gt;mem_inst_retired.stlb_miss_loads&lt;/STRONG&gt; are incremented and, on SKL/SKX in particular, they are incremented by the same amount, which is 1.&lt;/P&gt;
&lt;P&gt;The STLB is the last-level TLB on SKL/SKX. A miss in the STLB doesn't trigger a page walk if there is already an outstanding speculative walk initiated by the NPP. A miss in the STLB may also fail to trigger a walk if the walk that is about to start is cancelled by the time the miss determination completes. Otherwise, a miss in the STLB triggers a walk.&lt;/P&gt;</description>
      <pubDate>Sun, 28 Feb 2021 20:07:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Perf-counters-for-measuring-TLB-miss-rate/m-p/1260093#M7825</guid>
      <dc:creator>HadiBrais</dc:creator>
      <dc:date>2021-02-28T20:07:34Z</dc:date>
    </item>
  </channel>
</rss>

