<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi John, in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/interpretation-of-dtlb-load-misses-demand-ld-walk-duration-and/m-p/1127171#M6331</link>
    <description>&lt;P&gt;Hi John,&lt;/P&gt;&lt;P&gt;The "dtlb_load_misses.demand_ld_walk_duration" event is one of the Ivy Bridge TLB events listed when you run&lt;/P&gt;
&lt;PRE class="brush:; class-name:dark;"&gt;4x10x2 &amp;gt; perf list |grep tlb
  mem_uops_retired.stlb_miss_loads                  
  mem_uops_retired.stlb_miss_stores                 
  dtlb_load_misses.demand_ld_walk_completed         
  dtlb_load_misses.demand_ld_walk_duration      &amp;lt;&amp;lt; ====================      
  dtlb_load_misses.large_page_walk_completed        
  dtlb_load_misses.miss_causes_a_walk               
  dtlb_load_misses.stlb_hit                         
  dtlb_load_misses.walk_completed                   
  dtlb_load_misses.walk_duration                    
  dtlb_store_misses.miss_causes_a_walk              
  dtlb_store_misses.stlb_hit                        
  dtlb_store_misses.walk_completed                  
  dtlb_store_misses.walk_duration                   
  itlb.itlb_flush                                   
  itlb_misses.large_page_walk_completed             
  itlb_misses.miss_causes_a_walk                    
  itlb_misses.stlb_hit                              
  itlb_misses.walk_completed                        
  itlb_misses.walk_duration                         
  tlb_flush.dtlb_thread                             
  tlb_flush.stlb_any                                
&lt;/PRE&gt;

&lt;P&gt;Finding the corresponding Intel event probably requires looking at the perf source code.&lt;/P&gt;</description>
    <pubDate>Thu, 20 Feb 2020 18:12:04 GMT</pubDate>
    <dc:creator>gostanian__richard</dc:creator>
    <dc:date>2020-02-20T18:12:04Z</dc:date>
    <item>
      <title>interpretation of dtlb_load_misses.demand_ld_walk_duration and dtlb_store_misses.walk_duration</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/interpretation-of-dtlb-load-misses-demand-ld-walk-duration-and/m-p/1127169#M6329</link>
      <description>&lt;P&gt;Hi Everyone,&lt;/P&gt;&lt;P&gt;The documentation for dtlb_load_misses.demand_ld_walk_duration on Haswell says&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; [Demand load cycles page miss handler (PMH) is busy with this walk]&lt;/P&gt;&lt;P&gt;whereas the documentation for dtlb_store_misses.walk_duration says&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; [Cycles when PMH is busy with page walks]&lt;/P&gt;&lt;P&gt;I am puzzled by the terminology "busy with this walk" vs. "busy with page walks".&lt;/P&gt;&lt;P&gt;Should they both say "busy with page walks"?&lt;/P&gt;&lt;P&gt;So if I run&lt;/P&gt;
&lt;PRE class="brush:; class-name:dark; wrap-lines:false;"&gt;perf stat -e cycles,instructions,dtlb_load_misses.walk_duration,dtlb_store_misses.walk_duration&lt;/PRE&gt;

&lt;P&gt;on a given command and get&lt;/P&gt;

&lt;PRE class="brush:; class-name:dark;"&gt; Performance counter stats for 'system wide':

   291,350,355,880      cycles                                                      
    36,361,479,212      instructions              #    0.12  insn per cycle                                            
    30,179,920,415      dtlb_load_misses.walk_duration                                   
        43,668,809      dtlb_store_misses.walk_duration                                   

      96.873898071 seconds time elapsed
&lt;/PRE&gt;

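&lt;P&gt;To make the arithmetic explicit, here is a minimal sketch that turns the counts above into a fraction of total cycles. The variable names are only illustrative, and it assumes that the load and store walk-duration counts do not overlap and can simply be added, which is exactly the interpretation in question:&lt;/P&gt;

```python
# Counts copied from the perf stat output above.
cycles = 291_350_355_880
load_walk_cycles = 30_179_920_415   # dtlb_load_misses.walk_duration
store_walk_cycles = 43_668_809      # dtlb_store_misses.walk_duration

# Assumption: load and store walk cycles are disjoint, so they can be summed.
walk_cycles = load_walk_cycles + store_walk_cycles
walk_fraction = walk_cycles / cycles

print(f"page-walk cycles: {walk_cycles:,}")
print(f"fraction of all cycles: {walk_fraction:.1%}")
```
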
&lt;P&gt;does this mean that I'm spending 30,179,920,415 + 43,668,809 cycles out of 291,350,355,880 cycles on page table walks for DTLB misses? If so, am I spending about 10% of the total time walking page tables? Is this correct?&lt;/P&gt;</description>
      <pubDate>Wed, 19 Feb 2020 01:16:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/interpretation-of-dtlb-load-misses-demand-ld-walk-duration-and/m-p/1127169#M6329</guid>
      <dc:creator>gostanian__richard</dc:creator>
      <dc:date>2020-02-19T01:16:26Z</dc:date>
    </item>
    <item>
      <title>The terminology of these</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/interpretation-of-dtlb-load-misses-demand-ld-walk-duration-and/m-p/1127170#M6330</link>
      <description>&lt;P&gt;The terminology of these events can be frustrating -- it is always hard to tell whether different words mean something different, or whether they were just changed to add variety to the documentation....&lt;/P&gt;&lt;P&gt;I don't see an event named "dtlb_load_misses.demand_ld_walk_duration" in any Intel documentation -- where did you find that name?&lt;/P&gt;&lt;P&gt;Section 19.7 of Volume 3 of the Intel SW Developer's Manual says that on Haswell, the event DTLB_LOAD_MISSES.WALK_DURATION (Event 0x08, Umask 0x10) measures "Cycle PMH is busy with a walk", while the event DTLB_STORE_MISSES.WALK_DURATION (Event 0x49, Umask 0x10) measures "Cycles PMH is busy with this walk". These may mean exactly the same thing, or the different wording may be a way to avoid saying that DTLB_LOAD_MISSES.WALK_DURATION might be contaminated by cycles in which the PMH is executing walks on behalf of the Next-Page-Prefetcher (which was introduced in Ivy Bridge, and is the subject of almost no official documentation). On Haswell, my testing indicates that the event PAGE_WALKER_LOADS increments both for walks due to demand loads/stores and for walks due to the next-page-prefetcher. Differences between the sum of the ITLB_MISSES, DTLB_LOAD_MISSES, and DTLB_STORE_MISSES events and the counts from PAGE_WALKER_LOADS can be used to infer the presence of next-page-prefetcher activity. I don't know if anyone has done systematic testing, but I found that if I load data from every other 4KiB page, the number of DTLB_LOAD_MISSES is cut in half, but the total number of PAGE_WALKER_LOADS is the same (since the next-page-prefetcher loads the page translations that I skip over).&lt;/P&gt;</description>
      <pubDate>Wed, 19 Feb 2020 22:30:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/interpretation-of-dtlb-load-misses-demand-ld-walk-duration-and/m-p/1127170#M6330</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2020-02-19T22:30:44Z</dc:date>
    </item>
    <item>
      <title>Hi John,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/interpretation-of-dtlb-load-misses-demand-ld-walk-duration-and/m-p/1127171#M6331</link>
      <description>&lt;P&gt;Hi John,&lt;/P&gt;&lt;P&gt;The "dtlb_load_misses.demand_ld_walk_duration" event is one of the Ivy Bridge TLB events listed when you run&lt;/P&gt;
&lt;PRE class="brush:; class-name:dark;"&gt;4x10x2 &amp;gt; perf list |grep tlb
  mem_uops_retired.stlb_miss_loads                  
  mem_uops_retired.stlb_miss_stores                 
  dtlb_load_misses.demand_ld_walk_completed         
  dtlb_load_misses.demand_ld_walk_duration      &amp;lt;&amp;lt; ====================      
  dtlb_load_misses.large_page_walk_completed        
  dtlb_load_misses.miss_causes_a_walk               
  dtlb_load_misses.stlb_hit                         
  dtlb_load_misses.walk_completed                   
  dtlb_load_misses.walk_duration                    
  dtlb_store_misses.miss_causes_a_walk              
  dtlb_store_misses.stlb_hit                        
  dtlb_store_misses.walk_completed                  
  dtlb_store_misses.walk_duration                   
  itlb.itlb_flush                                   
  itlb_misses.large_page_walk_completed             
  itlb_misses.miss_causes_a_walk                    
  itlb_misses.stlb_hit                              
  itlb_misses.walk_completed                        
  itlb_misses.walk_duration                         
  tlb_flush.dtlb_thread                             
  tlb_flush.stlb_any                                
&lt;/PRE&gt;

&lt;P&gt;Finding the corresponding Intel event probably requires looking at the perf source code.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Feb 2020 18:12:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/interpretation-of-dtlb-load-misses-demand-ld-walk-duration-and/m-p/1127171#M6331</guid>
      <dc:creator>gostanian__richard</dc:creator>
      <dc:date>2020-02-20T18:12:04Z</dc:date>
    </item>
    <item>
      <title>"DTLB_LOAD_MISSES.DEMAND_LD</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/interpretation-of-dtlb-load-misses-demand-ld-walk-duration-and/m-p/1127172#M6332</link>
      <description>&lt;P&gt;"DTLB_LOAD_MISSES.DEMAND_LD_WALK_DURATION" is a name used by OProfile for Ivy Bridge, where it is listed as using Umask=0x84 (https://oprofile.sourceforge.io/docs/intel-ivybridge-events.php). This name and Umask are also used by the Intel documentation at https://download.01.org/perfmon/IVT/ivytown_core_v20.json, but only for IvyTown -- not for any other processor model.&lt;/P&gt;&lt;P&gt;Table 19-15 of Volume 3 of the SWDM says that Event 0x08, Umask 0x84 counts "cycles PMH is busy with a walk due to demand loads". BUT, comparing the DTLB_LOAD_MISSES (Event 0x08) encodings from Ivy Bridge (Table 19-15) and Haswell (Table 19-11) strongly suggests that the encodings for these masks have changed. Curiously, there are no sub-events that use exactly the same Umask across these two tables, but sub-events that use very similar words have very different Umask encodings. A change in encoding is often an indication that something important has changed in the definitions of the events -- so every variation of the event has to be re-tested against a carefully constructed set of microbenchmarks....&lt;/P&gt;&lt;P&gt;The answer to the original query ("am I spending 10% of my time in table walking?") is probably, but not definitely, "yes".&lt;/P&gt;&lt;P&gt;The change in wording (dropping the term "demand loads" in the "duration" sub-event) remains a concern. It should be possible to create a fairly simple set of tests that will disambiguate these issues. 
I would recommend measuring all documented sub-events of DTLB_LOAD_MISSES, DTLB_STORE_MISSES, and PAGE_WALKER_LOADS against a few test patterns:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Contiguous loads of an array mapped to 4KiB pages&lt;UL&gt;&lt;LI&gt;Small: fits in the 64 entries of the DTLB for 4KiB pages -- e.g., 200-240KiB&lt;/LI&gt;&lt;LI&gt;Medium: fits in the 1024 entries of the STLB for 4KiB pages -- e.g., 500-600KiB&lt;/LI&gt;&lt;LI&gt;Large: much larger than the 1024 entries of the STLB for 4KiB pages -- e.g., 40MiB (10x)&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Contiguous loads of an array mapped to 2MiB pages&lt;UL&gt;&lt;LI&gt;Small: fits in the 32 entries of the DTLB for 2MiB pages -- e.g., 32 MiB&lt;/LI&gt;&lt;LI&gt;Medium: fits in the 1024 entries of the STLB for 2MiB pages -- e.g., 512 MiB (8x larger than the DTLB range)&lt;/LI&gt;&lt;LI&gt;Large: much larger than the 1024 entries of the STLB for 2MiB pages -- e.g., 20GiB (10x)&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Repeat the above tests, but read only every other 4KiB (aligned) region.&lt;UL&gt;&lt;LI&gt;Use both the original size (same number of pages in the array) and twice the original size (same number of pages actually accessed)&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Repeat the above tests, but read only the first cache line from each 4KiB (aligned) region.&lt;/LI&gt;&lt;LI&gt;Repeat the above tests, but read only the first cache line from every other 4KiB (aligned) region.&lt;UL&gt;&lt;LI&gt;Use both the original size (same number of pages in the array) and twice the original size (same number of pages actually accessed)&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Although nothing ever works out quite as expected, one would hope that, relative to the number of pages accessed, the "small" cases would show almost all accesses hitting in the DTLB, the "medium" cases would show most accesses missing in the DTLB and hitting in the STLB, and the "large" cases would show most accesses missing in both the DTLB and the STLB and causing walks. The tests using every other 4KiB page should show whether the TLB lookups created by the Next-Page-Prefetcher are included in the counts. (I expect them in PAGE_WALKER_LOADS and not in the DTLB_LOAD_MISSES event.) Reading only one cache line from each 4KiB page should minimize the probability that the next-page-prefetcher is activated, and reading only one cache line from every other 4KiB page should (fingers crossed) never cause the next-page-prefetcher to activate.&lt;/P&gt;&lt;P&gt;There is not much point in relying on the performance counter event names provided by perf -- the translations may change between kernel revisions, and may mean different things on different processors. It only takes looking up a few of these events to find errors in the events used. The location of these events in the kernel source tree also seems to move about randomly from one kernel version to the next.&lt;/P&gt;</description>
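&lt;P&gt;As a rough aid for sizing the test arrays described above, the following sketch computes the address range covered by each TLB level and the page offsets touched by the "every other 4KiB region" variants. The helper names tlb_reach and touched_pages are hypothetical, and the entry counts are simply the ones quoted in the post; they would need to be checked for any other processor.&lt;/P&gt;

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

def tlb_reach(entries, page_size):
    """Bytes of address space covered by `entries` TLB entries."""
    return entries * page_size

def touched_pages(array_bytes, page_step=1, region=4 * KIB):
    """Byte offsets of the 4KiB regions a test reads.

    page_step=1 reads every region; page_step=2 reads every other one.
    """
    return list(range(0, array_bytes, region * page_step))

# Entry counts as quoted in the post (assumptions for any other CPU).
dtlb_4k = tlb_reach(64, 4 * KIB)
stlb_4k = tlb_reach(1024, 4 * KIB)
dtlb_2m = tlb_reach(32, 2 * MIB)
stlb_2m = tlb_reach(1024, 2 * MIB)

print("DTLB reach, 4KiB pages:", dtlb_4k // KIB, "KiB")
print("STLB reach, 4KiB pages:", stlb_4k // MIB, "MiB")
print("DTLB reach, 2MiB pages:", dtlb_2m // MIB, "MiB")
print("STLB reach, 2MiB pages:", stlb_2m // GIB, "GiB")

# "Small" 4KiB case, then the every-other-region variants at the original
# size (same array, half the pages read) and at twice the size (same
# number of pages read as the contiguous case).
small = 240 * KIB
print("contiguous:", len(touched_pages(small)), "pages")
print("every other, same size:", len(touched_pages(small, page_step=2)), "pages")
print("every other, double size:", len(touched_pages(2 * small, page_step=2)), "pages")
```

&lt;P&gt;The offsets only describe which regions a test would read; the actual loads and the counter measurements still have to be done in a compiled microbenchmark.&lt;/P&gt;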
      <pubDate>Thu, 20 Feb 2020 22:41:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/interpretation-of-dtlb-load-misses-demand-ld-walk-duration-and/m-p/1127172#M6332</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2020-02-20T22:41:04Z</dc:date>
    </item>
  </channel>
</rss>

