<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic TLB misses in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/TLB-misses/m-p/795358#M598</link>
    <description>Hello,&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I'm trying to measure TLB misses with the following counters:&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;DTLB_LOAD_MISSES.ANY&lt;/DIV&gt;&lt;DIV&gt;MEM_LOAD_RETIRED.DTLB_MISS&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;The second counter reports more misses than the first. The first also reports roughly twice as many misses as I expect. What could be the reasons? Does the first counter count each miss twice, once for the first-level TLB miss and once for the second-level TLB miss? The machine I'm using is a Xeon L5520. Any help is appreciated.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Cheers,&lt;/DIV&gt;</description>
    <pubDate>Wed, 07 Mar 2012 11:30:52 GMT</pubDate>
    <dc:creator>cagribal</dc:creator>
    <dc:date>2012-03-07T11:30:52Z</dc:date>
    <item>
      <title>TLB misses</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/TLB-misses/m-p/795358#M598</link>
      <description>Hello,&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I'm trying to measure TLB misses with the following counters:&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;DTLB_LOAD_MISSES.ANY&lt;/DIV&gt;&lt;DIV&gt;MEM_LOAD_RETIRED.DTLB_MISS&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;The second counter reports more misses than the first. The first also reports roughly twice as many misses as I expect. What could be the reasons? Does the first counter count each miss twice, once for the first-level TLB miss and once for the second-level TLB miss? The machine I'm using is a Xeon L5520. Any help is appreciated.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Cheers,&lt;/DIV&gt;</description>
      <pubDate>Wed, 07 Mar 2012 11:30:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/TLB-misses/m-p/795358#M598</guid>
      <dc:creator>cagribal</dc:creator>
      <dc:date>2012-03-07T11:30:52Z</dc:date>
    </item>
    <item>
      <title>TLB misses</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/TLB-misses/m-p/795359#M599</link>
      <description>Hello cagribal,&lt;BR /&gt;I ran a test to check the counters.&lt;BR /&gt;The test is a 'read memory bandwidth' test.&lt;BR /&gt;I start 1 thread per CPU, and each thread reads a 40 MB array with a 64-byte stride for 10 seconds.&lt;BR /&gt;I would expect 1 DTLB miss per page. Each page is 4096 bytes.&lt;BR /&gt;It takes 64 loads to cover a page (4096-byte page / 64-byte stride = 64 loads).&lt;BR /&gt;&lt;BR /&gt;Here is what I counted in 1 of the 10 seconds.&lt;BR /&gt;&lt;BR /&gt;[cpp]1. DTLB_LOAD_MISSES.ANY                        556,938     556,991     560,425     556,418
2. MEM_LOAD_RETIRED.DTLB_MISS                  532,471     513,618     524,524     526,461
3. MEM_INST_RETIRED.LOADS                   37,694,658  38,354,887  36,165,843  34,506,563
4. UNC_LLC_LINES_IN.ANY                    133,367,850
5. DTLB_MISSES.WALK_COMPLETED                  558,674     558,890     565,344     559,734

a. loads/DTLB_miss, row3/row1                    67.68       68.86       64.53       62.02
b. loads/DTLB_miss, row3/row2                    70.79       74.68       68.95       65.54
c. loads/DTLB_miss, row3/row5                    67.47       68.63       63.97       61.65
d. LLC_misses/DTLB_miss, row4/sum(row2)          63.60
e. loads/LLC_miss, sum(row3)/row4                 1.10
[/cpp]&lt;BR /&gt;The raw data is in rows 1-5.&lt;BR /&gt;Rows a-c compute the loads per DTLB miss from three different miss counters.&lt;BR /&gt;The loads/DTLB_miss values are close to the expected 64. I ran the test on my work laptop, which has tons of stuff running on it, so some noise is expected.&lt;BR /&gt;Row d shows the LLC (last-level cache) misses per DTLB miss. This is very close to 64 and is probably the best measure (since most of the LLC misses are due to my read memory bandwidth test case). Row e shows the loads per LLC miss.&lt;BR /&gt;&lt;BR /&gt;So, in conclusion, I don't see overcounting. Certainly not 2x too many DTLB misses.&lt;BR /&gt;Can you tell us more about your expected count and methodology?&lt;BR /&gt;&lt;BR /&gt;Pat</description>
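The arithmetic behind rows a-e can be checked with a short script. This is only a sketch in Python, not part of the original test (which was not released). The variable and function names are made up for illustration, and the counter values are copied from the table above.

```python
# Counter values copied from the table above (rows 1-5).
# Rows 1, 2, 3 and 5 have one value per column; row 4 has a single value.
dtlb_load_misses_any = [556_938, 556_991, 560_425, 556_418]                 # row 1
mem_load_retired_dtlb_miss = [532_471, 513_618, 524_524, 526_461]           # row 2
mem_inst_retired_loads = [37_694_658, 38_354_887, 36_165_843, 34_506_563]   # row 3
unc_llc_lines_in_any = 133_367_850                                          # row 4
dtlb_misses_walk_completed = [558_674, 558_890, 565_344, 559_734]           # row 5


def per_column_ratio(numerators, denominators):
    """Divide element-wise and round to two decimals, as in the table."""
    return [round(n / d, 2) for n, d in zip(numerators, denominators)]


# Rows a-c: loads per DTLB miss, using three different miss counters.
row_a = per_column_ratio(mem_inst_retired_loads, dtlb_load_misses_any)
row_b = per_column_ratio(mem_inst_retired_loads, mem_load_retired_dtlb_miss)
row_c = per_column_ratio(mem_inst_retired_loads, dtlb_misses_walk_completed)

# Row d: LLC lines brought in per DTLB miss (row 4 divided by the sum of row 2).
row_d = round(unc_llc_lines_in_any / sum(mem_load_retired_dtlb_miss), 2)

# Row e: loads per LLC line brought in (sum of row 3 divided by row 4).
row_e = round(sum(mem_inst_retired_loads) / unc_llc_lines_in_any, 2)

# Expected loads per page for a 64-byte stride over 4096-byte pages.
expected_loads_per_page = 4096 // 64

print("a:", row_a)
print("b:", row_b)
print("c:", row_c)
print("d:", row_d)
print("e:", row_e)
print("expected loads per page:", expected_loads_per_page)
```

The script reproduces the values in rows a-e, all of which sit near the expected 64 loads per page except row e, which is a different ratio.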
      <pubDate>Thu, 08 Mar 2012 15:49:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/TLB-misses/m-p/795359#M599</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2012-03-08T15:49:41Z</dc:date>
    </item>
    <item>
      <title>TLB misses</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/TLB-misses/m-p/795360#M600</link>
      <description>Hello Pat,&lt;BR /&gt;&lt;BR /&gt;The test confused me. There is a lot I don't understand.&lt;BR /&gt;&lt;BR /&gt;[cpp]a. loads/DTLB_miss, row3/row1   67.68   68.86   64.53   62.02  
b. loads/DTLB_miss, row3/row2   70.79   74.68   68.95   65.54  
c. loads/DTLB_miss, row3/row5   67.47   68.63   63.97   61.65  [/cpp] &lt;BR /&gt;The first column (loads/DTLB_miss) is the same in all three rows, but the second column (row3/row1, row3/row2, row3/row5) is different. I don't know why that is.&lt;BR /&gt;&lt;BR /&gt;Also, what is the test program? Could I get the code? I'd like to run it myself.&lt;BR /&gt;&lt;BR /&gt;GHui</description>
      <pubDate>Sat, 10 Mar 2012 00:52:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/TLB-misses/m-p/795360#M600</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2012-03-10T00:52:17Z</dc:date>
    </item>
    <item>
      <title>TLB misses</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/TLB-misses/m-p/795361#M601</link>
      <description>Hello GHui,&lt;BR /&gt;Yeah, the table is not so clear.&lt;BR /&gt;Here is what it should look like:&lt;BR /&gt;[cpp]                                core0   core1   core2   core3
a. loads/DTLB_miss(row3/row1)   67.68   68.86   64.53   62.02  
b. loads/DTLB_miss(row3/row2)   70.79   74.68   68.95   65.54  
c. loads/DTLB_miss(row3/row5)   67.47   68.63   63.97   61.65  [/cpp]&lt;BR /&gt;So all 3 rows are "loads/DTLB_misses" but computed from different quantities.&lt;BR /&gt;&lt;BR /&gt;The test program is my 'id_cpu' utility. I don't have approval to release it.&lt;BR /&gt;But it should be relatively easy to reproduce the results with any memory-read test that uses a 64-byte stride (just touch each cache line) over a 40 MB array.&lt;BR /&gt;Pat</description>
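For anyone reproducing the test, the expected numbers follow from the access pattern alone. The sketch below models a 64-byte-stride read over a 40 MB array in Python and counts the loads and the distinct pages touched in one pass. It is not the id_cpu utility and it does not read hardware counters; a real reproduction would be a compiled loop measured with a counter tool. The function and constant names are hypothetical, "40 MB" is assumed to mean 40 * 1024 * 1024 bytes, and one DTLB miss per newly touched 4096-byte page is the idealized expectation from the thread, not a measurement.

```python
PAGE_BYTES = 4096                  # page size stated in the thread
STRIDE_BYTES = 64                  # stride stated in the thread (one cache line)
ARRAY_BYTES = 40 * 1024 * 1024     # assumption: "40 MB" means 40 MiB


def model_stride_read(array_bytes, stride_bytes, page_bytes):
    """Walk the offsets a stride-read loop would touch in one pass.

    Returns (loads, pages_touched). In the idealized model each newly
    touched page costs one DTLB miss, which assumes the array is far
    larger than what the TLB can map at once.
    """
    loads = 0
    pages_touched = 0
    last_page = None
    for offset in range(0, array_bytes, stride_bytes):
        loads += 1
        page = offset // page_bytes
        if page != last_page:
            # First load that lands on this page during the pass.
            pages_touched += 1
            last_page = page
    return loads, pages_touched


loads, pages = model_stride_read(ARRAY_BYTES, STRIDE_BYTES, PAGE_BYTES)
print("loads per pass:", loads)
print("pages per pass:", pages)
print("loads per page:", loads / pages)
```

Under these assumptions one pass issues 655,360 loads over 10,240 pages, which is 64 loads per page, so measured loads/DTLB_miss ratios near 64 indicate about one miss per page.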
      <pubDate>Sun, 11 Mar 2012 20:46:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/TLB-misses/m-p/795361#M601</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2012-03-11T20:46:18Z</dc:date>
    </item>
  </channel>
</rss>

