<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic MEM_UOP_RETIRED.LOADS &amp; MEM_UOP_RETIRED.STORES in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/MEM-UOP-RETIRED-LOADS-MEM-UOP-RETIRED-STORES/m-p/783061#M406</link>
    <description>Hello GHui,&lt;BR /&gt;You need to use umask 0x81 for loads and umask 0x82 for stores.&lt;BR /&gt;From the SDM vol 3:&lt;BR /&gt;&lt;P&gt;evt=D0H, umask=80H, MEM_UOP_RETIRED.ALL: Qualify any retired memory uops. Must combine with umask 01H, 02H, to produce counts.&lt;BR /&gt;&lt;BR /&gt;So for loads use '-e r81d0' and for stores use '-e r82d0'.&lt;BR /&gt;&lt;BR /&gt;These events just count load and store uops, not necessarily loads and stores that go all the way to DRAM.&lt;BR /&gt;Unless your code is hand-coded assembly (where you KNOW that all the loads go to DRAM: no register spills, no reloading of registers, etc.), most loads and stores probably don't go to memory (DRAM).&lt;BR /&gt;Pat&lt;/P&gt;</description>
    <pubDate>Thu, 29 Mar 2012 16:58:04 GMT</pubDate>
    <dc:creator>Patrick_F_Intel1</dc:creator>
    <dc:date>2012-03-29T16:58:04Z</dc:date>
    <item>
      <title>MEM_UOP_RETIRED.LOADS &amp; MEM_UOP_RETIRED.STORES</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MEM-UOP-RETIRED-LOADS-MEM-UOP-RETIRED-STORES/m-p/783059#M404</link>
      <description>I want to collect the MEM_UOP_RETIRED.LOADS and MEM_UOP_RETIRED.STORES events. I tested with "perf stat -e r01D0 -e r02D0 ./stream", but both events report zero. I don't know why this happens.&lt;BR /&gt;&lt;BR /&gt;OS: RHEL 6.1&lt;BR /&gt;Hardware platform:&lt;BR /&gt;vendor_id: GenuineIntel&lt;BR /&gt;cpu family: 6&lt;BR /&gt;model: 45&lt;BR /&gt;model name: Intel Xeon CPU E5-2680 0 @ 2.70GHz&lt;BR /&gt;stepping: 6&lt;BR /&gt;cpuid level: 13&lt;BR /&gt;&lt;BR /&gt;&lt;TABLE&gt;&lt;TR&gt;&lt;TH&gt;Event Num.&lt;/TH&gt;&lt;TH&gt;Umask Value&lt;/TH&gt;&lt;TH&gt;Event Mask Mnemonic&lt;/TH&gt;&lt;TH&gt;Description&lt;/TH&gt;&lt;TH&gt;Comment&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;D0H&lt;/TD&gt;&lt;TD&gt;01H&lt;/TD&gt;&lt;TD&gt;MEM_UOP_RETIRED.LOADS&lt;/TD&gt;&lt;TD&gt;Qualify retired memory uops that are loads. Combine with umask 10H, 20H, 40H, 80H.&lt;/TD&gt;&lt;TD&gt;Supports PEBS&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;D0H&lt;/TD&gt;&lt;TD&gt;02H&lt;/TD&gt;&lt;TD&gt;MEM_UOP_RETIRED.STORES&lt;/TD&gt;&lt;TD&gt;Qualify retired memory uops that are stores. Combine with umask 10H, 20H, 40H, 80H.&lt;/TD&gt;&lt;TD&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;&lt;BR /&gt;I took this from the document at &lt;A href="http://download.intel.com/products/processor/manual/325462.pdf" target="_blank"&gt;http://download.intel.com/products/processor/manual/325462.pdf&lt;/A&gt;, page 3121 of 4128.&lt;BR /&gt;</description>
      <pubDate>Wed, 28 Mar 2012 09:12:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MEM-UOP-RETIRED-LOADS-MEM-UOP-RETIRED-STORES/m-p/783059#M404</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2012-03-28T09:12:42Z</dc:date>
    </item>
    <item>
      <title>MEM_UOP_RETIRED.LOADS &amp; MEM_UOP_RETIRED.STORES</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MEM-UOP-RETIRED-LOADS-MEM-UOP-RETIRED-STORES/m-p/783060#M405</link>
      <description>Also, can I use these two events (MEM_UOP_RETIRED.LOADS &amp;amp; MEM_UOP_RETIRED.STORES) to measure memory bandwidth?&lt;BR /&gt;</description>
      <pubDate>Thu, 29 Mar 2012 16:05:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MEM-UOP-RETIRED-LOADS-MEM-UOP-RETIRED-STORES/m-p/783060#M405</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2012-03-29T16:05:03Z</dc:date>
    </item>
    <item>
      <title>MEM_UOP_RETIRED.LOADS &amp; MEM_UOP_RETIRED.STORES</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MEM-UOP-RETIRED-LOADS-MEM-UOP-RETIRED-STORES/m-p/783061#M406</link>
      <description>Hello GHui,&lt;BR /&gt;You need to use umask 0x81 for loads and umask 0x82 for stores.&lt;BR /&gt;From the SDM vol 3:&lt;BR /&gt;&lt;P&gt;evt=D0H, umask=80H, MEM_UOP_RETIRED.ALL: Qualify any retired memory uops. Must combine with umask 01H, 02H, to produce counts.&lt;BR /&gt;&lt;BR /&gt;So for loads use '-e r81d0' and for stores use '-e r82d0'.&lt;BR /&gt;&lt;BR /&gt;These events just count load and store uops, not necessarily loads and stores that go all the way to DRAM.&lt;BR /&gt;Unless your code is hand-coded assembly (where you KNOW that all the loads go to DRAM: no register spills, no reloading of registers, etc.), most loads and stores probably don't go to memory (DRAM).&lt;BR /&gt;Pat&lt;/P&gt;</description>
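The umask rule above can be captured in a few lines of Python. This is a minimal sketch, assuming perf's x86 raw-event syntax in which the hex digits after `r` are the umask byte followed by the event-select byte (so `r81d0` is umask 0x81, event 0xD0). The helper `perf_raw_event` is hypothetical; the event and umask constants come from the SDM rows quoted in this thread.

```python
# Event select and umask values quoted from the SDM rows in this thread.
MEM_UOP_RETIRED = 0xD0
LOADS = 0x01      # MEM_UOP_RETIRED.LOADS
STORES = 0x02     # MEM_UOP_RETIRED.STORES
ALL = 0x80        # MEM_UOP_RETIRED.ALL: any retired memory uop
QUALIFIERS = (0x10, 0x20, 0x40, 0x80)  # "Combine with umask 10H, 20H, 40H, 80H"


def perf_raw_event(event, umask):
    """Build a perf raw event string such as "r81d0" (hypothetical helper).

    Assumes the x86 raw encoding used by perf: event select in bits 0-7,
    umask in bits 8-15, written as one hex number after "r".
    """
    config = umask * 256 + event  # umask shifted left by 8 bits, plus the event select
    return "r{:x}".format(config)


# LOADS or STORES alone sets no qualifier bit, which matches the zero
# counts seen with r01D0 and r02D0; OR-ing in ALL (0x80) supplies one.
print("perf stat -e {} -e {} ./stream".format(
    perf_raw_event(MEM_UOP_RETIRED, LOADS | ALL),
    perf_raw_event(MEM_UOP_RETIRED, STORES | ALL),
))

# Every load/store combination allowed by the table in the question.
for qualifier in QUALIFIERS:
    print(perf_raw_event(MEM_UOP_RETIRED, LOADS | qualifier),
          perf_raw_event(MEM_UOP_RETIRED, STORES | qualifier))
```

The first line printed is the same command GHui runs successfully in the next post.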
      <pubDate>Thu, 29 Mar 2012 16:58:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MEM-UOP-RETIRED-LOADS-MEM-UOP-RETIRED-STORES/m-p/783061#M406</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2012-03-29T16:58:04Z</dc:date>
    </item>
    <item>
      <title>MEM_UOP_RETIRED.LOADS &amp; MEM_UOP_RETIRED.STORES</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MEM-UOP-RETIRED-LOADS-MEM-UOP-RETIRED-STORES/m-p/783062#M407</link>
      <description>Yes, it worked. Thank you.&lt;BR /&gt;&lt;BR /&gt;(240257946213*64*1e-9 + 120170251833*64*1e-9)/165.653999299 = 139.250514763052&lt;BR /&gt;&lt;BR /&gt;By this calculation the memory bandwidth is 139 GB/s. Is that correct, given the STREAM log?&lt;BR /&gt;&lt;BR /&gt;The output log follows:&lt;BR /&gt;----------------------------&lt;BR /&gt;Function Rate (MB/s) Avg time Min time Max time&lt;BR /&gt;Copy: 14224.0068 0.0068 0.0067 0.0071&lt;BR /&gt;Scale: 13815.0410 0.0070 0.0069 0.0073&lt;BR /&gt;Add: 15270.5243 0.0095 0.0094 0.0099&lt;BR /&gt;Triad: 14748.1204 0.0098 0.0098 0.0101&lt;BR /&gt;-------------------------------------------------------------&lt;BR /&gt;Solution Validates&lt;BR /&gt;-------------------------------------------------------------&lt;BR /&gt;&lt;BR /&gt;Performance counter stats for './stream':&lt;BR /&gt;&lt;BR /&gt; 240257946213 raw 0x81d0&lt;BR /&gt; 120170251833 raw 0x82d0&lt;BR /&gt;&lt;BR /&gt; 165.653999299 seconds time elapsed&lt;BR /&gt;</description>
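The arithmetic in this post, written as a small Python sketch that puts the uop-based figure next to the rate STREAM itself reports in the same log. The 64 bytes per uop is the assumption used in the post (one full cache line per load or store uop); the function name `naive_uop_bandwidth_gbs` is hypothetical.

```python
def naive_uop_bandwidth_gbs(load_uops, store_uops, seconds, bytes_per_uop=64):
    """GB/s if every memory uop moved bytes_per_uop bytes to or from DRAM.

    1e-9 converts bytes to GB; note that the literal 10e-9 equals 1e-8, not 1e-9.
    """
    return (load_uops + store_uops) * bytes_per_uop * 1e-9 / seconds


# Counts and elapsed time from the perf stat output of ./stream in this post.
estimate = naive_uop_bandwidth_gbs(240257946213, 120170251833, 165.653999299)

# Best STREAM rate in the same log (Add), converted from MB/s to GB/s.
stream_best_gbs = 15270.5243 / 1000

print("uop-based estimate: {:.2f} GB/s".format(estimate))
print("STREAM Add rate:    {:.2f} GB/s".format(stream_best_gbs))
print("ratio: {:.1f}x".format(estimate / stream_best_gbs))
```

With these counts the estimate is about 139 GB/s, roughly nine times the 15.3 GB/s that STREAM reports for Add. The gap is consistent with Pat's point that the uop counts include loads and stores that never reach DRAM, so this number is not a memory bandwidth.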
      <pubDate>Fri, 30 Mar 2012 07:07:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MEM-UOP-RETIRED-LOADS-MEM-UOP-RETIRED-STORES/m-p/783062#M407</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2012-03-30T07:07:04Z</dc:date>
    </item>
    <item>
      <title>MEM_UOP_RETIRED.LOADS &amp; MEM_UOP_RETIRED.STORES</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MEM-UOP-RETIRED-LOADS-MEM-UOP-RETIRED-STORES/m-p/783063#M408</link>
      <description>I also tested Linpack.&lt;BR /&gt;&lt;BR /&gt;(3007370989069*64*1e-9 + 96836841041*64*1e-9)/143.245665508 = 1386.913177599393&lt;BR /&gt;&lt;BR /&gt;The result is so large that I think something is wrong with my formula.&lt;BR /&gt;&lt;BR /&gt;[cpp]The Linpack log:
----------
[ user ]$ perf stat -e r81d0 -e r82d0 ./linpack

   ... ...
   ... ...

    3007370989069  raw 0x81d0
    96836841041  raw 0x82d0

  143.245665508  seconds time elapsed[/cpp]</description>
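To repeat the calculation for other runs without retyping the counts, here is a sketch that pulls the raw counter values and the elapsed time out of `perf stat` text like the Linpack log above. The line format it matches is taken from the logs in this thread; other perf versions may print different columns, and the function name `parse_perf_stat` is hypothetical.

```python
import re

# Matches lines such as "    3007370989069  raw 0x81d0" from the logs in this thread.
COUNT_LINE = re.compile(r"^\s*(\d+)\s+raw\s+0x([0-9a-fA-F]+)\s*$")
# Matches lines such as "  143.245665508  seconds time elapsed".
TIME_LINE = re.compile(r"^\s*([\d.]+)\s+seconds time elapsed")


def parse_perf_stat(text):
    """Return ({raw_code: count}, elapsed_seconds) parsed from perf stat text."""
    counts = {}
    elapsed = None
    for line in text.splitlines():
        m = COUNT_LINE.match(line)
        if m:
            counts[m.group(2).lower()] = int(m.group(1))
            continue
        m = TIME_LINE.match(line)
        if m:
            elapsed = float(m.group(1))
    return counts, elapsed


linpack_log = """
    3007370989069  raw 0x81d0
    96836841041  raw 0x82d0

  143.245665508  seconds time elapsed
"""

counts, seconds = parse_perf_stat(linpack_log)
# Same naive formula as in the thread: 64 bytes per uop, 1e-9 for bytes to GB.
naive_gbs = (counts["81d0"] + counts["82d0"]) * 64 * 1e-9 / seconds
print("{:.1f} GB/s".format(naive_gbs))
```

This reproduces the roughly 1387 GB/s figure from the post. As Pat explains in his reply, the formula only measures DRAM traffic if every counted uop misses the L3.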
      <pubDate>Fri, 30 Mar 2012 09:28:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MEM-UOP-RETIRED-LOADS-MEM-UOP-RETIRED-STORES/m-p/783063#M408</guid>
      <dc:creator>GHui</dc:creator>
      <dc:date>2012-03-30T09:28:24Z</dc:date>
    </item>
    <item>
      <title>MEM_UOP_RETIRED.LOADS &amp; MEM_UOP_RETIRED.STORES</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MEM-UOP-RETIRED-LOADS-MEM-UOP-RETIRED-STORES/m-p/783064#M409</link>
      <description>The only way you can use these events to measure memory bandwidth is if you are SURE that every load and store uop actually misses the L3.&lt;BR /&gt;Your results indicate that most of the load and store uops are hitting in the cache.&lt;BR /&gt;Pat</description>
      <pubDate>Fri, 30 Mar 2012 15:10:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MEM-UOP-RETIRED-LOADS-MEM-UOP-RETIRED-STORES/m-p/783064#M409</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2012-03-30T15:10:08Z</dc:date>
    </item>
  </channel>
</rss>

