<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Write Combining Buffer in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822178#M1007</link>
    <description>Hello Srinath,&lt;BR /&gt;I don't think there is any way to do what you want to do.&lt;BR /&gt;On older processors there was a partial WC buffer write event (I think), but this event doesn't exist on current processors. And the event didn't report how many bytes were written, just the number of times a partial write occurred.&lt;BR /&gt;Sorry,&lt;BR /&gt;Pat&lt;BR /&gt;</description>
    <pubDate>Wed, 16 May 2012 02:16:29 GMT</pubDate>
    <dc:creator>Patrick_F_Intel1</dc:creator>
    <dc:date>2012-05-16T02:16:29Z</dc:date>
    <item>
      <title>Write Combining Buffer</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822177#M1006</link>
      <description>Hi,&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I am trying to leverage the write combining buffer in x86 processors to perform some memory and I/O optimizations.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;My question is:&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Is it possible for me to probe the write combining buffer and know exactly how many bytes are being evicted? Are there any hacks or performance counters that can give me this information?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Srinath&lt;/DIV&gt;</description>
      <pubDate>Tue, 15 May 2012 23:44:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822177#M1006</guid>
      <dc:creator>Srinath_Sridharan</dc:creator>
      <dc:date>2012-05-15T23:44:00Z</dc:date>
    </item>
    <item>
      <title>Write Combining Buffer</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822178#M1007</link>
      <description>Hello Srinath,&lt;BR /&gt;I don't think there is any way to do what you want to do.&lt;BR /&gt;On older processors there was a partial WC buffer write event (I think), but this event doesn't exist on current processors. And the event didn't report how many bytes were written, just the number of times a partial write occurred.&lt;BR /&gt;Sorry,&lt;BR /&gt;Pat&lt;BR /&gt;</description>
      <pubDate>Wed, 16 May 2012 02:16:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822178#M1007</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2012-05-16T02:16:29Z</dc:date>
    </item>
    <item>
      <title>Write Combining Buffer</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822179#M1008</link>
      <description>Thanks, Patrick, for the quick response.&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Is there a way to retrieve the data addresses (virtual or physical) accessed by loads and stores (I am specifically looking for non-temporal stores)? I was initially using Pin to do that and the performance was terrible. Is there a way to get that information from the hardware directly, say, at the time of instruction retirement?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Srinath&lt;/DIV&gt;</description>
      <pubDate>Wed, 16 May 2012 04:14:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822179#M1008</guid>
      <dc:creator>Srinath_Sridharan</dc:creator>
      <dc:date>2012-05-16T04:14:01Z</dc:date>
    </item>
    <item>
      <title>Write Combining Buffer</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822180#M1009</link>
      <description>Hey,&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;I actually have one more question. Is there any information that I can get from the write combining buffers on modern processors?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Srinath&lt;/DIV&gt;</description>
      <pubDate>Wed, 16 May 2012 17:11:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822180#M1009</guid>
      <dc:creator>Srinath_Sridharan</dc:creator>
      <dc:date>2012-05-16T17:11:46Z</dc:date>
    </item>
    <item>
      <title>Write Combining Buffer</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822181#M1010</link>
      <description>Only by looking it up under the current equivalent name "fill buffer," and not a great deal.</description>
      <pubDate>Wed, 16 May 2012 18:16:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822181#M1010</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2012-05-16T18:16:31Z</dc:date>
    </item>
    <item>
      <title>Write Combining Buffer</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822182#M1011</link>
      <description>Let me see if I understand your questions correctly -- are you trying to use non-temporal stores but you are not seeing the improvement?&lt;BR /&gt;</description>
      <pubDate>Fri, 18 May 2012 00:00:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822182#M1011</guid>
      <dc:creator>levicki</dc:creator>
      <dc:date>2012-05-18T00:00:24Z</dc:date>
    </item>
    <item>
      <title>Write Combining Buffer</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822183#M1012</link>
      <description>My original question is as follows:&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Is there any meta information (possibly through hardware counters) that I can retrieve from the Write Combining Buffers (WCBs)? (I mmap-ed my address space in WC mode to bypass the cache.)&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;My question was not about performance. I am trying to save some cache space by sending writes that don't need any locality directly to a storage device, bypassing the cache, similar to how framebuffer writes bypass the cache on their way to the graphics device.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;One of the problems I am facing is that I need to provide some correctness guarantees. I need to verify that the cache lines flushed from the WCBs have all reached the destination. But to do that I need some information from the processor side, such as:&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;1) Physical address of the cache lines flushed from the WCB&lt;/DIV&gt;&lt;DIV&gt;2) Number of bytes flushed&lt;/DIV&gt;&lt;DIV&gt;3) At least the number of partial/full lines flushed&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Any combination of 1), 2) or 3) is fine.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Is there a way for me to retrieve such information from the processor?&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;Srinath&lt;/DIV&gt;</description>
      <pubDate>Fri, 18 May 2012 18:10:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822183#M1012</guid>
      <dc:creator>Srinath_Sridharan</dc:creator>
      <dc:date>2012-05-18T18:10:36Z</dc:date>
    </item>
    <item>
      <title>Write Combining Buffer</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822184#M1013</link>
      <description>As you probably know, non-temporal load/store instructions (movntps, movntdq, etc.) are a way to bypass the cache on read and write.&lt;BR /&gt;&lt;BR /&gt;Since non-temporal stores are weakly ordered, you need to issue mfence/lfence/sfence before using the data (which one depends on what you are doing with the data; most likely sfence in your case).&lt;BR /&gt;&lt;BR /&gt;Those fencing instructions are the only guarantee that the data has reached the destination before you use it.&lt;BR /&gt;&lt;BR /&gt;As far as I know, the metrics that you are looking for are not available as CPU counters. Perhaps they can be observed through ITP debugging, but I am not sure about that, and such hardware is extremely expensive anyway.&lt;BR /&gt;</description>
      <pubDate>Sat, 19 May 2012 12:30:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Write-Combining-Buffer/m-p/822184#M1013</guid>
      <dc:creator>levicki</dc:creator>
      <dc:date>2012-05-19T12:30:30Z</dc:date>
    </item>
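    <!--
    A minimal sketch of the advice in the last reply: write with non-temporal
    stores, then issue sfence before relying on the data. It assumes an x86-64
    compiler with SSE2 intrinsics and a 64-byte line size. The helper name
    nt_copy_lines and the LINE_BYTES constant are hypothetical, introduced only
    for illustration. Because the thread concludes that no hardware counter
    reports flushed lines, the returned count is kept purely in software: it
    says how many full lines the code issued, not what the device received.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_stream_si128; also provides _mm_sfence */
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64u /* assumed cache-line / WC buffer size (an assumption) */

/* Hypothetical helper: copy `bytes` from src to dst using non-temporal stores,
 * fence, and return the number of full 64-byte lines that were issued.
 * dst must be 64-byte aligned and bytes must be a multiple of 64, so every
 * line is written completely and no partial WC buffer flush is requested. */
static size_t nt_copy_lines(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)dst % LINE_BYTES) == 0);
    assert(bytes % LINE_BYTES == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t chunks = bytes / sizeof(__m128i); /* 16-byte pieces, 4 per line */

    for (size_t i = 0; i < chunks; i++) {
        __m128i v = _mm_loadu_si128(s + i); /* ordinary, possibly unaligned load */
        _mm_stream_si128(d + i, v);         /* movntdq: weakly ordered store */
    }

    /* Order the non-temporal stores ahead of anything that follows. */
    _mm_sfence();

    return bytes / LINE_BYTES; /* software-side count of lines issued */
}
```

    Calling nt_copy_lines(dst, src, 256) with a 64-byte-aligned dst returns 4.
    Writing whole lines in ascending order is a design choice that gives the
    write combining buffers the best chance of flushing full lines, but the
    hardware may still evict a buffer early.
    -->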
  </channel>
</rss>

