<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic AVX Performance Measure in Intel® ISA Extensions</title>
    <link>https://community.intel.com/t5/Intel-ISA-Extensions/AVX-Performance-Measure/m-p/790938#M406</link>
    <description>In the public SDE, the only measure even distantly related to performance is the instruction mix count. As you saw, the time required to run the emulation has no relationship to expected hardware performance.&lt;BR /&gt;If you can show that your AVX code cuts the number of instructions on the critical path by 50%, does not increase the demand for data to or from cache beyond 16 bytes per clock, and does not depend on misaligned access, you have an excellent chance of a significant speedup.</description>
    <pubDate>Thu, 24 Jun 2010 13:30:44 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2010-06-24T13:30:44Z</dc:date>
    <item>
      <title>AVX Performance Measure</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/AVX-Performance-Measure/m-p/790937#M405</link>
      <description>&lt;P&gt;Hi all,&lt;BR /&gt;A few articles about the performance gain of AVX over other SIMD instructions have been shared on the site (for example, Wiener Filtering Using Intel Advanced Vector Extensions by Mr Kit Chung). They also give the performance gain of 256-bit AVX over 128-bit SSE (figures pasted from your site below). Could anyone please tell me how the performance gain can be measured on the SDE?&lt;/P&gt;&lt;TABLE&gt;&lt;TR&gt;&lt;TH&gt;&lt;/TH&gt;&lt;TH&gt;Intel AVX (256-bit)&lt;/TH&gt;&lt;TH&gt;Intel SSE (128-bit)&lt;/TH&gt;&lt;TH&gt;AVX vs. SSE&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;Wiener filter&lt;/TD&gt;&lt;TD&gt;45871&lt;/TD&gt;&lt;TD&gt;66933&lt;/TD&gt;&lt;TD&gt;1.46x&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;Wiener filter with grouped arrays&lt;/TD&gt;&lt;TD&gt;42464&lt;/TD&gt;&lt;TD&gt;64473&lt;/TD&gt;&lt;TD&gt;1.51x&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;&lt;P&gt;I called the SSE code several times and then called the AVX code the same number of times on the SDE. The ratio shows a degradation in performance of the AVX version compared with the SSE code. Below are the results I obtained when I ran the two functions 1000 times.&lt;BR /&gt;&lt;BR /&gt;intrin_wiener_rcp_sse = 0.284260 msec&lt;BR /&gt;intrin_wiener_rcp_avx = 15.032977 msec&lt;BR /&gt;Performance Improvement is 0.018909 times&lt;BR /&gt;&lt;BR /&gt;How can I check the performance? Can you please help?&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 24 Jun 2010 12:48:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/AVX-Performance-Measure/m-p/790937#M405</guid>
      <dc:creator>inteleverywhere</dc:creator>
      <dc:date>2010-06-24T12:48:21Z</dc:date>
    </item>
    <item>
      <title>AVX Performance Measure</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/AVX-Performance-Measure/m-p/790938#M406</link>
      <description>In the public SDE, the only measure even distantly related to performance is the instruction mix count. As you saw, the time required to run the emulation has no relationship to expected hardware performance.&lt;BR /&gt;If you can show that your AVX code cuts the number of instructions on the critical path by 50%, does not increase the demand for data to or from cache beyond 16 bytes per clock, and does not depend on misaligned access, you have an excellent chance of a significant speedup.</description>
      <pubDate>Thu, 24 Jun 2010 13:30:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/AVX-Performance-Measure/m-p/790938#M406</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2010-06-24T13:30:44Z</dc:date>
    </item>
  </channel>
</rss>

