<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic You would need to furnish in Intel® ISA Extensions</title>
    <link>https://community.intel.com/t5/Intel-ISA-Extensions/Why-is-my-AVX-slower-than-SSE/m-p/1023318#M5019</link>
    <description></description>
    <pubDate>Sun, 07 Jun 2015 11:11:34 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2015-06-07T11:11:34Z</dc:date>
    <item>
      <title>Why is my AVX slower than SSE?</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Why-is-my-AVX-slower-than-SSE/m-p/1023317#M5018</link>
      <description>&lt;P&gt;According to the article "IIR Gaussian Blur Filter Implementation using Intel® Advanced Vector Extensions", AVX should be faster than SSE. However, my performance measurements show the following:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;The computer supports AVX&lt;BR /&gt;
	number CPU in the system = 4&lt;/P&gt;

&lt;P&gt;&amp;nbsp;IIR Gaussian Filter Coefficients are:&lt;BR /&gt;
	a0 = 0.021175, a1 = -0.017807, a2 = 0.021103, a3 = -0.017875,&lt;BR /&gt;
	b1 = -1.837578, b2 = 0.844174, cprev = 0.510583, cnext = 0.489409&lt;/P&gt;

&lt;P&gt;image width = 1024, height = 1024&lt;/P&gt;

&lt;P&gt;Running multi threaded SSE code&lt;/P&gt;

&lt;P&gt;Running multi threaded AVX code&lt;/P&gt;

&lt;P&gt;SSE and AVX Implementation matches&lt;/P&gt;

&lt;P&gt;&lt;BR /&gt;
	Performance Measurement:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;SSE horizontal Pass min: 4.94052 max: 109.795 avg: 6.97836&lt;BR /&gt;
	&amp;nbsp;SSE vertical Pass min: 3.32723 max: 89.6741 avg: 4.52679&lt;/P&gt;

&lt;P&gt;&amp;nbsp;AVX horizontal Pass&amp;nbsp; min: 33.0741 max: 159.732 avg: 43.4993&lt;/P&gt;

&lt;P&gt;&amp;nbsp;AVX vertical Pass min:&amp;nbsp;&amp;nbsp; 9.69314 max: 162.726 avg: 14.5814&lt;/P&gt;

&lt;P&gt;My OS is Windows 7 64-bit.&lt;/P&gt;

&lt;P&gt;My CPU is an Intel(R) Core(TM) i5-3230M CPU @ 2.6GHz.&lt;/P&gt;

&lt;P&gt;My IDE is VS2013, with the OpenMP option enabled.&lt;/P&gt;

&lt;P&gt;Why is my AVX code so slow?&lt;/P&gt;

&lt;P&gt;Can anyone help me understand this?&lt;/P&gt;

&lt;P&gt;Thank you very much&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jun 2015 16:18:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Why-is-my-AVX-slower-than-SSE/m-p/1023317#M5018</guid>
      <dc:creator>Shaquille_W_1</dc:creator>
      <dc:date>2015-06-02T16:18:33Z</dc:date>
    </item>
    <item>
      <title>You would need to furnish</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Why-is-my-AVX-slower-than-SSE/m-p/1023318#M5019</link>
      <description>&lt;P&gt;You would need to provide more information to pin down your question. A web page with the title you quote was once posted on the Intel site, but access to it is restricted, so apparently very few people have seen it and could give even a partial answer based on it.&lt;/P&gt;

&lt;P&gt;On Windows 7, you need the service packs installed to support AVX and HyperThreading; AVX support arrived with SP1.&lt;/P&gt;
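Whether the OS has actually enabled AVX can be checked with the usual CPUID/XGETBV test. This is a minimal sketch assuming GCC or Clang on x86 (cpuid.h and inline asm); MSVC offers the __cpuid and _xgetbv intrinsics for the same two steps. The helper name os_supports_avx is hypothetical. On an OS without AVX support (for example Windows 7 before SP1), XCR0 lacks the YMM-state bit and the function returns 0 even on an AVX-capable CPU.

```c
#include "cpuid.h"

/* Hypothetical helper (GCC/Clang on x86): returns 1 if both the CPU and the
   OS support AVX, i.e. the OS saves YMM state on context switches. */
int os_supports_avx(void)
{
    unsigned int r[4] = {0, 0, 0, 0};   /* eax, ebx, ecx, edx */
    if (!__get_cpuid(1, r, r + 1, r + 2, r + 3))
        return 0;
    unsigned int ecx = r[2];
    int osxsave = (ecx >> 27) % 2;      /* OS has enabled XSAVE/XGETBV */
    int avx = (ecx >> 28) % 2;          /* CPU implements AVX */
    if (!osxsave || !avx)
        return 0;
    unsigned int lo, hi;
    /* Read XCR0 (only legal once OSXSAVE is confirmed). */
    __asm__ volatile ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    (void)hi;
    /* XCR0 bit 1 = SSE (XMM) state, bit 2 = AVX (upper YMM) state. */
    return (lo >> 1) % 4 == 3;
}
```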

&lt;P&gt;Are you using AVX intrinsics with the Microsoft compiler, when the article presumably assumed the Intel compiler? The performance of intrinsics code is likely to depend on data alignment; the Ivy Bridge CPU (your i5-3230M is one) reduced, but didn't eliminate, the performance loss associated with unaligned AVX loads and stores.&lt;/P&gt;
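The alignment point can be illustrated with a minimal C sketch, assuming a compiler that provides immintrin.h and _mm_malloc (MSVC, GCC, and Clang do). The hypothetical scale_buffer uses the aligned 256-bit intrinsics _mm256_load_ps and _mm256_store_ps, which require a 32-byte-aligned address, and compiles that path only when __AVX__ is defined (for example under /arch:AVX or -mavx); otherwise the scalar loop handles the whole buffer.

```c
#include "stddef.h"
#include "stdint.h"
#include "assert.h"
#include "immintrin.h"

/* Returns 1 if p sits on a 32-byte boundary, as _mm256_load_ps requires. */
int is_aligned32(const void *p)
{
    return (uintptr_t)p % 32 == 0;
}

/* Hypothetical example: multiply n floats in buf by k, in place.
   buf must be 32-byte aligned (e.g. allocated with _mm_malloc(..., 32)). */
void scale_buffer(float *buf, size_t n, float k)
{
    size_t i = 0;
    assert(is_aligned32(buf));
#ifdef __AVX__
    __m256 vk = _mm256_set1_ps(k);
    /* Aligned 256-bit loads/stores, 8 floats per iteration. */
    for (; n - i >= 8; i += 8) {
        __m256 v = _mm256_load_ps(buf + i);
        _mm256_store_ps(buf + i, _mm256_mul_ps(v, vk));
    }
#endif
    /* Scalar tail, or the whole buffer when AVX is not enabled. */
    for (; i != n; ++i)
        buf[i] *= k;
}
```

Allocate the buffer with _mm_malloc(n * sizeof(float), 32) and release it with _mm_free. A plain malloc on 64-bit Windows guarantees only 16-byte alignment, which is enough for SSE but not for _mm256_load_ps.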

&lt;P&gt;If you are using the Microsoft compiler, you would likely need /fp:fast, which is roughly equivalent to the Intel compiler setting /fp:source (less aggressive optimization than the article presumably expected). The Microsoft compiler removes most optimization inside OpenMP regions, and I haven't looked into how that affects AVX intrinsics code. In the cases I've seen, the Microsoft compiler does not auto-vectorize inside OpenMP parallel regions.&lt;/P&gt;
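Why the floating-point model matters for vectorization: vectorizing a reduction such as a sum splits it across SIMD lanes, which regroups the additions. Floating-point addition is not associative, so a strict model (/fp:precise, the Microsoft default) must keep the source order, while /fp:fast permits reassociation. A small C illustration; the function names are hypothetical.

```c
/* Hypothetical helpers: the same three-term sum in two groupings. */

/* Source order (a + b) + c: what a strict model such as /fp:precise keeps. */
float sum_in_order(float a, float b, float c)
{
    return (a + b) + c;
}

/* Regrouped a + (b + c): the kind of reordering /fp:fast lets the
   compiler make, e.g. when splitting a sum across SIMD lanes. */
float sum_regrouped(float a, float b, float c)
{
    return a + (b + c);
}
```

With a = 1e20f, b = -1e20f, c = 1.0f, the first returns 1.0f and the second 0.0f, because 1.0f is lost when it is added to -1e20f.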

&lt;P&gt;Did you try setting the number of threads to the number of physical cores, or disabling HyperThreading? Your i5-3230M has 2 physical cores with HyperThreading, which is why your program reports 4 CPUs. If the code runs well with HyperThreading at all, it is likely to need the help of affinity settings (which Microsoft's OpenMP doesn't support).&lt;/P&gt;</description>
      <pubDate>Sun, 07 Jun 2015 11:11:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Why-is-my-AVX-slower-than-SSE/m-p/1023318#M5019</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2015-06-07T11:11:34Z</dc:date>
    </item>
    <item>
      <title>Based on the limited info in</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Why-is-my-AVX-slower-than-SSE/m-p/1023319#M5020</link>
      <description>&lt;P&gt;Based on the limited info in your post, the only thing I can think of that would make AVX so slow is not using VZEROUPPER everywhere it's needed. There's a massive speed penalty for mixing SSE and AVX instructions without VZEROUPPER. Search the web for it, or look in Agner Fog's optimization guides, to find out where you need to use it. You can also use Intel's Software Development Emulator (SDE) to detect slow SSE/AVX transitions in your code.&lt;/P&gt;</description>
      <pubDate>Sun, 07 Jun 2015 18:13:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Why-is-my-AVX-slower-than-SSE/m-p/1023319#M5020</guid>
      <dc:creator>Peter_Cordes</dc:creator>
      <dc:date>2015-06-07T18:13:01Z</dc:date>
    </item>
    <item>
      <title>The issue about vzeroupper</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Why-is-my-AVX-slower-than-SSE/m-p/1023320#M5021</link>
      <description>&lt;P&gt;The VZEROUPPER issue could arise if you are using AVX intrinsics with the Microsoft compiler without setting /arch:AVX, since the surrounding code is then compiled with legacy (non-VEX) SSE encodings. It's just one of many guesses that might be made in the absence of adequate information.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jun 2015 19:59:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Why-is-my-AVX-slower-than-SSE/m-p/1023320#M5021</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2015-06-10T19:59:57Z</dc:date>
    </item>
  </channel>
</rss>

