<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic The 10 element array was just in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/Arbitrary-interleaver-shuffle-using-IPP/m-p/991169#M22337</link>
    <description>The 10-element array was just an example. Actual arrays may have 1000 elements. I believe icpc will auto-vectorize when the /O3 switch is used.</description>
    <pubDate>Wed, 12 Sep 2012 17:05:20 GMT</pubDate>
    <dc:creator>egrayver</dc:creator>
    <dc:date>2012-09-12T17:05:20Z</dc:date>
    <item>
      <title>Arbitrary interleaver (shuffle) using IPP</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Arbitrary-interleaver-shuffle-using-IPP/m-p/991167#M22335</link>
      <description>&lt;P&gt;I read somewhere that the new processors include special instructions for small lookup tables. Is there a way to optimize the following simple operation:&lt;/P&gt;
&lt;P&gt;float data[10] = {0, ...9}&lt;/P&gt;
&lt;P&gt;unsigned int idx[10] = {2,3,5,0,...9} // Arbitrary permutation of 0..9&lt;/P&gt;
&lt;P&gt;float result[10];&lt;/P&gt;
&lt;P&gt;result = data[idx]&lt;/P&gt;
&lt;P&gt;I have to do this operation often, and it takes quite a bit of time in a 'for' loop. Currently:&lt;BR /&gt;for (int i=0;i&amp;lt;10;i++) result[i]=data[idx[i]];&lt;/P&gt;
</description>
      <pubDate>Tue, 11 Sep 2012 18:11:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Arbitrary-interleaver-shuffle-using-IPP/m-p/991167#M22335</guid>
      <dc:creator>egrayver</dc:creator>
      <dc:date>2012-09-11T18:11:46Z</dc:date>
    </item>
    <item>
      <title>Have you checked to see if</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Arbitrary-interleaver-shuffle-using-IPP/m-p/991168#M22336</link>
      <description>Have you checked to see if your compiler has the auto-vectorizer turned on? That will probably help you a lot. Since there are only 10 elements in the loop, the overhead of threading the function may make the performance worse.</description>
      <pubDate>Tue, 11 Sep 2012 20:42:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Arbitrary-interleaver-shuffle-using-IPP/m-p/991168#M22336</guid>
      <dc:creator>Chuck_De_Sylva</dc:creator>
      <dc:date>2012-09-11T20:42:23Z</dc:date>
    </item>
    <item>
      <title>The 10 element array was just</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Arbitrary-interleaver-shuffle-using-IPP/m-p/991169#M22337</link>
      <description>The 10-element array was just an example. Actual arrays may have 1000 elements. I believe icpc will auto-vectorize when the /O3 switch is used.</description>
      <pubDate>Wed, 12 Sep 2012 17:05:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Arbitrary-interleaver-shuffle-using-IPP/m-p/991169#M22337</guid>
      <dc:creator>egrayver</dc:creator>
      <dc:date>2012-09-12T17:05:20Z</dc:date>
    </item>
  </channel>
</rss>

