<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic ippmSub_vav_64f is slower than regular loop in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippmSub-vav-64f-is-slower-than-regular-loop/m-p/793603#M2610</link>
    <description>Hi,&lt;DIV&gt;&lt;BR /&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks for the reply.&lt;/DIV&gt;&lt;DIV&gt;The vector size is DIM=3; the variable VEC_SIZE sets the number of vectors in the vector array.&lt;/DIV&gt;&lt;DIV&gt;So this setting should be suitable for IPP MX.&lt;/DIV&gt;&lt;DIV&gt;Snir&lt;/DIV&gt;</description>
    <pubDate>Fri, 08 Jul 2011 12:13:39 GMT</pubDate>
    <dc:creator>snirgaz</dc:creator>
    <dc:date>2011-07-08T12:13:39Z</dc:date>
    <item>
      <title>ippmSub_vav_64f is slower than regular loop</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippmSub-vav-64f-is-slower-than-regular-loop/m-p/793601#M2608</link>
      <description>Hi All,&lt;BR /&gt;&lt;BR /&gt;I am comparing ippmSub_vav_64f with a regular loop implementation. The IPP version is significantly slower.&lt;BR /&gt;&lt;BR /&gt;Compile line:&lt;BR /&gt;&lt;BR /&gt;icpc -O3 -ipp=common test.cpp&lt;BR /&gt;&lt;BR /&gt;Results:&lt;BR /&gt;&lt;BR /&gt;Ipp Time (uSec):1228&lt;BR /&gt;Regular Loop Time (uSec):881&lt;BR /&gt;&lt;BR /&gt;Runs on:&lt;BR /&gt;&lt;BR /&gt;Intel Xeon CPU X5550 @ 2.67GHz&lt;BR /&gt;&lt;BR /&gt;Code:&lt;BR /&gt;&lt;BR /&gt;#include &amp;lt;iostream&amp;gt;&lt;BR /&gt;#include &amp;lt;ipp.h&amp;gt;&lt;BR /&gt;#include &amp;lt;mathimf.h&amp;gt;&lt;BR /&gt;#include &amp;lt;sys/time.h&amp;gt;&lt;BR /&gt;&lt;BR /&gt;#define VEC_SIZE 20000&lt;BR /&gt;#define DIM 3&lt;BR /&gt;#define REPEAT_SIZE 10&lt;BR /&gt;&lt;BR /&gt;int main(){&lt;BR /&gt; // Output Array&lt;BR /&gt; double *aIpp=new double[VEC_SIZE*DIM];&lt;BR /&gt; double *aLoop=new double[VEC_SIZE*DIM];&lt;BR /&gt; // Rand Arrays&lt;BR /&gt; double *temp_a=new double[VEC_SIZE*DIM];&lt;BR /&gt; double *temp_b=new double[DIM];&lt;BR /&gt; unsigned int seed=5;&lt;BR /&gt; int j,d,i;&lt;BR /&gt; int stride0=sizeof(double),stride2=VEC_SIZE*sizeof(double);&lt;BR /&gt; // Timing Vars&lt;BR /&gt; timeval startTime;&lt;BR /&gt; timeval endTime;&lt;BR /&gt; double tS,tE;&lt;BR /&gt; // Draw Arrays&lt;BR /&gt; ippsRandUniform_Direct_64f(temp_a, VEC_SIZE*DIM,0,1000,&amp;amp;seed);&lt;BR /&gt; ippsRandUniform_Direct_64f(temp_b, DIM,0,1000,&amp;amp;seed);&lt;BR /&gt; // IPP Sub&lt;BR /&gt; gettimeofday(&amp;amp;startTime, NULL);&lt;BR /&gt; for (j=0; j&amp;lt;REPEAT_SIZE; j++){&lt;BR /&gt; ippmSub_vav_64f(temp_a, stride0, stride2, temp_b, stride0, aIpp,stride0, stride2, DIM, VEC_SIZE);&lt;BR /&gt; }&lt;BR /&gt; gettimeofday(&amp;amp;endTime, NULL);&lt;BR /&gt; tS = startTime.tv_sec*1000000 + (startTime.tv_usec);&lt;BR /&gt; tE = endTime.tv_sec*1000000 + (endTime.tv_usec);&lt;BR /&gt; std::cout&amp;lt;&amp;lt; "Ipp Time (uSec):" &amp;lt;&amp;lt; (tE-tS) &amp;lt;&amp;lt; "\n";&lt;BR /&gt; // Regular Sub&lt;BR /&gt; gettimeofday(&amp;amp;startTime, NULL);&lt;BR /&gt; for (j=0; j&amp;lt;REPEAT_SIZE; j++){&lt;BR /&gt; for (d=0;d&amp;lt;DIM;d++){&lt;BR /&gt; for (i=0;i&amp;lt;VEC_SIZE;i++){&lt;BR /&gt; aLoop[i+VEC_SIZE*d]=temp_a[i+VEC_SIZE*d]-temp_b[d];&lt;BR /&gt; }&lt;BR /&gt; }&lt;BR /&gt; }&lt;BR /&gt; gettimeofday(&amp;amp;endTime, NULL);&lt;BR /&gt; tS = startTime.tv_sec*1000000 + (startTime.tv_usec);&lt;BR /&gt; tE = endTime.tv_sec*1000000 + (endTime.tv_usec);&lt;BR /&gt; std::cout&amp;lt;&amp;lt; "Regular Loop Time (uSec):" &amp;lt;&amp;lt; (tE-tS) &amp;lt;&amp;lt; "\n";&lt;BR /&gt; // Compare the two results&lt;BR /&gt; for (i=0;i&amp;lt;VEC_SIZE;i++)&lt;BR /&gt; if (fabs(aLoop[i]-aIpp[i])&amp;gt;0.0001) std::cout &amp;lt;&amp;lt;"Error";&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;Any thoughts?&lt;BR /&gt;&lt;BR /&gt;Thanks!&lt;BR /&gt;&lt;BR /&gt;Snir</description>
      <pubDate>Wed, 06 Jul 2011 14:39:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippmSub-vav-64f-is-slower-than-regular-loop/m-p/793601#M2608</guid>
      <dc:creator>snirgaz</dc:creator>
      <dc:date>2011-07-06T14:39:12Z</dc:date>
    </item>
    <item>
      <title>ippmSub_vav_64f is slower than regular loop</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippmSub-vav-64f-is-slower-than-regular-loop/m-p/793602#M2609</link>
      <description>&lt;P&gt;Hello, &lt;/P&gt;&lt;P&gt;It looks like you are operating on a vector of size 20000. IPP MX functions are optimized for operations on small matrices and small vectors, particularly for matrices of size 3x3, 4x4, 5x5, 6x6, and for vectors of length 3, 4, 5, 6.&lt;/P&gt;&lt;P&gt;For simple C code like the loop you tested, the compiler can easily vectorize it and get good performance. &lt;/P&gt;&lt;P&gt;Thanks,&lt;BR /&gt;Chao&lt;/P&gt;</description>
      <pubDate>Fri, 08 Jul 2011 07:30:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippmSub-vav-64f-is-slower-than-regular-loop/m-p/793602#M2609</guid>
      <dc:creator>Chao_Y_Intel</dc:creator>
      <dc:date>2011-07-08T07:30:05Z</dc:date>
    </item>
    <item>
      <title>ippmSub_vav_64f is slower than regular loop</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippmSub-vav-64f-is-slower-than-regular-loop/m-p/793603#M2610</link>
      <description>Hi,&lt;DIV&gt;&lt;BR /&gt;&lt;/DIV&gt;&lt;DIV&gt;Thanks for the reply.&lt;/DIV&gt;&lt;DIV&gt;The vector size is DIM=3; the variable VEC_SIZE sets the number of vectors in the vector array.&lt;/DIV&gt;&lt;DIV&gt;So this setting should be suitable for IPP MX.&lt;/DIV&gt;&lt;DIV&gt;Snir&lt;/DIV&gt;</description>
      <pubDate>Fri, 08 Jul 2011 12:13:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippmSub-vav-64f-is-slower-than-regular-loop/m-p/793603#M2610</guid>
      <dc:creator>snirgaz</dc:creator>
      <dc:date>2011-07-08T12:13:39Z</dc:date>
    </item>
    <item>
      <title>ippmSub_vav_64f is slower than regular loop</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippmSub-vav-64f-is-slower-than-regular-loop/m-p/793604#M2611</link>
      <description>Any thoughts?&lt;BR /&gt;&lt;BR /&gt;I think this setting fits the use case IPP MX targets.&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;&lt;BR /&gt;Snir</description>
      <pubDate>Tue, 12 Jul 2011 16:22:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippmSub-vav-64f-is-slower-than-regular-loop/m-p/793604#M2611</guid>
      <dc:creator>snirgaz</dc:creator>
      <dc:date>2011-07-12T16:22:41Z</dc:date>
    </item>
    <item>
      <title>ippmSub_vav_64f is slower than regular loop</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippmSub-vav-64f-is-slower-than-regular-loop/m-p/793605#M2612</link>
      <description>&lt;P&gt;Snir, &lt;/P&gt;&lt;P&gt;In your code, the following inner loop takes most of the time. It simply subtracts a constant, temp_b[d], from the temp_a vector. For such simple code, the compiler can generate well-optimized code and achieve good performance. &lt;/P&gt;&lt;P&gt;for (i=0;i&amp;lt;VEC_SIZE;i++){&lt;/P&gt;&lt;P&gt;aLoop[i+VEC_SIZE*d]=temp_a[i+VEC_SIZE*d]-temp_b[d];&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;A good replacement for such code is the following IPP function call: &lt;/P&gt;&lt;P&gt;ippsSubC_64f(...). &lt;/P&gt;&lt;P&gt;Thanks,&lt;BR /&gt;Chao&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jul 2011 01:43:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippmSub-vav-64f-is-slower-than-regular-loop/m-p/793605#M2612</guid>
      <dc:creator>Chao_Y_Intel</dc:creator>
      <dc:date>2011-07-14T01:43:58Z</dc:date>
    </item>
  </channel>
</rss>

