<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Also note that Clang has two in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Performance-Issue/m-p/1141421#M26103</link>
    <description>&lt;P&gt;Also note that Clang has two built-in vectorizers (&lt;A href="https://www.llvm.org/docs/Vectorizers.html"&gt;https://www.llvm.org/docs/Vectorizers.html&lt;/A&gt;), which you can switch on and off for comparison.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Adriaan van Os&lt;/P&gt;</description>
    <pubDate>Tue, 17 Mar 2020 11:34:59 GMT</pubDate>
    <dc:creator>Adriaan_van_Os</dc:creator>
    <dc:date>2020-03-17T11:34:59Z</dc:date>
    <item>
      <title>IPP - Performance Issue</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Performance-Issue/m-p/1141417#M26099</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I'm using IPP, dynamically linked, with clang 11.0.0.&lt;/P&gt;&lt;P&gt;Hardware:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Processor Name: 6-Core Intel Core i5&lt;/LI&gt;&lt;LI&gt;Processor Speed: 3 GHz&lt;/LI&gt;&lt;LI&gt;Number of Processors: 1&lt;/LI&gt;&lt;LI&gt;Total Number of Cores: 6&lt;/LI&gt;&lt;LI&gt;L2 Cache (per Core): 256 KB&lt;/LI&gt;&lt;LI&gt;L3 Cache: 9 MB&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I might be missing something, but should the code below be slower than the same computation written with plain std:: functions?&lt;/P&gt;&lt;P&gt;If yes, how can I make it faster?&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;#include &amp;lt;chrono&amp;gt;
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;ipp.h&amp;gt;

// 12000 is the value reported later in this thread.
constexpr int kAmount = 12000;

struct Vect3DArray
{
    Ipp64f* x_;
    Ipp64f* y_;
    Ipp64f* z_;

    explicit Vect3DArray(int size)
    {
        // ippsMalloc_64f takes a length in elements, not in bytes.
        x_ = ippsMalloc_64f(size);
        y_ = ippsMalloc_64f(size);
        z_ = ippsMalloc_64f(size);
    }

    // Memory from ippsMalloc_* is released with ippsFree.
    ~Vect3DArray() { ippsFree(x_); ippsFree(y_); ippsFree(z_); }
};

int main() {
    Vect3DArray vectArray(kAmount);
    Vect3DArray dstVectArray(kAmount);
    Ipp64f* sums = ippsMalloc_64f(kAmount);

    // Fill the inputs (index 0 is left unset).
    for (int i = 1; i &amp;lt; kAmount; ++i) {
        vectArray.x_[i] = i * 2.5;
        vectArray.y_[i] = i * 3.3;
        vectArray.z_[i] = i * 4.7;
    }

    auto start = std::chrono::high_resolution_clock::now();

    // dst = src * src, element-wise, for each component.
    ippsMul_64f(vectArray.x_, vectArray.x_, dstVectArray.x_, kAmount);
    ippsMul_64f(vectArray.y_, vectArray.y_, dstVectArray.y_, kAmount);
    ippsMul_64f(vectArray.z_, vectArray.z_, dstVectArray.z_, kAmount);

    // sums = x*x + y*y, then add the unsquared z component.
    ippsAdd_64f(dstVectArray.x_, dstVectArray.y_, sums, kAmount);
    ippsAdd_64f(sums, vectArray.z_, sums, kAmount);
    // Square sums in place.
    ippsSqr_64f_I(sums, kAmount);

    // Divide each component in place by sums.
    ippsDiv_64f_I(sums, vectArray.x_, kAmount);
    ippsDiv_64f_I(sums, vectArray.y_, kAmount);
    ippsDiv_64f_I(sums, vectArray.z_, kAmount);

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast&amp;lt;std::chrono::microseconds&amp;gt;(end - start).count();
    std::cout &amp;lt;&amp;lt; "#" &amp;lt;&amp;lt; duration &amp;lt;&amp;lt; std::endl;

    ippsFree(sums);
}&lt;/PRE&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Mar 2020 18:48:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Performance-Issue/m-p/1141417#M26099</guid>
      <dc:creator>AF</dc:creator>
      <dc:date>2020-03-13T18:48:57Z</dc:date>
    </item>
    <item>
      <title>What is the value of kAmount</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Performance-Issue/m-p/1141418#M26100</link>
      <description>&lt;P&gt;What is the value of kAmount? Do the vectors fit in the L1 cache? If not, try performing the operations on chunks that do fit in L1, rather than on the whole vectors at once.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Adriaan van Os&lt;/P&gt;</description>
      <pubDate>Mon, 16 Mar 2020 14:11:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Performance-Issue/m-p/1141418#M26100</guid>
      <dc:creator>Adriaan_van_Os</dc:creator>
      <dc:date>2020-03-16T14:11:57Z</dc:date>
    </item>
    <item>
      <title>Hi, thanks for the reply.</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Performance-Issue/m-p/1141419#M26101</link>
      <description>&lt;P&gt;Hi, thanks for the reply.&lt;/P&gt;&lt;P&gt;The vectors don't fit in L1, as kAmount was 12000 in my tests.&lt;/P&gt;&lt;P&gt;What kind of improvement can I expect in the best-case scenario?&lt;/P&gt;&lt;P&gt;Alexandre F.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Mar 2020 10:10:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Performance-Issue/m-p/1141419#M26101</guid>
      <dc:creator>AF</dc:creator>
      <dc:date>2020-03-17T10:10:01Z</dc:date>
    </item>
    <item>
      <title>Well, that depends on a lot</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Performance-Issue/m-p/1141420#M26102</link>
      <description>&lt;P&gt;Well, that depends on a lot of factors, most importantly cache usage. The system (CPU) may also be doing background work during the measurement.&lt;/P&gt;&lt;P&gt;Note that the first call to IPP may be "very slow" (around 1 millisecond) due to library initialization, so keep that call out of the timing.&lt;/P&gt;&lt;P&gt;Based on the limited tests I did, the speed improvement with Float32 is typically 3x (that number is probably better on a CPU with wider vector registers, such as AVX-512). With Float64 the speed improvement is disappointing (typically up to 50%, or at most 100%). In my limited tests, some ipps Float64 functions were slower than their vDSP (&lt;A href="https://developer.apple.com/documentation/accelerate/vdsp?language=objc"&gt;https://developer.apple.com/documentation/accelerate/vdsp?language=objc&lt;/A&gt;) counterparts. Again, that may be better on a CPU with wider vector registers, such as AVX-512.&lt;/P&gt;&lt;P&gt;In my opinion, with Float64 it pays more to make your code threaded (I mean explicitly with pthreads, not semi-automatically with OpenMP). But then it depends on how stupid (sorry) the thread synchronisation is. Use "lock-free" synchronisation, never critical sections; they spoil everything.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Adriaan van Os&lt;/P&gt;</description>
      <pubDate>Tue, 17 Mar 2020 11:24:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Performance-Issue/m-p/1141420#M26102</guid>
      <dc:creator>Adriaan_van_Os</dc:creator>
      <dc:date>2020-03-17T11:24:03Z</dc:date>
    </item>
    <item>
      <title>Also note that Clang has two</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Performance-Issue/m-p/1141421#M26103</link>
      <description>&lt;P&gt;Also note that Clang has two built-in vectorizers (&lt;A href="https://www.llvm.org/docs/Vectorizers.html"&gt;https://www.llvm.org/docs/Vectorizers.html&lt;/A&gt;), which you can switch on and off for comparison.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Adriaan van Os&lt;/P&gt;</description>
      <pubDate>Tue, 17 Mar 2020 11:34:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/IPP-Performance-Issue/m-p/1141421#M26103</guid>
      <dc:creator>Adriaan_van_Os</dc:creator>
      <dc:date>2020-03-17T11:34:59Z</dc:date>
    </item>
  </channel>
</rss>

