<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: ippiScaleC_32s32f_C1R is slower than simple loop in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194595#M27378</link>
    <description>&lt;P&gt;ippi.dll shows File version 2020.0.2.1083&lt;/P&gt;</description>
    <pubDate>Sat, 25 Jul 2020 06:02:47 GMT</pubDate>
    <dc:creator>andreypir</dc:creator>
    <dc:date>2020-07-25T06:02:47Z</dc:date>
    <item>
      <title>ippiScaleC_32s32f_C1R is slower than simple loop</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194573#M27373</link>
      <description>&lt;P&gt;Attached is a simple console project showing that, when scaling a matrix, ippiScaleC_32s32f_C1R is slower than an equivalent simple C++ loop. In this example, one column of the source matrix is scaled into an output vector. On my PC with an i7-7700K, the C++ loop is about 20% faster.&lt;/P&gt;
&lt;P&gt;Is there any way to improve ippiScaleC performance?&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jul 2020 00:52:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194573#M27373</guid>
      <dc:creator>andreypir</dc:creator>
      <dc:date>2020-07-25T00:52:46Z</dc:date>
    </item>
    <item>
      <title>Re: ippiScaleC_32s32f_C1R is slower than simple loop</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194578#M27374</link>
      <description>&lt;P&gt;Andrey, how could we check this case? There is no reproducer attached to this thread.&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jul 2020 02:27:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194578#M27374</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-07-25T02:27:02Z</dc:date>
    </item>
    <item>
      <title>Re: ippiScaleC_32s32f_C1R is slower than simple loop</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194580#M27375</link>
      <description>&lt;P&gt;Sorry, I thought I had attached the zip. Here it is.&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jul 2020 03:11:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194580#M27375</guid>
      <dc:creator>andreypir</dc:creator>
      <dc:date>2020-07-25T03:11:11Z</dc:date>
    </item>
    <item>
      <title>Re: ippiScaleC_32s32f_C1R is slower than simple loop</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194583#M27376</link>
      <description>&lt;P&gt;OK, which version of IPP did you test with?&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jul 2020 04:19:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194583#M27376</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-07-25T04:19:55Z</dc:date>
    </item>
    <item>
      <title>Re: ippiScaleC_32s32f_C1R is slower than simple loop</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194594#M27377</link>
      <description>&lt;P&gt;2020.2.254&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jul 2020 06:00:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194594#M27377</guid>
      <dc:creator>andreypir</dc:creator>
      <dc:date>2020-07-25T06:00:48Z</dc:date>
    </item>
    <item>
      <title>Re: ippiScaleC_32s32f_C1R is slower than simple loop</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194595#M27378</link>
      <description>&lt;P&gt;ippi.dll shows File version 2020.0.2.1083&lt;/P&gt;</description>
      <pubDate>Sat, 25 Jul 2020 06:02:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194595#M27378</guid>
      <dc:creator>andreypir</dc:creator>
      <dc:date>2020-07-25T06:02:47Z</dc:date>
    </item>
    <item>
      <title>Re: ippiScaleC_32s32f_C1R is slower than simple loop</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194848#M27379</link>
      <description>&lt;P&gt;Yes, it seems there is a problem on the IPP side and this function needs further optimization. We will escalate the case.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Jul 2020 03:06:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1194848#M27379</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-07-27T03:06:53Z</dc:date>
    </item>
    <item>
      <title>Re: ippiScaleC_32s32f_C1R is slower than simple loop</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1195128#M27383</link>
      <description>&lt;P&gt;Hi Andreypir!&lt;/P&gt;
&lt;P&gt;IPP works better with rectangular ROIs, where it can load a whole SIMD register at a time.&lt;/P&gt;
&lt;P&gt;Could you please replace this loop in your code&lt;/P&gt;
&lt;LI-CODE lang="cpp"&gt;for (int n = 0; n &amp;lt; NTESTS_SCALE; n++)
{
  ScaleWithIPP(Source, nColumns, Dest, nRows, Factor, Shift);
}&lt;/LI-CODE&gt;
&lt;P&gt;with the following code? I see some speedup on my 64-bit Skylake system.&lt;/P&gt;
&lt;LI-CODE lang="cpp"&gt;for (int n = 0; n &amp;lt; NTESTS_SCALE; n++)
{
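  int dLen = 0;   // receives the number of samples written to Dest
  int phase = 0;  // offset within each block of nColumns samples; 0 picks the first column
  // Gather one column into contiguous memory. The Ipp32s data is passed as
  // Ipp32f only so the 32-bit values are copied, not converted.
  ippsSampleDown_32f((Ipp32f*)Source, nColumns*nRows, Dest, &amp;amp;dLen, nColumns, &amp;amp;phase);
  // Scale the gathered column in place, treated as a single contiguous row.
  IppiSize roiSize = { nRows, 1 };
  ippiScaleC_32s32f_C1R((Ipp32s*)Dest, nRows * sizeof(__int32), Factor, Shift, Dest, sizeof(float), roiSize, ippAlgHintFast);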
}&lt;/LI-CODE&gt;
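<![CDATA[
<P>For illustration only, here is a rough plain C++ sketch of the two access patterns, without any IPP calls. It is not how IPP is implemented. The helper names ScaleColumnStrided and ScaleColumnGathered are hypothetical, the matrix is assumed to be row-major, and scaling is assumed to mean dst = src * Factor + Shift.</P>

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: scale column `col` of a row-major nRows x nColumns
// matrix directly, reading one element every nColumns entries (strided access).
std::vector<float> ScaleColumnStrided(const std::vector<std::int32_t>& src,
                                      std::size_t nRows, std::size_t nColumns,
                                      std::size_t col, double factor, double shift)
{
    std::vector<float> dst(nRows);
    for (std::size_t r = 0; r < nRows; ++r)
        dst[r] = static_cast<float>(src[r * nColumns + col] * factor + shift);
    return dst;
}

// Hypothetical helper: first gather the column into a contiguous buffer,
// then scale that buffer in a single unit-stride pass.
std::vector<float> ScaleColumnGathered(const std::vector<std::int32_t>& src,
                                       std::size_t nRows, std::size_t nColumns,
                                       std::size_t col, double factor, double shift)
{
    // Step 1: gather (the role ippsSampleDown_32f plays in the code above).
    std::vector<std::int32_t> column(nRows);
    for (std::size_t r = 0; r < nRows; ++r)
        column[r] = src[r * nColumns + col];

    // Step 2: scale contiguous data (the role of the ippiScaleC call above).
    std::vector<float> dst(nRows);
    for (std::size_t r = 0; r < nRows; ++r)
        dst[r] = static_cast<float>(column[r] * factor + shift);
    return dst;
}
```

<P>Both helpers return the same values. The second one separates the strided reads from the arithmetic, so the scaling pass runs over contiguous memory, which is the kind of layout a vectorized routine can load a full register from.</P>
]]>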
&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Jul 2020 22:39:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1195128#M27383</guid>
      <dc:creator>Andrey_B_Intel</dc:creator>
      <dc:date>2020-07-27T22:39:32Z</dc:date>
    </item>
    <item>
      <title>Re: ippiScaleC_32s32f_C1R is slower than simple loop</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1195133#M27384</link>
      <description>&lt;P&gt;Hi Andrey,&lt;/P&gt;
&lt;P&gt;Thank you. Yes, downsampling and then scaling is substantially faster than scaling alone, and somewhat faster than the loop:&lt;/P&gt;
&lt;P&gt;ScaleWithLoop: 828&lt;BR /&gt;ScaleWithIPP: 625&lt;/P&gt;
&lt;P&gt;on my computer. I had hoped for a bigger gain, but this will work.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Jul 2020 23:09:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippiScaleC-32s32f-C1R-is-slower-than-simple-loop/m-p/1195133#M27384</guid>
      <dc:creator>andreypir</dc:creator>
      <dc:date>2020-07-27T23:09:02Z</dc:date>
    </item>
  </channel>
</rss>

