<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic We used the IPP2019 Update 2. in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155707#M26429</link>
    <description>&lt;P&gt;We used IPP 2019 Update 2. My colleagues said there was no RPM for the threaded version. On Windows the option was available.&lt;/P&gt;</description>
    <pubDate>Mon, 25 Feb 2019 08:34:14 GMT</pubDate>
    <dc:creator>Herbert_K_</dc:creator>
    <dc:date>2019-02-25T08:34:14Z</dc:date>
    <item>
      <title>CrossCorrNorm performance issues Ipp17 and Ipp9</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155698#M26420</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have the following performance issues with the CrossCorrNorm function.&lt;/P&gt;&lt;P&gt;The first issue appears with certain source and template sizes when compared to the "old" version of the algorithm.&lt;/P&gt;&lt;P&gt;With a source size of 512 x 512 and a template of 256 x 256, the calculation time is 3 ms with the new version and 3.8 ms with the old version.&lt;/P&gt;&lt;P&gt;With a source size of 512 x 512 and a template of 500 x 500, the calculation time is 13 ms with the new version and 3.3 ms with the old version.&lt;/P&gt;&lt;P&gt;Where is this coming from?&lt;/P&gt;&lt;P&gt;The second issue occurs with both versions of the algorithm.&lt;/P&gt;&lt;P&gt;With a source size of 512 x 512 and a template of 256 x 256, the calculation time is 3 ms with the new version and 3.8 ms with the old version.&lt;/P&gt;&lt;P&gt;With a source size of 513 x 512 and a template of 256 x 256, the calculation time is 5.5 ms with the new version and 6.55 ms with the old version.&lt;/P&gt;&lt;P&gt;The calculation time nearly doubles. What is happening here? The same thing happens when going from 256 to 257 and from 1024 to 1025:
the calculation time always doubles.&lt;/P&gt;&lt;P&gt;Here is the source code I used to test this behavior:&lt;/P&gt;&lt;P&gt;void testCrossCorr(){&lt;/P&gt;&lt;P&gt;Timer timer;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;IppStatus status;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;IppiSize srcRoiSize = { 513,512 };&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;IppiSize tplRoiSize = { 500,500 };&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;IppiSize dstRoiSize = { srcRoiSize.width - tplRoiSize.width + 1, srcRoiSize.height - tplRoiSize.height + 1 };&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;int stepBytesSrc = 0;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;Ipp8u* pSrc = ippiMalloc_8u_C1(srcRoiSize.width, srcRoiSize.height, &amp;amp;stepBytesSrc);&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;int stepBytesTpl = 0;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;Ipp8u* pTpl = ippiMalloc_8u_C1(tplRoiSize.width, tplRoiSize.height, &amp;amp;stepBytesTpl);&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;int stepBytesDst = 0;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;Ipp32f* pDst = ippiMalloc_32f_C1(dstRoiSize.width, dstRoiSize.height, &amp;amp;stepBytesDst);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;IppEnum funCfg = (IppEnum)(ippAlgAuto | ippiROIValid | ippiNorm);&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;Ipp8u *pBuffer;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;int bufSize;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;status = ippiCrossCorrNormGetBufferSize(srcRoiSize, tplRoiSize, funCfg, &amp;amp;bufSize);&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;if (status != ippStsNoErr) return;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;pBuffer = ippsMalloc_8u(bufSize);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;timer.start();&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;int loopSize = 10;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;for (int i = 0; i &amp;lt; loopSize; i++)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;status = 
ippiCrossCorrNorm_8u32f_C1R(pSrc, stepBytesSrc, srcRoiSize, pTpl, stepBytesTpl, tplRoiSize, pDst,&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;stepBytesDst, funCfg, pBuffer);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;timer.stop();&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::cout &amp;lt;&amp;lt; "\n::testTemplateMatch() ippiCrossCorrNorm_8u32f_C1R lasted " &amp;lt;&amp;lt;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; timer.elapsed_time()&amp;nbsp;&amp;lt;&amp;lt; "ms\n";&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;timer.start();&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;for (int i = 0; i &amp;lt; loopSize; i++)&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;status = ippiCrossCorrValid_NormLevel_8u32f_C1R(pSrc, stepBytesSrc, srcRoiSize,&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;pTpl, stepBytesTpl, tplRoiSize, pDst, stepBytesDst);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;{&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;timer.stop();&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::cout &amp;lt;&amp;lt; "\n::testTemplateMatch() ippiCrossCorrValid_NormLevel_8u32f_C1R lasted " &amp;lt;&amp;lt;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; timer.elapsed_time()&amp;nbsp;&amp;lt;&amp;lt; "ms\n";&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;timer.start();&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;ippiFree(pSrc);&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;ippiFree(pTpl);&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;ippiFree(pDst);&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;ippsFree(pBuffer);&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Herb&lt;/P&gt;</description>
      <pubDate>Wed, 20 Feb 2019 07:45:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155698#M26420</guid>
      <dc:creator>Herbert_K_</dc:creator>
      <dc:date>2019-02-20T07:45:54Z</dc:date>
    </item>
    <item>
      <title>Hi Herbert,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155699#M26421</link>
      <description>&lt;P&gt;Hi Herbert,&lt;/P&gt;&lt;P&gt;This functionality is optimized using the convolution theorem. Therefore, if one of the dimensions crosses the next power-of-2 boundary, the next-order FFT is used, which increases the time by about 2x.&lt;/P&gt;&lt;P&gt;Regarding the performance differences between IPP versions, please provide the library version output for both:&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; const IppLibraryVersion *lib;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; lib = ippiGetLibVersion();&lt;BR /&gt;&amp;nbsp; &amp;nbsp; printf( "CPU &amp;nbsp; &amp;nbsp; &amp;nbsp; : %s\n", lib-&amp;gt;targetCpu );&lt;BR /&gt;&amp;nbsp; &amp;nbsp; printf( "Name &amp;nbsp; &amp;nbsp; &amp;nbsp;: %s\n", lib-&amp;gt;Name );&lt;BR /&gt;&amp;nbsp; &amp;nbsp; printf( "Version &amp;nbsp; : %s\n", lib-&amp;gt;Version );&lt;BR /&gt;&amp;nbsp; &amp;nbsp; printf( "Build date: %s\n", lib-&amp;gt;BuildDate );&lt;/P&gt;&lt;P&gt;regards, Igor&lt;/P&gt;</description>
      <pubDate>Wed, 20 Feb 2019 09:23:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155699#M26421</guid>
      <dc:creator>Igor_A_Intel</dc:creator>
      <dc:date>2019-02-20T09:23:11Z</dc:date>
    </item>
    <item>
      <title>Hi Igor,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155700#M26422</link>
      <description>&lt;P&gt;Hi Igor,&lt;/P&gt;&lt;P&gt;Thanks for the fast reply. Normally I see a processing time that scales linearly with the source size and template size. Doubling the calculation time at a specific boundary is bad behavior. Maybe an internal clustering would give a better scaling result.&lt;/P&gt;&lt;P&gt;The Ipp9 library is:&lt;/P&gt;&lt;P&gt;targetCPU: I9&lt;/P&gt;&lt;P&gt;Name: ippIP AVX2 (I9 threaded) --&amp;gt; I use this with setNumThreads(1) and also not the mt version of the library&lt;/P&gt;&lt;P&gt;Version: 9.0 Legacy (r48491) (-)&lt;/P&gt;&lt;P&gt;BuildDate: Oct 13 2015&lt;/P&gt;&lt;P&gt;The Ipp18 library is:&lt;/P&gt;&lt;P&gt;targetCPU: I9&lt;/P&gt;&lt;P&gt;Name: ippIP AVX2 (I9) --&amp;gt; I use this with setNumThreads(1) and also not the mt version of the library&lt;/P&gt;&lt;P&gt;Version: 2018.0.3&amp;nbsp;(r58644)&lt;/P&gt;&lt;P&gt;BuildDate: Apr&amp;nbsp;7&amp;nbsp;2018&lt;/P&gt;&lt;P&gt;Regards, Herb&lt;/P&gt;</description>
      <pubDate>Wed, 20 Feb 2019 10:35:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155700#M26422</guid>
      <dc:creator>Herbert_K_</dc:creator>
      <dc:date>2019-02-20T10:35:24Z</dc:date>
    </item>
    <item>
      <title>ok, got it, thank you.</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155701#M26423</link>
      <description>&lt;P&gt;OK, got it, thank you.&lt;/P&gt;&lt;P&gt;Also, please tell me which operating system you are using: Windows or Linux?&lt;/P&gt;&lt;P&gt;regards, Igor&lt;/P&gt;</description>
      <pubDate>Wed, 20 Feb 2019 11:29:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155701#M26423</guid>
      <dc:creator>Igor_A_Intel</dc:creator>
      <dc:date>2019-02-20T11:29:35Z</dc:date>
    </item>
    <item>
      <title>the behavior happens on</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155702#M26424</link>
      <description>&lt;P&gt;The behavior happens on both Windows and Linux. We develop on Windows, and our target system is Linux.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Feb 2019 11:31:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155702#M26424</guid>
      <dc:creator>Herbert_K_</dc:creator>
      <dc:date>2019-02-20T11:31:28Z</dc:date>
    </item>
    <item>
      <title>Hi Herb,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155703#M26425</link>
      <description>&lt;P&gt;Hi Herb,&lt;/P&gt;&lt;P&gt;I don't see any issues with the same library versions, measured on my T470 laptop (the same l9 code version):&lt;/P&gt;&lt;P&gt;static linking:&lt;/P&gt;&lt;P&gt;::testTemplateMatch() ippiCrossCorrNorm_8u32f_C1R lasted 227858.505917 cpe&lt;/P&gt;&lt;P&gt;::testTemplateMatch() ippiCrossCorrValid_NormLevel_8u32f_C1R lasted 304642.631953 cpe&lt;BR /&gt;Press any key to continue . . .&lt;/P&gt;&lt;P&gt;dynamic linking:&lt;/P&gt;&lt;P&gt;::testTemplateMatch() ippiCrossCorrNorm_8u32f_C1R lasted 232007.209467 cpe&lt;/P&gt;&lt;P&gt;::testTemplateMatch() ippiCrossCorrValid_NormLevel_8u32f_C1R lasted 276802.594083 cpe&lt;BR /&gt;Press any key to continue . . .&lt;/P&gt;&lt;P&gt;threaded dynamic libs, numThreads = 1:&lt;/P&gt;&lt;P&gt;::testTemplateMatch() ippiCrossCorrNorm_8u32f_C1R lasted 51539.084615 cpe&lt;/P&gt;&lt;P&gt;::testTemplateMatch() ippiCrossCorrValid_NormLevel_8u32f_C1R lasted 58517.840237 cpe&lt;BR /&gt;Press any key to continue . . 
.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;I slightly modified your code; see below:&lt;/P&gt;&lt;P&gt;#include &amp;lt;stdio.h&amp;gt;&lt;BR /&gt;#include "ippdefs.h"&lt;BR /&gt;#include "ipp.h"&lt;BR /&gt;#include "ippdefs90legacy.h"&lt;BR /&gt;#include "ippi90legacy.h"&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;int main(void)&lt;BR /&gt;{&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; IppStatus status;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; IppiSize srcRoiSize = { 512,512 };&lt;BR /&gt;&amp;nbsp; &amp;nbsp; IppiSize tplRoiSize = { 500,500 };&lt;BR /&gt;&amp;nbsp; &amp;nbsp; IppiSize dstRoiSize = { srcRoiSize.width - tplRoiSize.width + 1, srcRoiSize.height - tplRoiSize.height + 1 };&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; int stepBytesSrc = 0;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Ipp8u* pSrc = ippiMalloc_8u_C1(srcRoiSize.width, srcRoiSize.height, &amp;amp;stepBytesSrc);&lt;BR /&gt;&amp;nbsp; &amp;nbsp; int stepBytesTpl = 0;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Ipp8u* pTpl = ippiMalloc_8u_C1(tplRoiSize.width, tplRoiSize.height, &amp;amp;stepBytesTpl);&lt;BR /&gt;&amp;nbsp; &amp;nbsp; int stepBytesDst = 0;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Ipp32f* pDst = ippiMalloc_32f_C1(dstRoiSize.width, dstRoiSize.height, &amp;amp;stepBytesDst);&lt;BR /&gt;&amp;nbsp; &amp;nbsp; // fill the source and the template with a Jaehne test pattern&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ippiImageJaehne_8u_C1R(pSrc, stepBytesSrc, srcRoiSize);&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ippiImageJaehne_8u_C1R(pTpl, stepBytesTpl, tplRoiSize);&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; IppEnum funCfg = (IppEnum)(ippAlgAuto | ippiROIValid | ippiNorm);&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Ipp8u *pBuffer;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; int bufSize;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; status = ippiCrossCorrNormGetBufferSize(srcRoiSize, tplRoiSize, funCfg, &amp;amp;bufSize);&lt;BR /&gt;&amp;nbsp; &amp;nbsp; if (status != ippStsNoErr) return 1;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; pBuffer = ippsMalloc_8u(bufSize);&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; Ipp64u strt, stp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; ippSetNumThreads(1);&lt;/P&gt;&lt;P&gt;&amp;nbsp; 
&amp;nbsp; int loopSize = 10;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; status = ippiCrossCorrNorm_8u32f_C1R(pSrc, stepBytesSrc, srcRoiSize, pTpl, stepBytesTpl, tplRoiSize, pDst,&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; stepBytesDst, funCfg, pBuffer);&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;int nT;&lt;BR /&gt;&amp;nbsp; &amp;nbsp;ippGetNumThreads(&amp;amp;nT);&lt;BR /&gt;&amp;nbsp; &amp;nbsp;//printf("/nNum threads = %d\n", nT);&lt;BR /&gt;&amp;nbsp; &amp;nbsp;strt = ippGetCpuClocks();&lt;BR /&gt;&amp;nbsp; &amp;nbsp; for (int i = 0; i &amp;lt; loopSize; i++)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; {&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; status = ippiCrossCorrNorm_8u32f_C1R(pSrc, stepBytesSrc, srcRoiSize, pTpl, stepBytesTpl, tplRoiSize, pDst,&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; stepBytesDst, funCfg, pBuffer);&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; }&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; stp = ippGetCpuClocks();&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Ipp64f tmp = (Ipp64f)(stp - strt);&lt;BR /&gt;&amp;nbsp; &amp;nbsp; tmp = tmp/loopSize;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; tmp = tmp/dstRoiSize.width;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; tmp = tmp/dstRoiSize.height;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; printf( "\n::testTemplateMatch() ippiCrossCorrNorm_8u32f_C1R lasted %f cpe\n", tmp);&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; strt = ippGetCpuClocks();&lt;BR /&gt;&amp;nbsp; &amp;nbsp; for (int i = 0; i &amp;lt; loopSize; i++)&lt;BR /&gt;&amp;nbsp; &amp;nbsp; {&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; status = ippiCrossCorrValid_NormLevel_8u32f_C1R(pSrc, stepBytesSrc, srcRoiSize,&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; pTpl, stepBytesTpl, tplRoiSize, pDst, stepBytesDst);&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; }&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; stp = ippGetCpuClocks();&lt;BR /&gt;&amp;nbsp; &amp;nbsp; tmp = (Ipp64f)(stp - strt);&lt;BR /&gt;&amp;nbsp; &amp;nbsp; tmp = 
tmp/loopSize;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; tmp = tmp/dstRoiSize.width;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; tmp = tmp/dstRoiSize.height;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; printf( "\n::testTemplateMatch() ippiCrossCorrValid_NormLevel_8u32f_C1R lasted %f cpe\n", tmp);&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; ippiFree(pSrc);&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ippiFree(pTpl);&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ippiFree(pDst);&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ippsFree(pBuffer);&lt;BR /&gt;}&lt;/P&gt;&lt;P&gt;regards, Igor&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 20 Feb 2019 15:23:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155703#M26425</guid>
      <dc:creator>Igor_A_Intel</dc:creator>
      <dc:date>2019-02-20T15:23:45Z</dc:date>
    </item>
    <item>
      <title>Hi Igor,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155704#M26426</link>
      <description>&lt;P&gt;Hi Igor,&lt;/P&gt;&lt;P&gt;Thank you for the fast testing. I used your code, also with static linking. The results are:&lt;/P&gt;&lt;P&gt;::testTemplateMatch() ippiCrossCorrNorm_8u32f_C1R lasted 224359.924852 cpe&lt;BR /&gt;::testTemplateMatch() ippiCrossCorrValid_NormLevel_8u32f_C1R lasted 287181.53017 cpe&lt;/P&gt;&lt;P&gt;So no performance issues here either. I have now also installed the threaded library for Ipp18 and get this result:&lt;/P&gt;&lt;P&gt;::testTemplateMatch() ippiCrossCorrNorm_8u32f_C1R lasted 59510.233136 cpe&lt;BR /&gt;::testTemplateMatch() ippiCrossCorrValid_NormLevel_8u32f_C1R lasted 50513.827219 cpe&lt;/P&gt;&lt;P&gt;Again no performance problem. But I don't understand where this is coming from. We both used&amp;nbsp;ippSetNumThreads(1), so where does this 4x performance boost come from? I also checked whether it runs on more than one CPU. It does not. I simply have no explanation for this behavior. Can you help?&lt;/P&gt;&lt;P&gt;Regards, Herb&lt;/P&gt;</description>
      <pubDate>Thu, 21 Feb 2019 09:26:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155704#M26426</guid>
      <dc:creator>Herbert_K_</dc:creator>
      <dc:date>2019-02-21T09:26:39Z</dc:date>
    </item>
    <item>
      <title>Hi Igor,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155705#M26427</link>
      <description>&lt;P&gt;Hi Igor,&lt;/P&gt;&lt;P&gt;Do you have an update on this topic? I looked into the list of threaded functions, and the CrossCorr function is not listed. I don't understand why the performance of this function improves depending on the library used. Also, the "threaded" library is no longer available under Linux.&lt;/P&gt;&lt;P&gt;Regards, Herb&lt;/P&gt;</description>
      <pubDate>Mon, 25 Feb 2019 06:01:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155705#M26427</guid>
      <dc:creator>Herbert_K_</dc:creator>
      <dc:date>2019-02-25T06:01:15Z</dc:date>
    </item>
    <item>
      <title>Hi Herb,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155706#M26428</link>
      <description>&lt;P&gt;Hi Herb,&lt;/P&gt;&lt;P&gt;At this moment I don't have a final update. I did some investigation and found several interesting things in the behavior of the threaded libs that I can't explain yet, so this is still in progress. Regarding your statement that the "threaded" libs are no longer available under Linux: which IPP version do you mean? All IPP releases have threaded libs for Linux as well as for Windows.&lt;/P&gt;&lt;P&gt;regards, Igor.&lt;/P&gt;</description>
      <pubDate>Mon, 25 Feb 2019 08:18:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155706#M26428</guid>
      <dc:creator>Igor_A_Intel</dc:creator>
      <dc:date>2019-02-25T08:18:56Z</dc:date>
    </item>
    <item>
      <title>We used the IPP2019 Update 2.</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155707#M26429</link>
      <description>&lt;P&gt;We used IPP 2019 Update 2. My colleagues said there was no RPM for the threaded version. On Windows the option was available.&lt;/P&gt;</description>
      <pubDate>Mon, 25 Feb 2019 08:34:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155707#M26429</guid>
      <dc:creator>Herbert_K_</dc:creator>
      <dc:date>2019-02-25T08:34:14Z</dc:date>
    </item>
    <item>
      <title>My colleagues found the</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155708#M26430</link>
      <description>&lt;P&gt;My colleagues found the threaded version. Sorry for the inconvenience.&lt;/P&gt;</description>
      <pubDate>Mon, 25 Feb 2019 10:00:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155708#M26430</guid>
      <dc:creator>Herbert_K_</dc:creator>
      <dc:date>2019-02-25T10:00:35Z</dc:date>
    </item>
    <item>
      <title>Hi Herb,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155709#M26431</link>
      <description>&lt;P style="margin-left:0cm; margin-right:0cm"&gt;Hi Herb,&lt;/P&gt;&lt;P style="margin-left:0cm; margin-right:0cm"&gt;The issue with the 4x performance gap between the sequential and threaded (launched with ippSetNumThreads(1)) versions of ippiCrossCorrNorm_8u32f_C1R has been resolved. The root cause was a difference in the algorithm’s parameters between the threaded and non-threaded modes.&lt;/P&gt;&lt;P style="margin-left:0cm; margin-right:0cm"&gt;The fix will be available in IPP 2020.&lt;/P&gt;&lt;P style="margin-left:0cm; margin-right:0cm"&gt;&amp;nbsp;&lt;/P&gt;&lt;P style="margin-left:0cm; margin-right:0cm"&gt;The fix for ippiCrossCorrValid_NormLevel_8u32f_C1R (legacy library) is still in progress.&lt;/P&gt;&lt;P style="margin-left:0cm; margin-right:0cm"&gt;BR, Artem.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Mar 2019 15:48:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155709#M26431</guid>
      <dc:creator>ArtemMaklaev</dc:creator>
      <dc:date>2019-03-27T15:48:09Z</dc:date>
    </item>
    <item>
      <title>We will update this thread as</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155710#M26432</link>
      <description>&lt;P&gt;We will update this thread as soon as the fix for the problem is available.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Mar 2019 03:14:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/CrossCorrNorm-performance-issues-Ipp17-and-Ipp9/m-p/1155710#M26432</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2019-03-29T03:14:24Z</dc:date>
    </item>
  </channel>
</rss>

