<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: problem with GetCpuClocks in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/problem-with-GetCpuClocks/m-p/909940#M14028</link>
    <description>&lt;DIV style="margin: 0px; height: auto;"&gt;&lt;/DIV&gt;
If you build with compiler optimization enabled, the compiler may optimize away all the code in your C loop, so the timed region contains only two successive calls to ippGetCpuClocks. Meanwhile, ippsCopy really does copy all the data from src to dst.&lt;BR /&gt;Try your sample with the "-Od" compiler option, i.e. with optimization disabled.&lt;BR /&gt;&lt;BR /&gt;P.S. This behaviour is usual for optimizing compilers. If they see that a variable is not used later in the code, they may not generate any code for that variable.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Sergey&lt;BR /&gt;</description>
    <pubDate>Fri, 16 Oct 2009 12:16:06 GMT</pubDate>
    <dc:creator>Sergey_K_Intel</dc:creator>
    <dc:date>2009-10-16T12:16:06Z</dc:date>
    <item>
      <title>problem with GetCpuClocks</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/problem-with-GetCpuClocks/m-p/909939#M14027</link>
      <description>#include &lt;ipp.h&gt;&lt;BR /&gt;#include &lt;stdio.h&gt;&lt;BR /&gt;&lt;BR /&gt;int main(void)&lt;BR /&gt;{&lt;BR /&gt;const int SIZE=256;&lt;BR /&gt;Ipp8u pSrc[SIZE],pDst[SIZE];&lt;BR /&gt;Ipp64u begin,end;&lt;BR /&gt;&lt;BR /&gt;int i;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;for(i=0;i&lt;SIZE;i++) pSrc[i]=(Ipp8u)i;&lt;BR /&gt;&lt;BR /&gt;begin=ippGetCpuClocks();&lt;BR /&gt;for(i=0;i&lt;SIZE;i++) pDst[i]=pSrc[i];&lt;BR /&gt;end=ippGetCpuClocks();&lt;BR /&gt;printf("time taken in c=%llu\n",(unsigned long long)(end-begin)); /* Ipp64u is 64-bit, so %ld is not a safe format */&lt;BR /&gt;&lt;BR /&gt;begin=ippGetCpuClocks();&lt;BR /&gt;ippsCopy_8u(pSrc,pDst,SIZE);&lt;BR /&gt;end=ippGetCpuClocks();&lt;BR /&gt;printf("time taken in ipp=%llu\n",(unsigned long long)(end-begin));&lt;BR /&gt;&lt;BR /&gt;return 0;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;I am surprised to see that the time taken by IPP is 6 times larger than the time taken by the C loop. Is there anything wrong with the code?&lt;BR /&gt;</description>
      <pubDate>Fri, 16 Oct 2009 10:14:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/problem-with-GetCpuClocks/m-p/909939#M14027</guid>
      <dc:creator>coolsandyforyou</dc:creator>
      <dc:date>2009-10-16T10:14:19Z</dc:date>
    </item>
    <item>
      <title>Re: problem with GetCpuClocks</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/problem-with-GetCpuClocks/m-p/909940#M14028</link>
      <description>&lt;DIV style="margin: 0px; height: auto;"&gt;&lt;/DIV&gt;
If you build with compiler optimization enabled, the compiler may optimize away all the code in your C loop, so the timed region contains only two successive calls to ippGetCpuClocks. Meanwhile, ippsCopy really does copy all the data from src to dst.&lt;BR /&gt;Try your sample with the "-Od" compiler option, i.e. with optimization disabled.&lt;BR /&gt;&lt;BR /&gt;P.S. This behaviour is usual for optimizing compilers. If they see that a variable is not used later in the code, they may not generate any code for that variable.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Sergey&lt;BR /&gt;</description>
      <pubDate>Fri, 16 Oct 2009 12:16:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/problem-with-GetCpuClocks/m-p/909940#M14028</guid>
      <dc:creator>Sergey_K_Intel</dc:creator>
      <dc:date>2009-10-16T12:16:06Z</dc:date>
    </item>
    <item>
      <title>Re: problem with GetCpuClocks</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/problem-with-GetCpuClocks/m-p/909941#M14029</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/438848"&gt;Sergey Khlystov (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; If you use optimized compiler mode, the compiler may optimize away all the code in your C-loop. So, it will contain two successive calls to ippGetCpuClocks only. Meanwhile, ippsCopy honestly copies all stuff between src and dst.&lt;BR /&gt;Try your sample with "-Od" compiler option, i.e. without optimization.&lt;BR /&gt;&lt;BR /&gt;P.S. this behaviour is usual for optimizing compilers. If they see that some variable is not used down the code, compiler doesn't even process that variable.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Sergey&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
I disabled optimization and still got the same result, but when I tried other functions such as SAD() it worked fine...</description>
      <pubDate>Mon, 19 Oct 2009 14:19:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/problem-with-GetCpuClocks/m-p/909941#M14029</guid>
      <dc:creator>coolsandyforyou</dc:creator>
      <dc:date>2009-10-19T14:19:10Z</dc:date>
    </item>
    <item>
      <title>Re: problem with GetCpuClocks</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/problem-with-GetCpuClocks/m-p/909942#M14030</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/439746"&gt;coolsandyforyou&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; i kept no optimization still got the same thing,but when itried for other functions like SAD() it worked fine...&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Hi,&lt;BR /&gt;Indeed, I see the same performance numbers you describe. The problem appears to be a "cold" instruction cache. Our engineers say that when IPP functions are called &lt;STRONG&gt;rarely &lt;/STRONG&gt;and on &lt;STRONG&gt;short data&lt;/STRONG&gt;, plain C/C++ loops can be faster than the IPP function calls. But try the following modification of your test (&lt;STRONG&gt;bold lines were added&lt;/STRONG&gt;) and you'll see different performance data:&lt;BR /&gt;&lt;SPAN style="font-size: small;"&gt;&lt;BR /&gt;#include &lt;ipp.h&gt;&lt;BR /&gt;#include &lt;stdio.h&gt;&lt;BR /&gt;int main(void)&lt;BR /&gt;{&lt;BR /&gt; const int SIZE=256;&lt;BR /&gt; Ipp8u pSrc[SIZE],pDst[SIZE];&lt;BR /&gt; Ipp64u begin,end;&lt;BR /&gt; &lt;STRONG&gt;Ipp8u pSrc1[SIZE], pDst1[SIZE];  // dummy arrays, used only for the warm-up call&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt; int i;&lt;BR /&gt;&lt;BR /&gt; for(i=0;i&lt;SIZE;i++) pSrc[i]=(Ipp8u)i;&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt; ippsCopy_8u(pSrc1,pDst1,SIZE); // instruction cache warming&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt; begin=ippGetCpuClocks();&lt;BR /&gt; for(i=0;i&lt;SIZE;i++) pDst[i]=pSrc[i];&lt;BR /&gt; end=ippGetCpuClocks();&lt;BR /&gt; printf("time taken in c=%llu\n",(unsigned long long)(end-begin)); // Ipp64u is 64-bit, so %ld is not a safe format&lt;BR /&gt;&lt;BR /&gt; begin=ippGetCpuClocks();&lt;BR /&gt; ippsCopy_8u(pSrc,pDst,SIZE);&lt;BR /&gt; end=ippGetCpuClocks();&lt;BR /&gt; printf("time taken in ipp=%llu\n",(unsigned long long)(end-begin));&lt;BR /&gt; return 0;&lt;BR /&gt;}&lt;BR /&gt;&lt;/SPAN&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Sergey&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 21 Oct 2009 06:37:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/problem-with-GetCpuClocks/m-p/909942#M14030</guid>
      <dc:creator>Sergey_K_Intel</dc:creator>
      <dc:date>2009-10-21T06:37:02Z</dc:date>
    </item>
  </channel>
</rss>

