<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Debug build is 2x faster in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906275#M13490</link>
    <description>&lt;P&gt;Hello, &lt;/P&gt;
&lt;P&gt;How do you link the IPP libraries, dynamically or statically? If you use static linking, did you call the ippStaticInit function?&lt;/P&gt;
&lt;P&gt;Regards,&lt;BR /&gt; Vladimir&lt;/P&gt;</description>
    <pubDate>Thu, 12 Oct 2006 21:53:24 GMT</pubDate>
    <dc:creator>Vladimir_Dudnik</dc:creator>
    <dc:date>2006-10-12T21:53:24Z</dc:date>
    <item>
      <title>Debug build is 2x faster</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906274#M13489</link>
      <description>I have an application using IPP 5.1 to do QAM demodulation. Somehow the debug build is over 2x faster than the release build, even after I disabled optimizations in the release build. This happens on both a dual Xeon (2.8 GHz) and a P4 3.06 GHz. I wonder what makes the difference.&lt;BR /&gt;The processing is basically: BandPass - Mix - LowPass - Magnitude - downsampling.&lt;BR /&gt;I am using Visual Studio 2005.&lt;BR /&gt;</description>
      <pubDate>Thu, 12 Oct 2006 02:19:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906274#M13489</guid>
      <dc:creator>xilin</dc:creator>
      <dc:date>2006-10-12T02:19:26Z</dc:date>
    </item>
    <item>
      <title>Re: Debug build is 2x faster</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906275#M13490</link>
      <description>&lt;P&gt;Hello, &lt;/P&gt;
&lt;P&gt;How do you link the IPP libraries, dynamically or statically? If you use static linking, did you call the ippStaticInit function?&lt;/P&gt;
&lt;P&gt;Regards,&lt;BR /&gt; Vladimir&lt;/P&gt;</description>
      <pubDate>Thu, 12 Oct 2006 21:53:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906275#M13490</guid>
      <dc:creator>Vladimir_Dudnik</dc:creator>
      <dc:date>2006-10-12T21:53:24Z</dc:date>
    </item>
    <item>
      <title>Re: Debug build is 2x faster</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906276#M13491</link>
      <description>I am using dynamic linking. I will try static linking to see what happens. Thanks.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 13 Oct 2006 01:38:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906276#M13491</guid>
      <dc:creator>linx</dc:creator>
      <dc:date>2006-10-13T01:38:27Z</dc:date>
    </item>
    <item>
      <title>Re: Debug build is 2x faster</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906277#M13492</link>
      <description>That's strange. Could you share a piece of the code? What is your target platform/processor?</description>
      <pubDate>Fri, 13 Oct 2006 01:50:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906277#M13492</guid>
      <dc:creator>Vladimir_Dudnik</dc:creator>
      <dc:date>2006-10-13T01:50:14Z</dc:date>
    </item>
    <item>
      <title>Re: Debug build is 2x faster</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906278#M13493</link>
      <description>Sure, here is a section of the code. It is part of an ultrasound imaging pipeline: frame data are broken into vectors, converted from Ipp16s to Ipp32f, and then fed into this function. After this piece of code, we send the data to DirectX functions for display. The strange thing is that I had the same code with IPP 4.1/Visual Studio 2003 and didn't notice the problem. I just tried static linking, and the debug build is still faster, though only by ~10%.&lt;BR /&gt;Currently we are doing everything in an FPGA; I am just looking at whether it is possible to move this to software.&lt;BR /&gt;&lt;BR /&gt;//#define SAMPLE_VECTOR 512&lt;BR /&gt;//m_nSamplesIn = 4096; nBp = nLp = 65;&lt;BR /&gt;// BPF, LPF are the filter coefficients (Ipp32f)&lt;BR /&gt;&lt;BR /&gt;void CEnv::ProcessVector(Ipp32f* pSrc, Ipp32f* pDst)&lt;BR /&gt;{&lt;BR /&gt; // BPF(pSrc) -&amp;gt; m_pTemp&lt;BR /&gt; ippsConv_32f(pSrc, m_nSamplesIn, BPF, nBp, m_pTemp);&lt;BR /&gt;&lt;BR /&gt; // Mix with sin/cos -&amp;gt; m_pI, m_pQ&lt;BR /&gt; ippsMul_32f(m_sin, m_pTemp + nBp - 1, m_pI, m_nSamplesIn);&lt;BR /&gt; ippsMul_32f(m_cos, m_pTemp + nBp - 1, m_pQ, m_nSamplesIn);&lt;BR /&gt;&lt;BR /&gt; // LPF(m_pI) -&amp;gt; m_pTemp, LPF(m_pQ) -&amp;gt; m_pI&lt;BR /&gt; ippsConv_32f(m_pI, m_nSamplesIn, &amp;amp;LPF[0], nLp, m_pTemp);&lt;BR /&gt; ippsConv_32f(m_pQ, m_nSamplesIn, &amp;amp;LPF[0], nLp, m_pI);&lt;BR /&gt;&lt;BR /&gt; // Decimate&lt;BR /&gt; int down = m_nSamplesIn / SAMPLE_VECTOR, phase = 0;&lt;BR /&gt; int len;&lt;BR /&gt;&lt;BR /&gt; ippsSampleDown_32f(m_pTemp + nLp - 1, down * SAMPLE_VECTOR, m_pQ, &amp;amp;len, down, &amp;amp;phase);&lt;BR /&gt; ippsSampleDown_32f(m_pI + nLp - 1, down * SAMPLE_VECTOR, m_pTemp, &amp;amp;len, down, &amp;amp;phase);&lt;BR /&gt;&lt;BR /&gt; // Envelope&lt;BR /&gt; ippsMagnitude_32f(m_pQ, m_pTemp, pDst, SAMPLE_VECTOR);&lt;BR /&gt;}&lt;BR /&gt;</description>
      <pubDate>Fri, 13 Oct 2006 02:14:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906278#M13493</guid>
      <dc:creator>linx</dc:creator>
      <dc:date>2006-10-13T02:14:41Z</dc:date>
    </item>
    <item>
      <title>Re: Debug build is 2x faster</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906279#M13494</link>
      <description>Are your memory buffers (pSrc and pDst) aligned on a 16-byte boundary (or better, 32 bytes)? Intel processors access data much more efficiently at aligned addresses. I don't see any other reason for this strange behaviour. To make sure the vectors are correctly aligned, I recommend allocating them with the ippMalloc function (the ippsMalloc_xx family of functions) and freeing them with the ippFree function.</description>
      <pubDate>Fri, 13 Oct 2006 02:28:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906279#M13494</guid>
      <dc:creator>Vladimir_Dudnik</dc:creator>
      <dc:date>2006-10-13T02:28:49Z</dc:date>
    </item>
    <item>
      <title>Re: Debug build is 2x faster</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906280#M13495</link>
      <description>&lt;P&gt;An additional suggestion is to parallelize your processing. It seems that rows in your case are processed independently, so two rows can be processed in parallel on dual-core systems. Do you take advantage of that?&lt;/P&gt;</description>
      <pubDate>Fri, 13 Oct 2006 02:49:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906280#M13495</guid>
      <dc:creator>Vladimir_Dudnik</dc:creator>
      <dc:date>2006-10-13T02:49:25Z</dc:date>
    </item>
    <item>
      <title>Re: Debug build is 2x faster</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906281#M13496</link>
      <description>Buffers are aligned to a page boundary (4096 bytes). My system is single-core, though I do have two threads, each processing half of a frame.&lt;BR /&gt;</description>
      <pubDate>Fri, 13 Oct 2006 04:38:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906281#M13496</guid>
      <dc:creator>linx</dc:creator>
      <dc:date>2006-10-13T04:38:12Z</dc:date>
    </item>
    <item>
      <title>Re: Debug build is 2x faster</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906282#M13497</link>
      <description>Thanks. BTW, are your results the same between the debug and release builds, and are they correct? Could you also wrap each function call with timers to see where you spend more time than expected?</description>
      <pubDate>Fri, 13 Oct 2006 05:00:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906282#M13497</guid>
      <dc:creator>Vladimir_Dudnik</dc:creator>
      <dc:date>2006-10-13T05:00:00Z</dc:date>
    </item>
    <item>
      <title>Re: Debug build is 2x faster</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906283#M13498</link>
      <description>Visually both look correct and similar to the images produced by the FPGA or MATLAB, but I haven't compared every bit. I will do some profiling. Thanks.&lt;BR /&gt;</description>
      <pubDate>Sat, 14 Oct 2006 01:08:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Debug-build-is-2x-faster/m-p/906283#M13498</guid>
      <dc:creator>linx</dc:creator>
      <dc:date>2006-10-14T01:08:42Z</dc:date>
    </item>
  </channel>
</rss>

