<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re:Strange Behavior of ippi Integral in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1352523#M27900</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for all the information that you have provided.&lt;/P&gt;&lt;P&gt;We are working on your issue and will get back to you soon.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Vidya.&lt;/P&gt;&lt;BR /&gt;</description>
    <pubDate>Wed, 19 Jan 2022 05:44:31 GMT</pubDate>
    <dc:creator>VidyalathaB_Intel</dc:creator>
    <dc:date>2022-01-19T05:44:31Z</dc:date>
    <item>
      <title>Strange Behavior of ippi Integral</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1349306#M27888</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;I was trying ippiIntegral_8u32s_C1R and it works fine. However, &lt;SPAN&gt;when I set the integral array to zero before calling the function, which should only add time, the integral function itself becomes much faster. If I zero the array with a TBB parallel_for, the total time of zeroing plus the integral is about half the time the integral takes without zeroing first. Since my work is performance-focused, I would like to understand what is happening. Does ippiIntegral set the output array to zero first, or is there another reason? (Edit: the IPP version is 2021.5.0.)&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 07 Jan 2022 06:57:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1349306#M27888</guid>
      <dc:creator>Emanon</dc:creator>
      <dc:date>2022-01-07T06:57:58Z</dc:date>
    </item>
    <item>
      <title>Re:Strange Behavior of ippi Integral</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1349952#M27891</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for reaching out to us.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt;&lt;I&gt;I tried to set the Integral array to zero before doing Integral......If I set the array to zero with tbb, then the cost time of parallel_for set array to zero plus integral will be about half the time&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Could you please share your OS details, a minimal reproducer (and steps to reproduce, if any) for the scenarios you mentioned, and the timings you are getting, so that we can work on it from our end?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Vidya.&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 10 Jan 2022 09:03:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1349952#M27891</guid>
      <dc:creator>VidyalathaB_Intel</dc:creator>
      <dc:date>2022-01-10T09:03:26Z</dc:date>
    </item>
    <item>
      <title>Reply: Re:Strange Behavior of ippi Integral</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1350262#M27894</link>
      <description>&lt;P&gt;My OS is Windows 10 Pro, the CPU is an Intel i7-8750H, and the machine has 16 GB of RAM.&lt;/P&gt;
&lt;P&gt;The image size is 16384 x 24000.&lt;/P&gt;
&lt;P&gt;The run takes about 520000 microseconds without the memset; the memset plus the integral takes about 260000 microseconds.&lt;/P&gt;
&lt;P&gt;Thanks.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="cpp"&gt;INT32 Insp_Run(BYTE* SrcBuf,int SrcW,int SrcH,int RectX,int RectY,int RectW,int RectH)
{
    int dRes=0;
    auto stamp01 = std::chrono::high_resolution_clock::now();
    Ipp32s* ImageIntegrals = new Ipp32s[(SrcW+1)*(SrcH+1)];
    IppiSize roiSize = {RectW,RectH};
    int srcStep = SrcW;
    int dstStep = SrcW+1;
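    // Zero the output buffer row by row with TBB before calling the integral.
    // With this block enabled, ippiIntegral itself runs faster.
    {
        _ImageIntegral_0 ImageIntegral_0;  // user-defined functor that memsets each row to zero
        ImageIntegral_0.ImageIntegrals = ImageIntegrals;
        parallel_for(tbb::blocked_range&amp;lt;INT32&amp;gt;(0,SrcH+1),ImageIntegral_0);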
    }
    dRes = ippiIntegral_8u32s_C1R(SrcBuf+RectY*SrcW+RectX,srcStep,ImageIntegrals+RectY*(SrcW+1)+RectX,dstStep*sizeof(Ipp32s),roiSize,0);
    auto stamp02 = std::chrono::high_resolution_clock::now();
    
    delete []ImageIntegrals;
    auto duration = std::chrono::duration_cast&amp;lt;std::chrono::microseconds&amp;gt;(stamp02-stamp01);
    std::string strdur = std::to_string(duration.count());
    MessageBoxA(NULL,strdur.c_str(),"",MB_OK);
}&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jan 2022 01:51:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1350262#M27894</guid>
      <dc:creator>Emanon</dc:creator>
      <dc:date>2022-01-11T01:51:08Z</dc:date>
    </item>
    <item>
      <title>Re: Strange Behavior of ippi Integral</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1350668#M27897</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for getting back to us.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It would be a great help if you could share the complete sample reproducer, so that we can get more insight into this issue.&lt;/P&gt;
&lt;P&gt;So, could you please share with us the complete working sample reproducer along with the command you have used to compile the code?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Vidya.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jan 2022 05:28:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1350668#M27897</guid>
      <dc:creator>VidyalathaB_Intel</dc:creator>
      <dc:date>2022-01-12T05:28:21Z</dc:date>
    </item>
    <item>
      <title>Re: Strange Behavior of ippi Integral</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1350680#M27898</link>
      <description>&lt;P&gt;I think the code sample I posted should work. Sorry, I do not understand what you mean by a complete sample reproducer.&lt;/P&gt;
&lt;P&gt;If you need the version without the memset, just &lt;SPAN&gt;comment out the lines from _ImageIntegral_0 through the tbb parallel_for.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;That block only runs a memset on every row in parallel, so I do not think it matters much. I also tried a plain memset of the whole array without parallel_for: the run time of ippiIntegral_8u32s_C1R is still reduced, and only the memset itself takes longer.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Edit: sorry, I forgot the return statement. Add "return dRes;" as the last line and it should work.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jan 2022 01:14:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1350680#M27898</guid>
      <dc:creator>Emanon</dc:creator>
      <dc:date>2022-01-13T01:14:00Z</dc:date>
    </item>
    <item>
      <title>Re:Strange Behavior of ippi Integral</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1352523#M27900</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for all the information that you have provided.&lt;/P&gt;&lt;P&gt;We are working on your issue and will get back to you soon.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Vidya.&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 19 Jan 2022 05:44:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1352523#M27900</guid>
      <dc:creator>VidyalathaB_Intel</dc:creator>
      <dc:date>2022-01-19T05:44:31Z</dc:date>
    </item>
    <item>
      <title>Re: Strange Behavior of ippi Integral</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1355160#M27906</link>
      <description>&lt;P&gt;Hi Emanon.&lt;/P&gt;
&lt;P&gt;Could you please run the TBB zeroing in a single thread and share the performance results?&lt;/P&gt;
&lt;P&gt;Andrey B.&lt;/P&gt;
&lt;P&gt;IPP&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 14:13:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1355160#M27906</guid>
      <dc:creator>Andrey_B_Intel</dc:creator>
      <dc:date>2022-01-27T14:13:52Z</dc:date>
    </item>
    <item>
      <title>Reply: Strange Behavior of ippi Integral</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1355325#M27908</link>
      <description>&lt;P&gt;Hi Andrey.&lt;/P&gt;
&lt;P&gt;I tried zeroing in a single thread, and the run time of ippiIntegral is almost the same as when zeroing with tbb:&lt;/P&gt;
&lt;P&gt;about 150000 &lt;SPAN&gt;microseconds (ippiIntegral only, not including the zeroing).&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jan 2022 01:09:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1355325#M27908</guid>
      <dc:creator>Emanon</dc:creator>
      <dc:date>2022-01-28T01:09:20Z</dc:date>
    </item>
    <item>
      <title>Re: Strange Behavior of ippi Integral</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1357094#M27913</link>
      <description>&lt;P&gt;Hi Emanon.&lt;/P&gt;
&lt;P&gt;To estimate the performance of IPP functions, the following template is recommended:&lt;/P&gt;
&lt;LI-CODE lang="cpp"&gt;ipp_func();            // warm-up call, not timed
t0 = get_timer();
for (n = 0; n &amp;lt; N; n++)
  ipp_func();          // timed calls
t1 = get_timer();
func_time = (t1 - t0) / N;  // average time per call&lt;/LI-CODE&gt;
&lt;P&gt;The first call of the IPP function is skipped because many one-time events happen during it: physical memory allocation, loading data from memory into cache, branch predictor statistics updates, frequency scaling, and so on. In subsequent calls the CPU and data are in a "ready" state, so the performance differs.&lt;/P&gt;
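<![CDATA[<P>A minimal C++ sketch of the template above, offered as an illustration rather than as the official IPP benchmarking method. The names stand_in_func and average_call_time_us are hypothetical and not part of IPP; the stand-in loop only takes the place of a real call such as ippiIntegral_8u32s_C1R so that the example builds without the library.</P>

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the IPP call being measured: a running sum
// written into dst, so that every output element is touched.
inline void stand_in_func(const std::vector<std::uint8_t>& src,
                          std::vector<std::int32_t>& dst) {
    std::int32_t sum = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
        sum += src[i];
        dst[i] = sum;
    }
}

// Runs func once untimed (warm-up), then returns the average time of
// N timed calls in microseconds.
template <typename Func>
double average_call_time_us(Func func, int N) {
    assert(N > 0);
    func();  // warm-up call: absorbs page faults, cache fill, frequency ramp-up
    const auto t0 = std::chrono::steady_clock::now();
    for (int n = 0; n < N; ++n) {
        func();
    }
    const auto t1 = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> total = t1 - t0;
    return total.count() / N;
}
```

<P>To time a real IPP call, pass a lambda that invokes it in place of the stand-in, for example average_call_time_us([&] { stand_in_func(src, dst); }, 10).</P>]]>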
&lt;P&gt;Could you please rework your reproducer to follow this approach?&lt;/P&gt;
&lt;P&gt;Thanks.&lt;/P&gt;
&lt;P&gt;Andrey B&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Feb 2022 15:40:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1357094#M27913</guid>
      <dc:creator>Andrey_B_Intel</dc:creator>
      <dc:date>2022-02-03T15:40:11Z</dc:date>
    </item>
    <item>
      <title>Reply: Strange Behavior of ippi Integral</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1357916#M27914</link>
      <description>&lt;P&gt;Hi Andrey.&lt;/P&gt;
&lt;P&gt;Does ippInit() count as an IPP function?&lt;/P&gt;
&lt;P&gt;If so, then with ippInit() called first, the run time of ippiIntegral without zeroing the array is 520815.2 microseconds,&lt;/P&gt;
&lt;P&gt;and the run time of ippiIntegral with the array zeroed first is 146848.9 microseconds. (The image size is 16384 x 24000.)&lt;/P&gt;</description>
      <pubDate>Mon, 07 Feb 2022 01:40:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1357916#M27914</guid>
      <dc:creator>Emanon</dc:creator>
      <dc:date>2022-02-07T01:40:17Z</dc:date>
    </item>
    <item>
      <title>Re: Strange Behavior of ippi Integral</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1371339#M27954</link>
      <description>&lt;P&gt;Hi Emanon.&lt;/P&gt;
&lt;P&gt;I am attaching a small benchmark that shows how to measure the performance of ippiIntegral (and other functions).&lt;/P&gt;
&lt;P&gt;My system is a Xeon Silver 4116. "cpe" means clocks per element; lower is better.&lt;/P&gt;
&lt;P&gt;1024 x 1024 start_val=0, num_loops=1, cpe= 0.8743&lt;BR /&gt;1024 x 1024 start_val=1, num_loops=1, cpe= 0.8735&lt;BR /&gt;1024 x 1024 start_val=0, num_loops=10, cpe= 0.6566&lt;BR /&gt;1024 x 1024 start_val=1, num_loops=10, cpe= 0.6469&lt;BR /&gt;1024 x 1024 start_val=0, num_loops=1000, cpe= 0.6024&lt;BR /&gt;1024 x 1024 start_val=1, num_loops=1000, cpe= 0.6179&lt;/P&gt;
&lt;P&gt;The cpe is about 0.61 for 1000 runs and about 0.87 for a single run; we usually go by the performance over multiple runs.&lt;/P&gt;
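<![CDATA[<P>As a rough illustration of the "cpe" figure, the sketch below converts a measured time into clocks per element. The helper name clocks_per_element and the fixed cpu_hz parameter are assumptions made for this example; the attached benchmark may count clocks differently, for instance by reading a cycle counter directly.</P>

```cpp
#include <cassert>
#include <cstdint>

// Clocks per element: total CPU clocks divided by the number of elements
// processed. elapsed_seconds covers all num_loops calls; cpu_hz is an
// assumed constant clock frequency (real CPUs scale frequency dynamically).
inline double clocks_per_element(double elapsed_seconds, double cpu_hz,
                                 std::int64_t width, std::int64_t height,
                                 int num_loops) {
    assert(width > 0 && height > 0 && num_loops > 0);
    const double clocks = elapsed_seconds * cpu_hz;
    const double elements =
        static_cast<double>(width) * static_cast<double>(height) * num_loops;
    return clocks / elements;
}
```

<P>For example, 1 second at an assumed 2 GHz for two loops over a 1000 x 1000 image gives 2e9 / 2e6 = 1000 clocks per element.</P>]]>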
&lt;P&gt;Could you please run this code on your system and share the results?&lt;/P&gt;
&lt;P&gt;Thanks.&lt;/P&gt;
&lt;P&gt;Andrey&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 23 Mar 2022 13:05:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Strange-Behavior-of-ippi-Integral/m-p/1371339#M27954</guid>
      <dc:creator>Andrey_B_Intel</dc:creator>
      <dc:date>2022-03-23T13:05:12Z</dc:date>
    </item>
  </channel>
</rss>

