<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Mikhail, in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945052#M18050</link>
    <description>&lt;P&gt;Mikhail,&lt;/P&gt;
&lt;P&gt;I am currently investigating this. The performance profile data looks quite strange (I'm not sure the table will render well):&lt;/P&gt;
&lt;P&gt;Top Hotspots&lt;/P&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Function&lt;/TH&gt;&lt;TH&gt;CPU Time&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;DMIP::Trace::Space&lt;/TD&gt;&lt;TD&gt;33.166s&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;own_ipps_sExp_G9LAynn&lt;/TD&gt;&lt;TD&gt;5.493s&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;g9_innerRGBToGray_8u_C3C1R&lt;/TD&gt;&lt;TD&gt;3.551s&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;[dmip-1.5.dll]&lt;/TD&gt;&lt;TD&gt;2.947s&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;ippGetCpuFreqMhz&lt;/TD&gt;&lt;TD&gt;2.325s&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;[Others]&lt;/TD&gt;&lt;TD&gt;11.318s&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;
&lt;P&gt;I will check what's going wrong.&lt;/P&gt;
&lt;P&gt;Regards,&lt;BR /&gt;Sergey&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 16 May 2013 15:27:51 GMT</pubDate>
    <dc:creator>Sergey_K_Intel</dc:creator>
    <dc:date>2013-05-16T15:27:51Z</dc:date>
    <item>
      <title>Why is IPP DMIP slower than Concurrency::parallel_for?</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945045#M18043</link>
      <description>&lt;P&gt;I downloaded the Intel IPP DMIP sample package, ipp-samples.7.1.1.013, and built the application\dmip_bench\ utility against IPP v7.1.1. It showed a significant performance boost for the DMIP flavor over the IPP flavor.&lt;/P&gt;
&lt;P&gt;I then refactored the ModifyBrightness::DoIPP method to simply process the image by rows, and parallelized this processing with Concurrency::parallel_for. Then I rebuilt the solution with each of the _IPP_SEQUENTIAL_STATIC and _IPP_PARALLEL_DYNAMIC macros. The results were unexpected.&lt;/P&gt;
&lt;P&gt;With _IPP_SEQUENTIAL_STATIC:&lt;/P&gt;
&lt;P&gt;DMIP 1.5 Jul 12 2012&lt;BR /&gt;ippIP SSSE3 (v8) 7.1.1 (r37466) Sep 24 2012&lt;BR /&gt;ippCV SSSE3 (v8) 7.1.1 (r37466) Sep 24 2012&lt;BR /&gt;ippCC SSSE3 (v8) 7.1.1 (r37466) Sep 25 2012&lt;BR /&gt;Number of threads: 2&lt;BR /&gt;DMIP Modify Brightness example time 3.16375 msec slice 34&lt;BR /&gt;IPP Modify Brightness example time 1.85974 msec slice 467&lt;BR /&gt;Close the session&lt;/P&gt;
&lt;P&gt;With _IPP_PARALLEL_DYNAMIC:&lt;/P&gt;
&lt;P&gt;DMIP 1.5 Jul 12 2012&lt;BR /&gt;ippIP SSSE3 (v8) 7.1.1 (r37466) Sep 27 2012&lt;BR /&gt;ippCV SSSE3 (v8) 7.1.1 (r37466) Sep 27 2012&lt;BR /&gt;ippCC SSSE3 (v8) 7.1.1 (r37466) Sep 28 2012&lt;BR /&gt;Number of threads: 2&lt;BR /&gt;DMIP Modify Brightness example time 2.34378 msec slice 34&lt;BR /&gt;IPP Modify Brightness example time 6.75662 msec slice 467&lt;BR /&gt;Close the session&lt;/P&gt;
&lt;P&gt;As you can see, the manually parallelized version works better than DMIP. Why?&lt;/P&gt;
&lt;P&gt;I compiled with Visual Studio 2010 under Windows 7 x64, using the x86 solution configuration. I have an Intel E6550 processor. I used an RGB 1200x467 image.&lt;/P&gt;
&lt;P&gt;I attached the modified sample, with compiled executables and output logs.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Apr 2013 11:42:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945045#M18043</guid>
      <dc:creator>Mikhail_Matrosov</dc:creator>
      <dc:date>2013-04-30T11:42:23Z</dc:date>
    </item>
    <item>
      <title>Still waiting for response.</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945046#M18044</link>
      <description>&lt;P&gt;Still waiting for response.&lt;/P&gt;</description>
      <pubDate>Thu, 16 May 2013 06:29:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945046#M18044</guid>
      <dc:creator>Mikhail_Matrosov</dc:creator>
      <dc:date>2013-05-16T06:29:03Z</dc:date>
    </item>
    <item>
      <title>Hi Mikhail,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945047#M18045</link>
      <description>&lt;P&gt;Hi Mikhail,&lt;/P&gt;
&lt;P&gt;Sorry for the late response. Could you go back to the unmodified version of dmip_bench and compare the sequential_static vs. parallel_dynamic results?&lt;/P&gt;
&lt;P&gt;I suspect that linking a DMIP-based application against the threaded libraries only harms overall performance because of thread oversubscription. Alternatively, if you do link against the threaded libraries, call ippSetNumThreads(1) somewhere in main(). DMIP itself already uses all available CPU cores, and if the application splits execution further by adding new threads, nothing good will come of it.&lt;/P&gt;
&lt;P&gt;Regards,&lt;BR /&gt;Sergey&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 16 May 2013 06:57:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945047#M18045</guid>
      <dc:creator>Sergey_K_Intel</dc:creator>
      <dc:date>2013-05-16T06:57:20Z</dc:date>
    </item>
    <item>
      <title>Sergey,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945048#M18046</link>
      <description>&lt;P&gt;Sergey,&lt;/P&gt;
&lt;P&gt;I believe there is no need for such a test. What is important is that my own naive parallelization on top of the single-threaded libraries works faster than DMIP linked against either the single- or multi-threaded libraries. Could you run the provided code on your machine and check it?&lt;/P&gt;</description>
      <pubDate>Thu, 16 May 2013 09:28:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945048#M18046</guid>
      <dc:creator>Mikhail_Matrosov</dc:creator>
      <dc:date>2013-05-16T09:28:12Z</dc:date>
    </item>
    <item>
      <title>Hi Mikhail,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945049#M18047</link>
      <description>Hi Mikhail,

&amp;gt;&amp;gt;...What important is that my own naive parallelization for single-threaded libraries works faster than DMIP linked
&amp;gt;&amp;gt;against both single- and multi-threaded libraries...

This is possibly because your code has less overhead or partitions the data set in a better way ( as you know, cache-related issues can significantly affect performance ).</description>
      <pubDate>Thu, 16 May 2013 13:28:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945049#M18047</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-16T13:28:01Z</dc:date>
    </item>
    <item>
      <title>Sergey,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945050#M18048</link>
      <description>&lt;P&gt;Sergey,&lt;/P&gt;
&lt;P&gt;That's the point: I expected DMIP to partition the data in the most effective way. It is said to know the cache sizes and all the other hardware details.&lt;/P&gt;
&lt;P&gt;My point is, I am afraid to use DMIP in my projects after these results, and I was hoping that I was doing something wrong.&lt;/P&gt;</description>
      <pubDate>Thu, 16 May 2013 13:52:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945050#M18048</guid>
      <dc:creator>Mikhail_Matrosov</dc:creator>
      <dc:date>2013-05-16T13:52:07Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...My point is, I afraid to</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945051#M18049</link>
      <description>&amp;gt;&amp;gt;...My point is, I afraid to use DMIP in my projects after these results....

I don't think DMIP is very popular compared to IPP, and as you know, additional software layers add overhead ( and reduce performance ). That is why .NET applications are slower than pure C/C++ applications, etc.</description>
      <pubDate>Thu, 16 May 2013 14:07:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945051#M18049</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-16T14:07:00Z</dc:date>
    </item>
    <item>
      <title>Mikhail,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945052#M18050</link>
      <description>&lt;P&gt;Mikhail,&lt;/P&gt;
&lt;P&gt;I am currently investigating this. The performance profile data looks quite strange (I'm not sure the table will render well):&lt;/P&gt;
&lt;P&gt;Top Hotspots&lt;/P&gt;
&lt;TABLE&gt;
&lt;TR&gt;&lt;TH&gt;Function&lt;/TH&gt;&lt;TH&gt;CPU Time&lt;/TH&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;DMIP::Trace::Space&lt;/TD&gt;&lt;TD&gt;33.166s&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;own_ipps_sExp_G9LAynn&lt;/TD&gt;&lt;TD&gt;5.493s&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;g9_innerRGBToGray_8u_C3C1R&lt;/TD&gt;&lt;TD&gt;3.551s&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;[dmip-1.5.dll]&lt;/TD&gt;&lt;TD&gt;2.947s&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;ippGetCpuFreqMhz&lt;/TD&gt;&lt;TD&gt;2.325s&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;[Others]&lt;/TD&gt;&lt;TD&gt;11.318s&lt;/TD&gt;&lt;/TR&gt;
&lt;/TABLE&gt;
&lt;P&gt;I will check what's going wrong.&lt;/P&gt;
&lt;P&gt;Regards,&lt;BR /&gt;Sergey&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 16 May 2013 15:27:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945052#M18050</guid>
      <dc:creator>Sergey_K_Intel</dc:creator>
      <dc:date>2013-05-16T15:27:51Z</dc:date>
    </item>
    <item>
      <title>Hi Mikhail,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945053#M18051</link>
      <description>&lt;P&gt;Hi Mikhail,&lt;/P&gt;
&lt;P&gt;I found the cause of the issue. The DMIP.dll linked to your application is itself statically linked with IPP 7.0.x (so it contains IPP 7.0.x code), while your separate IPP calls are linked against the newer 7.1 library, which probably contains better-optimized versions of the functions you use. This is why the manually parallelized pipeline of IPP functions works better.&lt;/P&gt;
&lt;P&gt;I have linked the DMIP object files with IPP 7.1 into another DMIP.DLL, and this combination shows exactly the same results as your parallelized IPP calls.&lt;/P&gt;
&lt;P&gt;If you ask me when a DMIP.dll linked with IPP 7.1 will be released, I can't answer. The future of the DMIP project is currently under consideration. Do you feel that this image processing implementation has some potential?&lt;/P&gt;
&lt;P&gt;Regards,&lt;BR /&gt;Sergey&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 28 May 2013 13:06:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945053#M18051</guid>
      <dc:creator>Sergey_K_Intel</dc:creator>
      <dc:date>2013-05-28T13:06:59Z</dc:date>
    </item>
    <item>
      <title>Dear Sergey,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945054#M18052</link>
      <description>&lt;P&gt;Dear Sergey,&lt;/P&gt;
&lt;P&gt;Thank you for your thorough investigation of the issue!&lt;/P&gt;
&lt;P&gt;I think DMIP will be popular only if it provides a simple and transparent interface, so that its ratio of simplicity to performance beats that of manual parallelization and GPU techniques. It is easier for us to use IPP than to write the arithmetic by hand, and it's much faster. And it is way easier than moving to the GPU.&lt;/P&gt;
&lt;P&gt;I'm sure DMIP will become very popular if it is integrated into the arithmetic expression system of OpenCV's cv::Mat. It already constructs a graph automatically from very simple and intuitive operator overloading patterns, like D = (A + B) * C. For now, we are not using OpenCV because it lacks integration with IPP and internal parallelization, but resolving both of these issues doesn't look like a tricky task.&lt;/P&gt;</description>
      <pubDate>Tue, 28 May 2013 14:03:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945054#M18052</guid>
      <dc:creator>Mikhail_Matrosov</dc:creator>
      <dc:date>2013-05-28T14:03:19Z</dc:date>
    </item>
    <item>
      <title>Thank you for valuable</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945055#M18053</link>
      <description>&lt;P&gt;Thank you for your valuable thoughts! They will definitely be taken into account.&lt;/P&gt;
&lt;P&gt;Best regards,&lt;BR /&gt;Sergey&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 28 May 2013 15:15:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945055#M18053</guid>
      <dc:creator>Sergey_K_Intel</dc:creator>
      <dc:date>2013-05-28T15:15:57Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...For now, we are not</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945056#M18054</link>
      <description>&amp;gt;&amp;gt;...For now, we are not using &lt;STRONG&gt;OpenCV&lt;/STRONG&gt; because it lacks integration with IPP and &lt;STRONG&gt;internal parallelization&lt;/STRONG&gt;...

&lt;STRONG&gt;OpenCV&lt;/STRONG&gt; is a very old ( 12+ years ) library and was not designed to do processing in parallel. It simply wasn't a project objective.</description>
      <pubDate>Wed, 29 May 2013 13:49:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/Why-is-IPP-DMIP-slower-than-Concurrency-parallel-for/m-p/945056#M18054</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-29T13:49:38Z</dc:date>
    </item>
  </channel>
</rss>

