<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Topic: ok, ippiConv with &quot;Valid&quot; ROI in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippConv-Separable-filter/m-p/1044387#M23874</link>
    <description>&lt;P&gt;OK. In the "direct" case, ippiConv with the "Valid" ROI uses the same code as ippiFilter, because both compute exactly the same thing. FFT-based convolution starts to be faster than direct convolution for kernels larger than about 20x20 (the exact point depends on the architecture). You are right: in your case, the separable approach should be faster than direct 2D convolution. I'll check the separable row-column algorithm with your sizes to see whether there are any problems.&lt;/P&gt;

&lt;P&gt;regards, Igor&lt;/P&gt;</description>
    <pubDate>Fri, 06 Nov 2015 14:51:00 GMT</pubDate>
    <dc:creator>Igor_A_Intel</dc:creator>
    <dc:date>2015-11-06T14:51:00Z</dc:date>
    <item>
      <title>ippConv; Separable filter</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippConv-Separable-filter/m-p/1044384#M23871</link>
      <description>&lt;P&gt;Does ippiConv internally use separable filtering when the kernel allows it?&lt;/P&gt;

&lt;P&gt;I have implemented convolution both with ippiConv and with the ippiFilterRowBorderPipeline_32f_C1R / ippiFilterColumnPipeline_32f_C1R pair. Each approach exists as a single-threaded version and as a multi-threaded version (by breaking the convolution up into chunks).&lt;/P&gt;
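&lt;P&gt;For a "Valid" convolution, one way to form such chunks is horizontal bands of output rows. Because output row y reads source rows y to y+K-1, each band needs K-1 extra source rows below it, while the output bands never overlap, so threads can write to the destination without synchronization. A minimal sketch of the band arithmetic, assuming row-band splitting (the Band type and band_for_chunk are hypothetical names, not IPP functions, and this is not necessarily the split used in this thread):&lt;/P&gt;

```c
/* Hypothetical helper (not part of IPP): split the dh output rows of a
 * "Valid" K x K convolution into nchunks horizontal bands.
 * Output row y reads source rows y .. y + k - 1, so band c writes output
 * rows [out_begin, out_end) and reads source rows [src_begin, src_end). */
typedef struct {
    int out_begin, out_end;  /* half-open range of output rows */
    int src_begin, src_end;  /* half-open range of source rows */
} Band;

Band band_for_chunk(int dh, int k, int nchunks, int c)
{
    Band b;
    /* 64-bit intermediate avoids overflow; band sizes differ by at most 1 row */
    b.out_begin = (int)((long long)dh * c / nchunks);
    b.out_end   = (int)((long long)dh * (c + 1) / nchunks);
    b.src_begin = b.out_begin;
    b.src_end   = b.out_end + k - 1;  /* k - 1 halo rows below the band */
    return b;
}
```

&lt;P&gt;For example, splitting 512 output rows with a 5x5 kernel into 4 bands gives band 1 output rows 128 to 255 and source rows 128 to 259.&lt;/P&gt;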

&lt;P&gt;In all cases, ippiConv is faster than calling the ippiFilterRow/Column pair.&lt;/P&gt;

&lt;P&gt;I didn't expect ippiConv to handle the separable case; I expected the ippiFilterRow/Column pair to be faster.&lt;/P&gt;

&lt;P&gt;I am wondering whether I did something wrong or whether this is expected (the outputs are numerically the same, so the implementation itself is correct).&lt;/P&gt;

&lt;P&gt;I'm using IPP 8. The convolutions are roughly 512x512 pixels, float, on a 4-core i7 CPU.&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 02 Nov 2015 23:38:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippConv-Separable-filter/m-p/1044384#M23871</guid>
      <dc:creator>C_W_</dc:creator>
      <dc:date>2015-11-02T23:38:00Z</dc:date>
    </item>
    <item>
      <title>Hello,</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippConv-Separable-filter/m-p/1044385#M23872</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;Could you be more specific: OS, static or dynamic IPP, multi- or single-threaded, ia32 or x64, and the other convolution parameters: the sizes of both convolved images (are they both 512x512?), data type, number of channels, Full or Valid. Better still, send the full function name, all the parameters used, and the output from ippiGetLibVersion: const IppLibraryVersion* lib = ippcvGetLibVersion(); printf("%s %s %d.%d.%d.%d\n", lib-&amp;gt;Name, lib-&amp;gt;Version, lib-&amp;gt;major, lib-&amp;gt;minor, lib-&amp;gt;majorBuild, lib-&amp;gt;build);&lt;/P&gt;

&lt;P&gt;In 8.x, ippiConv uses a fairly complex internal criterion to switch between two implementations: direct, and FFT-based (using the convolution theorem). The FFT-based version processes the image in chunks when the kernel is significantly smaller than the image. I think that for rather small kernels (3x3 to 11x11) it is better to use the ippiFilter function.&lt;/P&gt;
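&lt;P&gt;For reference, the "direct" computation for a "Valid" ROI boils down to a loop nest like the one below: a minimal plain-C sketch, not IPP's actual code. Each output pixel costs kw*kh multiply-adds, which is why direct filtering tends to win for small kernels and FFT for large ones. The function name conv2d_valid, the kernel flip (true convolution) and the unpadded row-major layout (step = width) are assumptions for illustration.&lt;/P&gt;

```c
/* Hypothetical reference helper (not an IPP call): direct "valid" 2D
 * convolution of a single-channel float image.
 * src is sw x sh, ker is kw x kh, dst is (sw-kw+1) x (sh-kh+1),
 * all row-major with step == width. Cost: kw*kh multiply-adds per pixel. */
void conv2d_valid(const float *src, int sw, int sh,
                  const float *ker, int kw, int kh, float *dst)
{
    int dw = sw - kw + 1, dh = sh - kh + 1;

    for (int y = 0; y != dh; ++y)
        for (int x = 0; x != dw; ++x) {
            float acc = 0.0f;
            for (int j = 0; j != kh; ++j)
                for (int i = 0; i != kw; ++i)
                    /* kernel indexed flipped: true convolution, not correlation */
                    acc += src[(y + j) * sw + (x + i)]
                         * ker[(kh - 1 - j) * kw + (kw - 1 - i)];
            dst[y * dw + x] = acc;
        }
}
```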

&lt;P&gt;regards, Igor&lt;/P&gt;</description>
      <pubDate>Thu, 05 Nov 2015 08:01:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippConv-Separable-filter/m-p/1044385#M23872</guid>
      <dc:creator>Igor_A_Intel</dc:creator>
      <dc:date>2015-11-05T08:01:23Z</dc:date>
    </item>
    <item>
      <title>Windows 7, IPP dynamic</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippConv-Separable-filter/m-p/1044386#M23873</link>
      <description>&lt;P&gt;Windows 7, IPP dynamic (custom dll), single threaded, x64.&lt;/P&gt;

&lt;P&gt;Approximate image parameters: source images vary between 256x256 and 512x512 (the aspect ratio is not necessarily square, but both width and height are multiples of 8). The convolution kernel is square, between 3x3 and 7x7, and is guaranteed to be separable. The data type is 32-bit float, 1 channel. Valid convolution.&lt;/P&gt;

&lt;P&gt;IPP info: ippCV AVX (e9) 8.2.1 (r44077) 8.2.1.44077&lt;/P&gt;

&lt;P&gt;I am using ippAlgDirect, as I found that ippAlgFFT is slower.&lt;/P&gt;

&lt;P&gt;For example, I am using:&lt;/P&gt;

&lt;P&gt;ippiConv_32f_C1R(x, 516 * sizeof(float), {516, 77}, x, 5 * sizeof(float), {5, 5}, x, 516 * sizeof(float), ippiROIValid | ippAlgDirect | ippiNormNone, x);&lt;/P&gt;
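&lt;P&gt;With a "Valid" ROI, only the positions where the whole kernel fits inside the source are computed, so this call produces (516 - 5 + 1) x (77 - 5 + 1) = 512x73 output pixels; the 516-float destination step is simply larger than the 512-float output row. A minimal sketch of that arithmetic, with a hypothetical Size2D struct standing in for IppiSize:&lt;/P&gt;

```c
/* Hypothetical stand-in for IppiSize. */
typedef struct { int width, height; } Size2D;

/* Output size of a "Valid" convolution: one output pixel per position
 * where the kernel lies entirely inside the source image. */
Size2D valid_conv_size(Size2D src, Size2D ker)
{
    Size2D dst = { src.width - ker.width + 1, src.height - ker.height + 1 };
    return dst;
}
```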

&lt;P&gt;-----------&lt;/P&gt;

&lt;P&gt;Update: I have tried using ippiFilter, and the timings are almost identical to ippiConv. This suggests that ippiConv does not internally detect separability, and therefore that some optimization is possible here; that is why I thought I might see some improvement by using ippiFilterRowBorderPipeline_32f_C1R.&lt;/P&gt;

&lt;P&gt;--------&lt;/P&gt;

&lt;P&gt;I have a standalone project that I could clean up and provide if you think it would be helpful.&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

</description>
      <pubDate>Thu, 05 Nov 2015 21:53:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippConv-Separable-filter/m-p/1044386#M23873</guid>
      <dc:creator>C_W_</dc:creator>
      <dc:date>2015-11-05T21:53:35Z</dc:date>
    </item>
    <item>
      <title>ok, ippiConv with "Valid" ROI</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippConv-Separable-filter/m-p/1044387#M23874</link>
      <description>&lt;P&gt;OK. In the "direct" case, ippiConv with the "Valid" ROI uses the same code as ippiFilter, because both compute exactly the same thing. FFT-based convolution starts to be faster than direct convolution for kernels larger than about 20x20 (the exact point depends on the architecture). You are right: in your case, the separable approach should be faster than direct 2D convolution. I'll check the separable row-column algorithm with your sizes to see whether there are any problems.&lt;/P&gt;
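&lt;P&gt;A rough operation count shows why the separable approach should win: per output pixel, direct 2D convolution needs K*K multiply-adds, while row-then-column needs about 2K (the row pass also covers the K-1 extra source rows). The sketch below uses hypothetical helper names and counts arithmetic only, ignoring memory traffic, vectorization and call overhead:&lt;/P&gt;

```c
/* Rough multiply-add counts for a "Valid" convolution that produces
 * dw x dh output pixels with a K x K kernel (hypothetical helpers). */
long long direct_macs(long long dw, long long dh, long long k)
{
    return dw * dh * k * k;          /* K*K per output pixel */
}

long long separable_macs(long long dw, long long dh, long long k)
{
    /* row pass runs over dh + k - 1 source rows, column pass over dh rows */
    return k * dw * (dh + k - 1) + k * dw * dh;
}
```

&lt;P&gt;For a 5x5 kernel and a 512x512 output this gives 6,553,600 multiply-adds for direct 2D versus 2,631,680 for row-then-column, roughly a 2.5x reduction in arithmetic; since memory traffic is not modelled, measured timings can differ.&lt;/P&gt;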

&lt;P&gt;regards, Igor&lt;/P&gt;</description>
      <pubDate>Fri, 06 Nov 2015 14:51:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippConv-Separable-filter/m-p/1044387#M23874</guid>
      <dc:creator>Igor_A_Intel</dc:creator>
      <dc:date>2015-11-06T14:51:00Z</dc:date>
    </item>
    <item>
      <title>Ok. Here is how I use the</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/ippConv-Separable-filter/m-p/1044388#M23875</link>
      <description>&lt;P&gt;OK, here is how I use the separable functions: row, then column. When I multithread these, I split the filter processing into 1-line segments (i.e. roiSize = {512, 1}); I do all the rows first, then all the columns.&lt;/P&gt;

&lt;P&gt;ippiFilterRowBorderPipeline_32f_C1R(&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;pfImgIn,&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;iXRes * sizeof(float), // 512 * 4&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ppfBufferRow, // Ipp32f **ppfBufferRow&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;roiSize, // e.g. {512, 512}&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;m_pfKernelRow, // e.g. {1, 2, 3, 4, 5}&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;m_iFilterSize, // 5&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;iAnchor, // 2&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;IppiBorderType::ippBorderConst,&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;0,&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;pbtTempBuffer);&lt;/P&gt;

&lt;P&gt;status = ippiFilterColumnPipeline_32f_C1R(&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ppfBufferRow,&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;pfImgOut,&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;iXRes * sizeof(float),&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;roiSize,&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;m_pfKernelColumn,&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;m_iFilterSize,&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;pbtTempBuffer);&lt;/P&gt;
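&lt;P&gt;To cross-check the pipeline pair independently of IPP, the same row-then-column idea for a "Valid" ROI can be sketched in plain C. The function name, the unpadded row-major layout and the caller-supplied tmp buffer are assumptions for illustration. This sketch shrinks the image instead of extending it with a constant border as the ippBorderConst call above does, so results may differ near the edges and only interior pixels are expected to match.&lt;/P&gt;

```c
/* Hypothetical reference (not an IPP call): separable "valid" convolution.
 * Row pass:    tmp (dw x sh) = each source row convolved with krow (length k).
 * Column pass: dst (dw x dh) = each tmp column convolved with kcol (length k).
 * Kernels are indexed flipped, as in true convolution.
 * tmp must hold dw * sh floats, dst must hold dw * dh floats. */
void sep_conv_valid(const float *src, int sw, int sh,
                    const float *krow, const float *kcol, int k,
                    float *tmp, float *dst)
{
    int dw = sw - k + 1, dh = sh - k + 1;

    for (int y = 0; y != sh; ++y)            /* row pass: k MACs per pixel */
        for (int x = 0; x != dw; ++x) {
            float acc = 0.0f;
            for (int i = 0; i != k; ++i)
                acc += src[y * sw + x + i] * krow[k - 1 - i];
            tmp[y * dw + x] = acc;
        }

    for (int y = 0; y != dh; ++y)            /* column pass: k MACs per pixel */
        for (int x = 0; x != dw; ++x) {
            float acc = 0.0f;
            for (int j = 0; j != k; ++j)
                acc += tmp[(y + j) * dw + x] * kcol[k - 1 - j];
            dst[y * dw + x] = acc;
        }
}
```

&lt;P&gt;Comparing its interior output with the IPP result is one way to confirm that the row and column kernels are applied in the expected order.&lt;/P&gt;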

</description>
      <pubDate>Fri, 06 Nov 2015 15:26:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/ippConv-Separable-filter/m-p/1044388#M23875</guid>
      <dc:creator>C_W_</dc:creator>
      <dc:date>2015-11-06T15:26:22Z</dc:date>
    </item>
  </channel>
</rss>

