Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

ippConv; Separable filter

C_W_
Beginner
591 Views

Does ippiConv internally perform a separable filter if the kernel parameters allow it?

I have implemented convolution two ways: with ippiConv, and with the pair ippiFilterRowBorderPipeline_32f_C1R / ippiFilterColumnPipeline_32f_C1R. Each has a single-threaded and a multi-threaded version (the latter breaks the convolution up into chunks).

In all cases ippiConv is faster than calling the ippiFilterRow/Column pair.

I didn't expect ippiConv to handle the separable case. I expected the  ippiFilterRow/Column pair to be faster. 

I am wondering if I did something wrong, or this is expected (outputs are numerically the same so raw implementation is correct).
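To make the comparison in this question concrete, here is a minimal plain-C sketch (no IPP calls) of the two approaches on the "valid" region: a direct k x k filter, and a row pass followed by a column pass. The function names and the tightly packed row-major layout are assumptions made for illustration, and the kernel is applied without flipping, which gives the same result as convolution for symmetric kernels.

```c
#include <assert.h>
#include <stddef.h>

/* Direct 2D filter over the "valid" region (kernel applied without flipping).
   src is w x h, dst is (w-k+1) x (h-k+1), both tightly packed row-major. */
void filter2d_valid(const float *src, int w, int h,
                    const float *kernel, int k, float *dst)
{
    assert(src && kernel && dst && k > 0 && w >= k && h >= k);
    int ow = w - k + 1, oh = h - k + 1;
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            float acc = 0.0f;
            for (int j = 0; j < k; ++j)
                for (int i = 0; i < k; ++i)
                    acc += kernel[j * k + i] * src[(size_t)(y + j) * w + (x + i)];
            dst[(size_t)y * ow + x] = acc;
        }
    }
}

/* Separable version for kernel[j*k+i] == col[j] * row[i].
   tmp must hold (w-k+1) * h floats: the row-filtered intermediate image. */
void filter_separable_valid(const float *src, int w, int h,
                            const float *row, const float *col, int k,
                            float *tmp, float *dst)
{
    assert(src && row && col && tmp && dst && k > 0 && w >= k && h >= k);
    int ow = w - k + 1, oh = h - k + 1;

    /* Row pass: every input line is filtered horizontally. */
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < ow; ++x) {
            float acc = 0.0f;
            for (int i = 0; i < k; ++i)
                acc += row[i] * src[(size_t)y * w + (x + i)];
            tmp[(size_t)y * ow + x] = acc;
        }
    }

    /* Column pass: each output line combines k row-filtered lines. */
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            float acc = 0.0f;
            for (int j = 0; j < k; ++j)
                acc += col[j] * tmp[(size_t)(y + j) * ow + x];
            dst[(size_t)y * ow + x] = acc;
        }
    }
}
```

The two functions should agree up to floating-point rounding, since the separable version sums the same products in a different order.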

I'm using IPP 8. Images are roughly 512x512 pixels, float, on a 4-core i7 CPU.

Thanks.

4 Replies
Igor_A_Intel
Employee

Hello,

Could you be more specific? Which OS, IPP static or dynamic, multi- or single-threaded, ia32 or x64? What are the other parameters of the convolution: the sizes of both convolved images (are they both 512x512?), data type, number of channels, Full or Valid? Even better would be the full function name, all parameters used, and the output from ippiGetLibVersion:

const IppLibraryVersion* lib = ippcvGetLibVersion();
printf("%s %s %d.%d.%d.%d\n", lib->Name, lib->Version, lib->major, lib->minor, lib->majorBuild, lib->build);

ippiConv in 8.x internally uses a fairly complex criterion to switch between two implementations: a direct one, and one based on the convolution theorem (FFT). The FFT-based version works in chunks (if the kernel is significantly smaller than the image). I think that if the kernel is rather small (3x3 to 11x11), it's better to use the ippiFilter function.

regards, Igor

C_W_
Beginner

Windows 7, IPP dynamic (custom dll), single threaded, x64.

Approximate image parameters: the source image varies between 256x256 and 512x512 (however, the aspect is not necessarily square, but both x and y are at least mod 8). The convolution kernel is between 3x3 and 7x7. Kernels are guaranteed to be separable and square. Type is float-32, 1 channel of data. Valid convolution.

IPP info: ippCV AVX (e9) 8.2.1 (r44077) 8.2.1.44077

I am using ippAlgDirect as I found that ippAlgFFT is slower.

For example, I am using:

ippiConv_32f_C1R(x, 516 * sizeof(float), {516, 77}, x, 5 * sizeof(float), {5,5}, x, 516 * sizeof(float), ippiROIValid | ippAlgDirect | ippiNormNone, x);
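For readers checking the sizes in the call above: with a "valid" ROI, only positions where the kernel fits entirely inside the image produce output. A small sketch of that arithmetic follows; `Size2D` and `conv_valid_size` are hypothetical names (not IPP types or functions), and the helper assumes the image is at least as large as the kernel in both dimensions.

```c
#include <assert.h>

/* Hypothetical stand-in for a width/height pair; not an IPP type. */
typedef struct { int width; int height; } Size2D;

/* Output size of a "valid" convolution: only positions where the
   kernel lies fully inside the image contribute an output sample. */
Size2D conv_valid_size(Size2D image, Size2D kernel)
{
    assert(kernel.width > 0 && kernel.height > 0);
    assert(image.width >= kernel.width && image.height >= kernel.height);
    Size2D out;
    out.width  = image.width  - kernel.width  + 1;
    out.height = image.height - kernel.height + 1;
    return out;
}
```

For the sizes quoted in the call, a 516x77 image with a 5x5 kernel gives a 512x73 result.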

-----------

An update: I have tried using ippiFilter, and the timing results are almost identical to ippiConv. This suggests that ippiConv does not internally detect separability, and that there is some optimization still available here, which is why I thought that using ippiFilterRowBorderPipeline_32f_C1R might give some improvement.
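As an illustration of what "determining separability" would involve, the sketch below tests whether a k x k kernel is an outer product of a column vector and a row vector, and returns the two factors if so. This is only one possible check and is not a description of anything IPP does internally; `factor_separable` and the tolerance parameter are assumptions for the example.

```c
#include <assert.h>

static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Try to write kernel[j*k+i] as col[j] * row[i].
   Returns 1 and fills col/row on success, 0 if the kernel is not
   separable within the given absolute tolerance. */
int factor_separable(const float *kernel, int k, float tol,
                     float *col, float *row)
{
    assert(kernel && col && row && k > 0 && tol >= 0.0f);

    /* Use the largest-magnitude entry as pivot for numerical stability. */
    int pj = 0, pi = 0;
    float pivot = 0.0f;
    for (int j = 0; j < k; ++j)
        for (int i = 0; i < k; ++i)
            if (abs_f(kernel[j * k + i]) > abs_f(pivot)) {
                pivot = kernel[j * k + i];
                pj = j;
                pi = i;
            }

    if (pivot == 0.0f) {               /* all-zero kernel: trivially separable */
        for (int n = 0; n < k; ++n) col[n] = row[n] = 0.0f;
        return 1;
    }

    /* Candidate factors: the pivot column, and the pivot row scaled by 1/pivot. */
    for (int j = 0; j < k; ++j) col[j] = kernel[j * k + pi];
    for (int i = 0; i < k; ++i) row[i] = kernel[pj * k + i] / pivot;

    /* The kernel is separable only if the outer product reproduces it. */
    for (int j = 0; j < k; ++j)
        for (int i = 0; i < k; ++i)
            if (abs_f(kernel[j * k + i] - col[j] * row[i]) > tol)
                return 0;
    return 1;
}
```

A 3x3 binomial kernel (outer product of {1, 2, 1} with itself) passes this test, while a 3x3 Laplacian with a -4 centre does not.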

--------

I have a stand alone project that I could clean up and provide if you think it would be helpful.

Thanks,

 

Igor_A_Intel
Employee

OK. ippiConv with the "Valid" ROI uses the same code as ippiFilter in the "direct" case, as both perform exactly the same operations. FFT-based convolution starts to be faster than direct for kernel sizes greater than ~20x20 (depending on the architecture). You are right: in your case the "separable" approach must be faster than direct 2D. I'll try to check the separable row-column algorithm for your sizes to see if there are any problems.
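The expectation that the separable approach should win comes from a simple operation count, sketched below. The helper names are hypothetical, and the count is only a rough model: it ignores the extra k-1 row-filtered lines at the border, memory traffic for the intermediate buffer, and vectorization, any of which can change measured timings.

```c
#include <assert.h>

/* Multiplies for a direct k x k filter producing a w x h output:
   k*k products per output pixel. */
long long direct_mults(int w, int h, int k)
{
    assert(w > 0 && h > 0 && k > 0);
    return (long long)w * h * k * k;
}

/* Multiplies for a row pass plus a column pass: about 2*k products
   per output pixel (border lines of the row pass are ignored). */
long long separable_mults(int w, int h, int k)
{
    assert(w > 0 && h > 0 && k > 0);
    return (long long)w * h * 2 * k;
}
```

For a 512x512 output and a 5x5 kernel this gives 6,553,600 multiplies for the direct filter against 2,621,440 for the two passes, a ratio of 2.5. For 7x7 the ratio grows to 3.5.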

regards, Igor

C_W_
Beginner

OK. Here is how I use the separable functions: row, then column. When I multithread these, I split the filter processing into 1-line segments (i.e. roiSize = {512, 1}); I do all the rows first, then all the columns.

ippiFilterRowBorderPipeline_32f_C1R(
        pfImgIn,
        iXRes * sizeof(float),   // 512 * 4
        ppfBufferRow,            // Ipp32f **ppfBufferRow
        roiSize,                 // ie {512, 512}
        m_pfKernelRow,           // ie {1,2,3,4,5}
        m_iFilterSize,           // 5
        iAnchor,                 // 2
        IppiBorderType::ippBorderConst,
        0,
        pbtTempBuffer);

status = ippiFilterColumnPipeline_32f_C1R(
        ppfBufferRow,
        pfImgOut,
        iXRes * sizeof(float),
        roiSize,
        m_pfKernelColumn,
        m_iFilterSize,
        pbtTempBuffer);
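When the work is split across threads, each chunk of the column pass depends on more row-filtered lines than it outputs. The sketch below shows one possible way to split the output into contiguous bands and to work out which intermediate lines a band reads. `RowRange`, `band_rows` and `band_input_rows` are hypothetical helpers, not IPP functions, and the anchor convention (output line y reads lines y - anchor through y + k - 1 - anchor) is an assumption that matches a centred anchor such as 2 for a 5-tap kernel.

```c
#include <assert.h>

/* Half-open range of line indices [begin, end). */
typedef struct { int begin; int end; } RowRange;

/* Split `height` output lines into `nbands` nearly equal contiguous bands
   and return the lines belonging to band number `band`. */
RowRange band_rows(int height, int nbands, int band)
{
    assert(height >= 0 && nbands > 0 && band >= 0 && band < nbands);
    int base = height / nbands;    /* lines every band gets */
    int extra = height % nbands;   /* the first `extra` bands get one more */
    RowRange r;
    r.begin = band * base + (band < extra ? band : extra);
    r.end = r.begin + base + (band < extra ? 1 : 0);
    return r;
}

/* Row-filtered lines a band reads during the column pass with a k-tap
   kernel, clamped to the image. Lines clamped away must come from the border. */
RowRange band_input_rows(RowRange out, int k, int anchor, int height)
{
    assert(k > 0 && anchor >= 0 && anchor < k && out.begin <= out.end);
    RowRange in;
    in.begin = out.begin - anchor;
    in.end = out.end + (k - 1 - anchor);
    if (in.begin < 0) in.begin = 0;
    if (in.end > height) in.end = height;
    return in;
}
```

With 512 lines, 4 bands and a 5-tap kernel anchored at 2, band 1 outputs lines 128 to 255 and reads row-filtered lines 126 to 257. With 1-line segments, as in the post above, every output line reads 5 row-filtered lines.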

 
