Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Question on Gaussian convolution: 2d vs 1d 2-pass

Hi, I am experimenting with Gaussian blur. The 3x3 kernel in ippiFilterGauss (per the documentation) is:
1/16, 2/16, 1/16,
2/16, 4/16, 2/16,
1/16, 2/16, 1/16
which has the 1D equivalent:
[1/4, 2/4, 1/4]
By convolving twice (horizontally with ippiFilterRow32f, then vertically on the result of the first pass with ippiFilterColumn32f) I should get the same result as convolving once with the 2D kernel (ippiFilterGauss/ippiFilter32f); it should also be faster.
But my results show differences between the two. I am unsure whether I am doing something wrong, especially at the second border extension.
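The equivalence the question relies on, that the 3x3 kernel is the outer product of [1/4, 2/4, 1/4] with itself, can be checked with a small sketch. It does not use IPP; the names `Kernel1D`, `Kernel2D`, `outerProduct3` and `matchesOuterProduct` are hypothetical helpers introduced only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Kernel1D = std::array<float, 3>;
using Kernel2D = std::array<std::array<float, 3>, 3>;

// Build a 3x3 kernel as the outer product of a 1D kernel with itself:
// out[r][c] = k[r] * k[c].
Kernel2D outerProduct3(const Kernel1D& k) {
    Kernel2D out{};
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            out[r][c] = k[r] * k[c];
        }
    }
    return out;
}

// Return true when the outer product of k1d reproduces k2d within tol.
bool matchesOuterProduct(const Kernel2D& k2d, const Kernel1D& k1d,
                         float tol = 1e-6f) {
    assert(tol >= 0.0f);
    const Kernel2D candidate = outerProduct3(k1d);
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            if (std::fabs(candidate[r][c] - k2d[r][c]) > tol) {
                return false;
            }
        }
    }
    return true;
}
```

For the kernel in the question, `outerProduct3({0.25f, 0.5f, 0.25f})` yields 1/16 at the corners, 2/16 on the edges and 4/16 at the center, so the two formulations agree in exact arithmetic. Any difference in the output images has to come from elsewhere, for example border handling or rounding.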
[cpp]
// extend border via replication: ext1 is the returned border extended img; topExtend = 1, leftExtend = 1
IppStatus borderExtend(Ipp8u* img, IppiSize roi, int lineStep, Ipp8u** ext1, int& extStep1,
                       Ipp8u** startPt, int width, int height, int numChannels,
                       int topExtend, int leftExtend)
{
    int extWidth = width + leftExtend * 2;
    int extHeight = height + topExtend * 2;
    IppStatus status;
    IppiSize extRoi = { extWidth, extHeight };
    *ext1 = ippiMalloc_8u_C3(extWidth, extHeight, &extStep1);

    // shift i.e. for 3 channels interleaved, 1 extra pixel to left, 1 extra pixel down => + (1 * step) + 3
    // + 3 as bmp is interleaved, need to shift by another 3 bytes over to get to next pixel
    *startPt = *ext1 + (leftExtend * extStep1) + numChannels;

    // copy over from buffer in image to ext1
    status = ippiCopy_8u_C3R(img, lineStep, *startPt, extStep1, roi);

    // extend by n pixel on each side
    status = ippiCopyReplicateBorder_8u_C3IR(*startPt, extStep1, roi, extRoi, topExtend, leftExtend);
    return status;
}
[/cpp]
[cpp]
IppStatus blur(Ipp8u* img, IppiSize roi, int lineStep, int width, int height,
               int numChannels, int topExtend, int leftExtend)
{
    IppStatus status = ippStsNoErr;
    int ext1Line;
    Ipp8u* extend1 = NULL;
    Ipp8u* startPt1 = NULL;

    // extend border; output: extend1
    status = borderExtend(img, roi, lineStep, &extend1, ext1Line, &startPt1,
                          width, height, numChannels, topExtend, leftExtend);

    // temp holding buffer for results after 1st pass
    int tempLine;
    Ipp8u* tempBuffer = ippiMalloc_8u_C3(width, height, &tempLine);
    Ipp32f kernel[] = {1/4.0f, 2/4.0f, 1/4.0f};

    // filter horiz with kernel; output: tempBuffer
    status = ippiFilterRow32f_8u_C3R(startPt1, ext1Line, tempBuffer, tempLine, roi, kernel, 3, 1);

    // extend again; output: extend2
    int ext2Line;
    Ipp8u* extend2 = NULL;
    Ipp8u* startPt2 = NULL;
    status = borderExtend(tempBuffer, roi, tempLine, &extend2, ext2Line, &startPt2,
                          width, height, numChannels, 1, 1);

    // filter vert, output: img
    status = ippiFilterColumn32f_8u_C3R(startPt2, ext2Line, img, lineStep, roi, kernel, 3, 1);
    return status;
}
[/cpp]
My program needs to handle different Gaussian kernels, hence the need to do a 2-pass 1D convolution to maximize execution speed.
2 Replies


For the following code:
>*startPt = *ext1 + (leftExtend * extStep1) + numChannels;

should this be something like the following instead?
>*startPt = *ext1 + (topExtend * extStep1) + numChannels * leftExtend;

I do not see any other problem. If you have some runnable code, that would help to reproduce the problem easily.
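The difference between the two offset formulas quoted above can be made concrete with a small sketch. It assumes an interleaved 8-bit image whose row stride is given in bytes; `interiorOffsetBytes` and `originalOffsetBytes` are hypothetical helper names, not IPP functions.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset from the start of a border-extended, interleaved 8-bit buffer
// to the first interior pixel: skip topExtend full rows, then leftExtend
// pixels of numChannels bytes each.
std::size_t interiorOffsetBytes(int topExtend, int leftExtend,
                                int stepBytes, int numChannels) {
    assert(topExtend >= 0 && leftExtend >= 0);
    assert(stepBytes > 0 && numChannels > 0);
    return static_cast<std::size_t>(topExtend) * stepBytes +
           static_cast<std::size_t>(leftExtend) * numChannels;
}

// The formula from the original post, kept for comparison: it multiplies
// the row stride by leftExtend and always skips exactly one pixel.
std::size_t originalOffsetBytes(int leftExtend, int stepBytes,
                                int numChannels) {
    assert(leftExtend >= 0 && stepBytes > 0 && numChannels > 0);
    return static_cast<std::size_t>(leftExtend) * stepBytes +
           static_cast<std::size_t>(numChannels);
}
```

With a 1-pixel border on every side the two formulas give the same offset, which is why the original code works for a 3x3 kernel. They diverge as soon as the border is wider than one pixel or the top and left extensions differ.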


Thanks for looking through the code; my code would indeed have failed for border extensions > 1 pixel.
But after making the corrections, my 1D result still differs from the 2D result (checked with an image diff program).
I have attached the input bitmap (testa.bmp), the resulting outputs (testoutg2d.bmp, using ippiFilterGauss; testoutg1d.bmp, using ippiFilterRow32f and ippiFilterColumn32f) and the source code.

(Not sure if I have attached files correctly)
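One possible contributor to the remaining difference, which this thread does not confirm, is that the two-pass code stores the horizontal result in an 8-bit buffer, so values are rounded twice instead of once. The sketch below does not call IPP. It models the 8-bit conversion with `std::lround` plus clamping, which is an assumption about the rounding mode, and `Patch`, `toU8`, `singlePass2D` and `twoPass1D` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Patch = std::array<std::array<int, 3>, 3>;  // 8-bit pixel values, 0..255

const std::array<float, 3> kKernel1D = {0.25f, 0.5f, 0.25f};

// Round to nearest and clamp to 0..255 (assumed stand-in for 8u conversion).
int toU8(float v) {
    long r = std::lround(v);
    if (r < 0) r = 0;
    if (r > 255) r = 255;
    return static_cast<int>(r);
}

// Center pixel from one 2D pass: accumulate in float, round once at the end.
int singlePass2D(const Patch& p) {
    float acc = 0.0f;
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            assert(p[r][c] >= 0 && p[r][c] <= 255);
            acc += kKernel1D[r] * kKernel1D[c] * static_cast<float>(p[r][c]);
        }
    }
    return toU8(acc);
}

// Center pixel from two 1D passes with an 8-bit intermediate buffer:
// each row is filtered and rounded, then the column of rounded values
// is filtered and rounded again.
int twoPass1D(const Patch& p) {
    std::array<int, 3> rowResult{};
    for (int r = 0; r < 3; ++r) {
        float acc = 0.0f;
        for (int c = 0; c < 3; ++c) {
            assert(p[r][c] >= 0 && p[r][c] <= 255);
            acc += kKernel1D[c] * static_cast<float>(p[r][c]);
        }
        rowResult[r] = toU8(acc);  // first rounding
    }
    float acc = 0.0f;
    for (int r = 0; r < 3; ++r) {
        acc += kKernel1D[r] * static_cast<float>(rowResult[r]);
    }
    return toU8(acc);  // second rounding
}
```

For the patch with rows (1, 2, 0), (1, 4, 0), (1, 0, 0), the single pass computes 24/16 = 1.5 and rounds to 2, while the two-pass version rounds the rows to 1, 2, 0 and then rounds 1.25 down to 1. If this is the cause, the diff between the two output images should be limited to about one gray level per channel, and keeping the intermediate result in a floating-point buffer would be one way to test the hypothesis.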