Question on Gaussian convolution: 2d vs 1d 2-pass

Hi, I am experimenting with Gaussian blur. The 3x3 kernel in ippiFilterGauss (per the documentation) is:
1/16, 2/16, 1/16,
2/16, 4/16, 2/16,
1/16, 2/16, 1/16
which is separable, with the 1D equivalent:
[1/4, 2/4, 1/4]
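
As a quick sanity check of that separability claim, here is a minimal sketch that builds the outer product of the 1D kernel with itself and compares it against the 3x3 kernel quoted above. The names `kGauss1D`, `outerProduct` and `matchesGauss3x3` are made up for illustration and are not part of IPP.

```cpp
#include <array>

// Hypothetical name: the 1D kernel from the post.
const std::array<float, 3> kGauss1D = {0.25f, 0.5f, 0.25f};

// Outer product of a 3-tap 1D kernel with itself, giving a 3x3 kernel.
std::array<std::array<float, 3>, 3> outerProduct(const std::array<float, 3>& k)
{
    std::array<std::array<float, 3>, 3> out{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            out[r][c] = k[r] * k[c];
    return out;
}

// True if the outer product of k equals the 3x3 kernel quoted above.
// All values involved are exact in binary floating point, so == is safe here.
bool matchesGauss3x3(const std::array<float, 3>& k)
{
    const float expected[3][3] = {
        {1 / 16.0f, 2 / 16.0f, 1 / 16.0f},
        {2 / 16.0f, 4 / 16.0f, 2 / 16.0f},
        {1 / 16.0f, 2 / 16.0f, 1 / 16.0f}};
    const auto got = outerProduct(k);
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            if (got[r][c] != expected[r][c])
                return false;
    return true;
}
```

Calling `matchesGauss3x3(kGauss1D)` returns true, so in exact arithmetic the two-pass and single-pass filters use the same weights.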
By convolving twice (horizontally with ippiFilterRow32f, then vertically on the result of the first pass with ippiFilterColumn32f), I should get the same result as convolving once with the 2D kernel (ippiFilterGauss/IppiFilter32); it should also be faster.
But my results show differences between the two. I am unsure whether I am doing something wrong, especially at the second border extension.
[cpp]
// extend border via replication: ext1 is the returned border extended img; topExtend = 1, leftExtend = 1
IppStatus borderExtend(Ipp8u* img, IppiSize roi, int lineStep,
                       Ipp8u** ext1, int& extStep1, Ipp8u** startPt,
                       int width, int height, int numChannels,
                       int topExtend, int leftExtend)
{
    int extWidth = width + leftExtend * 2;
    int extHeight = height + topExtend * 2;
    IppStatus status;
    IppiSize extRoi = { extWidth, extHeight };

    *ext1 = ippiMalloc_8u_C3(extWidth, extHeight, &extStep1);

    // shift i.e. for 3 channels interleaved, 1 extra pixel to left, 1 extra pixel down => + (1 * step) + 3
    // + 3 as bmp is interleaved, need to shift by another 3 bytes over to get to next pixel
    *startPt = *ext1 + (leftExtend * extStep1) + numChannels;

    // copy over from buffer in image to ext1
    status = ippiCopy_8u_C3R(img, lineStep, *startPt, extStep1, roi);

    // extend by n pixel on each side
    status = ippiCopyReplicateBorder_8u_C3IR(*startPt, extStep1, roi, extRoi, topExtend, leftExtend);

    return status;
}
[/cpp]

[cpp]
IppStatus blur(Ipp8u* img, IppiSize roi, int lineStep,
               int width, int height, int numChannels,
               int topExtend, int leftExtend)
{
    IppStatus status = ippStsNoErr;
    int ext1Line;
    Ipp8u* extend1 = NULL;
    Ipp8u* startPt1 = NULL;

    // extend border; output: extend1
    status = borderExtend(img, roi, lineStep, &extend1, ext1Line, &startPt1,
                          width, height, numChannels, topExtend, leftExtend);

    // temp holding buffer for results after 1st pass
    int tempLine;
    Ipp8u* tempBuffer = ippiMalloc_8u_C3(width, height, &tempLine);

    Ipp32f kernel[] = {1/4.0f, 2/4.0f, 1/4.0f};

    // filter horiz with kernel; output: tempBuffer
    status = ippiFilterRow32f_8u_C3R(startPt1, ext1Line, tempBuffer, tempLine, roi, kernel, 3, 1);

    // extend again; output: extend2
    int ext2Line;
    Ipp8u* extend2 = NULL;
    Ipp8u* startPt2 = NULL;
    status = borderExtend(tempBuffer, roi, tempLine, &extend2, ext2Line, &startPt2,
                          width, height, numChannels, 1, 1);

    // filter vert, output: img
    status = ippiFilterColumn32f_8u_C3R(startPt2, ext2Line, img, lineStep, roi, kernel, 3, 1);

    return status;
}
[/cpp]

My program needs to handle different Gaussian kernels, hence the need to do a 2-pass 1D convolution to maximize execution speed.
For the following code:
>*startPt = *ext1 + (leftExtend * extStep1) + numChannels;

should this be something like the following?
>*startPt = *ext1 + (topExtend * extStep1) + numChannels * leftExtend;

I do not see any other problems. If you have some runnable code, that would make it easier to reproduce the problem.
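
To spell out the suggested offset: the interior starts `topExtend` rows down and `leftExtend` pixels to the right of the allocated buffer. A minimal sketch of that arithmetic, assuming 8-bit interleaved channels and a row step given in bytes (`interiorOffsetBytes` is a hypothetical helper, not an IPP function):

```cpp
#include <cstddef>

// Byte offset from the start of a border-extended buffer to the first
// interior pixel. Assumes one byte per channel (8u) and interleaved channels.
std::size_t interiorOffsetBytes(int topExtend, int leftExtend,
                                int stepBytes, int numChannels)
{
    // Skip topExtend full rows, then leftExtend whole pixels on that row.
    return static_cast<std::size_t>(topExtend) * static_cast<std::size_t>(stepBytes)
         + static_cast<std::size_t>(leftExtend) * static_cast<std::size_t>(numChannels);
}
```

With `topExtend = leftExtend = 1` this matches the original expression, which is why the original code only goes wrong for wider or unequal borders.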


Thanks for looking through the code; my version would have failed for border extensions > 1 pixel.
But even after making the corrections, my 1D result still differs from the 2D result (checked with an image diff program).
I have attached the input bitmap (testa.bmp), the resulting outputs (testoutg2d.bmp, using ippiFilterGauss; testoutg1d.bmp, using ippiFilterRow32f and ippiFilterColumn32f) and the source code.

(Not sure if I have attached files correctly)
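
One possible source of small remaining differences, which this thread does not confirm, is that the two-pass version stores the intermediate image as 8-bit, so values get rounded twice instead of once. The sketch below illustrates the effect on a single 3x3 patch in plain C++ without IPP. The round-half-up rule in `toU8` is an assumption for illustration only; the rounding actually used by the IPP functions may differ, and all names here are hypothetical.

```cpp
#include <cstdint>

// 1D kernel from the post.
const float kTap[3] = {0.25f, 0.5f, 0.25f};

// ASSUMPTION: round half up, then clamp to [0, 255]. Input is non-negative here.
inline std::uint8_t toU8(float v)
{
    int r = static_cast<int>(v + 0.5f);
    if (r < 0) r = 0;
    if (r > 255) r = 255;
    return static_cast<std::uint8_t>(r);
}

// Single pass with the 3x3 kernel: accumulate in float, round once.
std::uint8_t blurCenter2D(const std::uint8_t p[3][3])
{
    float acc = 0.0f;
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            acc += kTap[r] * kTap[c] * p[r][c];
    return toU8(acc);
}

// Two passes: the row pass is rounded to 8-bit before the column pass.
std::uint8_t blurCenterTwoPass(const std::uint8_t p[3][3])
{
    std::uint8_t rowOut[3];
    for (int r = 0; r < 3; ++r) {
        float acc = 0.0f;
        for (int c = 0; c < 3; ++c)
            acc += kTap[c] * p[r][c];
        rowOut[r] = toU8(acc);  // intermediate rounding happens here
    }
    float acc = 0.0f;
    for (int r = 0; r < 3; ++r)
        acc += kTap[r] * rowOut[r];
    return toU8(acc);
}
```

For the patch `{{1,1,2},{1,0,0},{1,0,0}}`, the single pass gives 8/16 = 0.5, which rounds to 1, while the two-pass version rounds the rows to 1, 0, 0 and then gets 0.25, which rounds to 0. Differences of this kind are at most a few grey levels; larger differences would point to something else, such as the border handling.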
