Problem with non-power-of-2 data sizes

siddy · 02-08-2012

Hi everyone,
Could I just quickly confirm:
With a 3x3 kernel, origin at {1,1}, and 32f data (ippiMalloc) of size nSamplesAxial x nSamplesLateral = {1024,300}, are the following constructs correct?
//----------------------------------------------------------------

IppiSize KernelSize = {3,3};
IppiPoint KernelAnchor = {1,1};
Ipp32f kernel3x3c1[] = {-0.1250F,-0.1250F,-0.1250F, -0.1250F, 1.0000F, -0.1250F, -0.1250F, -0.1250F, -0.1250F};

IppiSize size;
size.width = nSamplesAxial;
size.height = nSamplesLateral;
Ipp32f *pKernel = &kernel3x3c1[0];
int spillHeight = 1;
int spillWidth = 1;
int strideBorder;
int stride32f;

Ipp32f *convertedFrame = ippiMalloc_32f_C1(nSamplesAxial, nSamplesLateral, &stride32f);
Ipp32f *filteredField = ippiMalloc_32f_C1(size.width + 2 * spillWidth,
size.height + 2*spillHeight,
&strideBorder);
IppiSize dstRoi = {nSamplesAxial + 2 * spillWidth, nSamplesLateral+ 2*spillHeight};
IppiSize roi = {nSamplesAxial, nSamplesLateral};
IppStatus st;
// copy const construct ?
st = ippiCopyConstBorder_32f_C1R(convertedFrame,
nSamplesAxial * sizeof(Ipp32f),
roi,
filteredField,
(nSamplesAxial + 2 * spillWidth) * sizeof(Ipp32f),
dstRoi, spillWidth, spillHeight, 0);

// and filter construct ?

st = ippiFilter_32f_C1R(pROI, (nSamplesAxial + 2 * spillWidth)*sizeof(Ipp32f),
fFrame, stride32f, roi, pKernel, KernelSize, KernelAnchor);

//--------------end code---------------

The code in which this appears works for data whose dimensions are powers of 2 (1024 x 1024, 1024 x 512, etc.), but fails when they are not (I get strange dark streaks in the result...). What am I messing up?

Thanks,
Sid

Ying_H_Intel · 02-08-2012

Hi Sid,

It looks like a problem with the step (stepBytes) values.

As the article http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-processing-an-image-from-edge-to-edge/ mentions:

The step is the distance in bytes between the starts of consecutive image rows. It depends on your array's memory layout and data type. In most cases it is equal to width * sizeof(datatype) * channels, but not always, in particular for BMP images and ippiMalloc buffers, whose rows are aligned to 4 bytes and 32 bytes respectively, with padding bytes at the end of each row. So please take care whenever you pass stepBytes or shift a pointer by stepBytes.

For example, with nSamplesAxial = 3, nSamplesAxial * sizeof(Ipp32f) = 3 * 4 = 12, but the real step of convertedFrame is 32 because ippiMalloc_32f_C1 aligns each row to 32 bytes.
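
To make that arithmetic concrete, here is a minimal sketch in plain C that rounds a row's byte width up to an alignment boundary. The helper name aligned_step is hypothetical (it is not an IPP function), and the 32-byte figure is simply the one used in the example above; the actual alignment may differ between IPP versions, so real code should use the step that ippiMalloc_32f_C1 writes back rather than computing one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an IPP function: round the byte width of one
   row up to the next multiple of `alignment`. */
size_t aligned_step(size_t width, size_t elem_size, size_t alignment)
{
    assert(alignment != 0);
    size_t row_bytes = width * elem_size;
    return (row_bytes + alignment - 1) / alignment * alignment;
}
```

With 4-byte floats and the assumed 32-byte alignment, a width of 3 gives 32 and a width of 300 gives 1216 rather than 1200, while a width of 1024 gives 4096, which is exactly 1024 * 4. That would explain why power-of-2 widths hide the mistake.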

So you should replace every hand-computed step with the one returned by ippiMalloc: use stride32f in place of nSamplesAxial * sizeof(Ipp32f), and strideBorder in place of (nSamplesAxial + 2 * spillWidth) * sizeof(Ipp32f).
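
As a rough illustration of why the step matters, the sketch below emulates the situation in plain C without IPP. All of the names (alloc_pitched_32f, row_at, copy_const_border_32f, count_mismatches) are hypothetical stand-ins written for this example, and the 32-byte row padding is an assumption carried over from above. It copies an image into a bordered buffer once with the step returned by the allocator and once with width * sizeof(float), then counts how many pixels come out wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for ippiMalloc_32f_C1: rows are padded to a
   multiple of 32 bytes and the real pitch is written to *step_bytes.
   Release the result with free(). */
float *alloc_pitched_32f(int width, int height, int *step_bytes)
{
    assert(width > 0 && height > 0 && step_bytes != NULL);
    size_t step = ((size_t)width * sizeof(float) + 31) / 32 * 32;
    *step_bytes = (int)step;
    return (float *)calloc((size_t)height, step);
}

/* Address of row y: shift the base pointer by y * step bytes. */
float *row_at(float *base, int step_bytes, int y)
{
    return (float *)((unsigned char *)base + (size_t)y * (size_t)step_bytes);
}

/* Hypothetical stand-in for a constant-border copy: places src inside
   dst with `border` pixels of `value` on every side. Steps are in bytes. */
void copy_const_border_32f(float *src, int src_step, int width, int height,
                           float *dst, int dst_step, int border, float value)
{
    for (int y = 0; y < height + 2 * border; ++y) {
        float *d = row_at(dst, dst_step, y);
        for (int x = 0; x < width + 2 * border; ++x)
            d[x] = value;
    }
    for (int y = 0; y < height; ++y) {
        float *s = row_at(src, src_step, y);
        float *d = row_at(dst, dst_step, y + border) + border;
        for (int x = 0; x < width; ++x)
            d[x] = s[x];
    }
}

/* Fill an image with y * 1000 + x, copy it into a bordered buffer, and
   return how many interior pixels of the copy differ from the source.
   use_returned_step selects the allocator's step (1) or the
   hand-computed width * sizeof(float) (0) for the source image. */
int count_mismatches(int width, int height, int use_returned_step)
{
    int src_step, dst_step;
    float *src = alloc_pitched_32f(width, height, &src_step);
    float *dst = alloc_pitched_32f(width + 2, height + 2, &dst_step);
    assert(src != NULL && dst != NULL);

    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            row_at(src, src_step, y)[x] = (float)(y * 1000 + x);

    int step_for_copy = use_returned_step ? src_step
                                          : width * (int)sizeof(float);
    copy_const_border_32f(src, step_for_copy, width, height,
                          dst, dst_step, 1, 0.0f);

    int bad = 0;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (row_at(dst, dst_step, y + 1)[x + 1] != (float)(y * 1000 + x))
                ++bad;

    free(src);
    free(dst);
    return bad;
}
```

Under those assumptions, count_mismatches(300, 4, 1) returns 0, while count_mismatches(300, 4, 0) reports wrong pixels in every row after the first, because each row is read a little further from where it really starts. With a width of 1024 both variants agree, since 1024 * 4 is already a multiple of 32. In your code the equivalent change is to pass stride32f and strideBorder wherever a step is expected.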

Regards,
Ying