Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

(ippiMalloc_XXX | malloc) & ippiFilterGauss*

m_a_
Beginner
2,183 Views

Hello.

Do I have to allocate memory for arrays only with ippiMalloc (it allocates 32-byte-aligned memory), or not?

Does using other memory allocation functions (malloc, ...) affect performance?

Code:

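IppStatus GaussFilter (const Ipp32f* pSrc, const int nWidth, const int nHeight, const Ipp32f fSigma, Ipp32f** ppDst)
{
  const int nKernelSize = 7;
  IppiSize tWholePic = {nWidth, nHeight};

  // ippiMalloc returns the row step (in bytes) of the destination image in nStepBytes.
  int nStepBytes = 0;
  *ppDst = ippiMalloc_32f_C1 (nWidth, nHeight, &nStepBytes);
  if (*ppDst == NULL)
    return ippStsMemAllocErr;

  // Query the size of the work buffer, then allocate it.
  int nBorderBufferSize = 0;
  IppStatus tStatus = ippiFilterGaussGetBufferSize_32f_C1R (tWholePic, nKernelSize, &nBorderBufferSize);
  if (tStatus != ippStsNoErr)
    return tStatus;
  Ipp8u* pBorderBuffer = ippsMalloc_8u (nBorderBufferSize);
  if (pBorderBuffer == NULL)
    return ippStsMemAllocErr;

  tStatus = ippiFilterGaussBorder_32f_C1R (pSrc, nWidth * sizeof (Ipp32f)
                                         , *ppDst, nWidth * sizeof (Ipp32f)  // Have I use this one or nStepBytes receipt from ippiMalloc?
                                         , tWholePic
                                         , nKernelSize, fSigma, ippBorderRepl, 0.
                                         , pBorderBuffer);
  ippsFree (pBorderBuffer);
  return tStatus;
}

int main (void)
{
  enum { nWidth = 12, nHeight = 15 };
  Ipp32f fSigma = 1.;
  Ipp32f pSrc[nWidth * nHeight];
  Ipp32f* pDst = NULL;

  // Initialize pSrc

  // pDst is passed by address so that the caller receives the allocated image.
  GaussFilter (pSrc, nWidth, nHeight, fSigma, &pDst);

  // Do something with pDst.

  if (pDst != NULL)
    ippiFree (pDst);  // memory from ippiMalloc is released with ippiFree
  return 0;
}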

Regards,

Mark

 

12 Replies
Chao_Y_Intel
Moderator

Hello,

 

Other allocation functions such as malloc also work. ippsMalloc actually calls the system malloc function and then aligns the returned memory to a 32-byte or 64-byte boundary. From the performance point of view, it is better if the address of the input data is 32-byte or 64-byte aligned (on machines that support AVX instructions).

 

Thanks,
Chao

Sergey_K_Intel
Employee

Hi,

In the source code, where you ask "Have I use...", you need to use nStepBytes, because the step is not always equal to nWidth * sizeof (Ipp32f). Otherwise, the rows are written at different offsets than the ones ippiMalloc laid out, and you risk losing the benefits of memory alignment.
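
To illustrate why the step can differ from nWidth * sizeof (Ipp32f), here is a minimal sketch in plain C that does not call IPP. It assumes each row is padded up to a 64-byte boundary; the padding ippiMalloc actually chooses may differ by library version and CPU. The names padded_step_bytes and pixel_at are hypothetical helpers, not IPP functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round a row length in bytes up to the next multiple
   of `alignment` (assumed to be a power of two, e.g. 32 or 64). */
static size_t padded_step_bytes(size_t row_bytes, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (row_bytes + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: address of pixel (x, y) in a padded float image.
   The row offset is computed in bytes from the step, never from the width. */
static float *pixel_at(float *base, size_t step_bytes, size_t x, size_t y)
{
    return (float *)((unsigned char *)base + y * step_bytes) + x;
}
```

With nWidth = 12 a row of Ipp32f data occupies 48 bytes, so under the 64-byte assumption the step is 64 and each row is followed by 16 bytes of padding. Indexing such an image with 48 as the step would address the wrong bytes from the second row on.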

Regards,
Sergey

m_a_
Beginner

Thanks.

 

regards,

Mark.

m_a_
Beginner

Hello, guys.

I have one more question related to the topic above. :)

If I use aligned memory, how can I distinguish the memory that holds real data from the trash memory (allocated to reach the 32/64-byte boundary)? Are there helper functions to tell the necessary memory from the trash memory? Do other functions know about the trash memory? It seems this is handled by the nStepBytes parameter, isn't it?

const int nHeight = 15, nWidth = 8;
IppiSize tsWholePic = {nWidth, nHeight};

Ipp32f pSrc[nHeight * nWidth];
ippiSet_32f_C1R (2., pSrc, nWidth * sizeof (Ipp32f), tsWholePic);

int nDstStepBytes = 0;
// Because of memory alignment, pDst may have trash memory at the end of each row.
Ipp32f* pDst = ippiMalloc_32f_C1 (nWidth, nHeight, &nDstStepBytes);

// Is this the right way to copy? The step bytes are different for pSrc and pDst.
ippiCopy_32f_C1R (pSrc, nWidth * sizeof (Ipp32f), pDst, nDstStepBytes, tsWholePic);

Regards,
Mark
 
 
Sergey_K_Intel
Employee

Mark,

That's correct. The "step_bytes" parameter of any IPP image processing function is the number of bytes to add to the beginning of one image row to get to the beginning of the next image row. So "nWidth*sizeof(Ipp32f)" and "nDstStepBytes" are both correct, as the src and dst steps respectively.
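
As a conceptual sketch of how a copy between buffers with different steps can work, the plain C function below walks both images row by row, advancing each pointer by its own step. It is an illustration only, not IPP's actual implementation of ippiCopy_32f_C1R, and copy_rows_32f is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an IPP function: copy a width x height float
   image row by row. Each buffer advances by its own step in bytes, so a
   dense source and a padded destination can be mixed freely. */
static void copy_rows_32f(const float *src, size_t src_step_bytes,
                          float *dst, size_t dst_step_bytes,
                          size_t width, size_t height)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    size_t row_bytes = width * sizeof(float);

    /* A step smaller than the row data would make rows overlap. */
    assert(src_step_bytes >= row_bytes && dst_step_bytes >= row_bytes);

    for (size_t y = 0; y < height; ++y) {
        /* Only the real pixels are copied; padding bytes are left untouched. */
        memcpy(d + y * dst_step_bytes, s + y * src_step_bytes, row_bytes);
    }
}
```

The padding is never read or written, which is why functions that receive the correct step do not need any other way to tell real data from trash memory.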

Regards,
Sergey

m_a_
Beginner

I see. Payment for speed and comfort. :)

Merry Christmas,

Thanks a lot,

Mark

m_a_
Beginner

hello

I see. Thanks a lot.

Merry Christmas, guys. (Yesterday I couldn't post to the forum; something happened with the site.)

Regards,

Mark

m_a_
Beginner

Ok, thanks a lot.

regards,

Mark

m_a_
Beginner

Hello,

As I understand it, in images allocated with ippiMalloc every row is aligned to a 32/64-byte boundary, so padding bytes are added to the end of the previous row. The situation seems to be as follows: for static arrays and for dynamic arrays allocated with malloc I can use nWidth * sizeof (Type_Of_Array) as the step. For dynamic arrays allocated with ippiMalloc I have to use nStepBytes.

Layout of an array allocated with ippiMalloc:

xxxx1............a............b............c............xxxxxxxxxx2............a............b............c............ixxxxxxxxxx

1, 2 - row start addresses in memory, aligned to a 32/64-byte boundary

xxxxx - trash memory, added for alignment

2 - 1 = nStepBytes,

(end of c) - 1 = nWidth * sizeof(Type_Of_Array)

regards,

Mark

m_a_
Beginner

Hello,

Thanks a lot.

Regards,

Mark

Sergey_K_Intel
Employee

Mark,

There are aligned mallocs in various OSes (_aligned_malloc, posix_memalign and others), but they align only the very first byte of the allocated memory, whereas in image processing the beginning of every image line should be aligned for better performance.
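
The difference can be checked with a small plain C sketch. It assumes C11 aligned_alloc is available (it is not on every toolchain) and a 64-byte alignment target; count_aligned_rows and alloc_rows_64 are hypothetical helpers, not part of IPP or any OS API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: count the rows whose start address is a multiple of
   `alignment`, given the base pointer and the row step in bytes. */
static size_t count_aligned_rows(const void *base, size_t step_bytes,
                                 size_t height, size_t alignment)
{
    size_t aligned = 0;
    assert(alignment != 0);
    for (size_t y = 0; y < height; ++y) {
        uintptr_t row = (uintptr_t)base + y * step_bytes;
        if (row % alignment == 0)
            ++aligned;
    }
    return aligned;
}

/* Hypothetical helper: allocate `height` rows of `step_bytes` each with a
   64-byte aligned base. The total size is rounded up because aligned_alloc
   expects a multiple of the alignment. Release the result with free(). */
static void *alloc_rows_64(size_t step_bytes, size_t height)
{
    size_t total = step_bytes * height;
    total = (total + 63) & ~(size_t)63;
    return aligned_alloc(64, total);
}
```

For a 12 x 15 image of 32-bit floats, a dense step of 48 bytes leaves only rows 0, 4, 8 and 12 on a 64-byte boundary even though the base pointer is aligned, while a padded step of 64 bytes aligns all 15 rows.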

Regards,
Sergey

m_a_
Beginner

I see. I ran a lot of tests and discovered this feature of ippiMalloc. :)

Thanks a lot.

best regards,

Mark.
 
