Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Any problems casting as Ipp32f*?

lkeene
Beginner
1,887 Views

Hello all,

I'm getting strange crashes when deallocating memory, and it occurred to me that the cause may be the way I cast my buffers when passing them to IPP functions. For example, I'm allocating memory as follows:

float* SourceBuffer = (float*)_mm_malloc(BufferSize, 16);

float* TargetBuffer = (float*)_mm_malloc(BufferSize, 16);

etc...

When passing to an IPP function, I cast as follows:

ippResult = ippsAbs_32f((const Ipp32f*)SourceBuffer, (Ipp32f*)TargetBuffer, NumElements);

Is this kosher?

0 Kudos
9 Replies
Vladimir_Dudnik
Employee
1,887 Views

Hello,

Correct use of the _mm_malloc function should not cause issues. Are you sure you are calculating BufferSize correctly? It should be something like BufferSize = NumElements * sizeof(Ipp32f).

Regards,
Vladimir
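
The sizing rule in the reply above can be sketched in plain C. This is only an illustration: `Sample32f` is a hypothetical stand-in for `Ipp32f` (so the snippet builds without the IPP headers), and `buffer_bytes_32f` is a made-up helper name, not an IPP function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Ipp32f so the sketch builds without IPP headers. */
typedef float Sample32f;

/* Bytes needed for num_elements single-precision samples.
   Returns 0 if the count is not positive or the product would overflow size_t. */
static size_t buffer_bytes_32f(int num_elements)
{
    if (num_elements <= 0) {
        return 0;
    }
    if ((size_t)num_elements > SIZE_MAX / sizeof(Sample32f)) {
        return 0;
    }
    return (size_t)num_elements * sizeof(Sample32f);
}
```

Using `sizeof` rather than a literal 4 keeps the byte count tied to the element type, so the same expression stays correct if the element type later changes (for example, to a complex type).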

0 Kudos
lkeene
Beginner
1,887 Views
I'm actually calculating BufferSize as (NumElements * 4). Ipp32f is 32-bits, no?
0 Kudos
Vladimir_Dudnik
Employee
1,887 Views

Yes, Ipp32f is a 4-byte single-precision floating-point data type, so your calculation should be fine. Could you please check that using IPP memory allocation (which provides alignment on a 32-byte boundary) does not cause the issue?

You can try either ippMalloc(NumElements*sizeof(Ipp32f)) or ippsMalloc_32f(NumElements) instead of _mm_malloc call. Note you will need to use ippFree or ippsFree accordingly.

Regards,
Vladimir

0 Kudos
lkeene
Beginner
1,887 Views

Using the IPP allocations has no effect on the error. It still crashes at "_mm_free(Buffer)". Some buffers crash upon deallocation whereas others don't. I've noticed that the problematic ones are involved in functions that require packing/unpacking, and these happen to be the ones where I was a little unsure of the arguments. For example, I've got the following:

int columns = 32;
int rows = 32;
IppiSize sizeOb;
sizeOb.height = rows;
sizeOb.width = columns;
int BytesPerScanline = columns * 4;
int BufferSize = rows * columns * 4;
IppiFFTSpec_R_32f* fftSpec;
ippResult = ippiFFTInitAlloc_R_32f(&fftSpec, Xorder, Yorder, IPP_FFT_DIV_FWD_BY_N, ippAlgHintFast);
int bufferSize;
ippResult = ippiFFTGetBufSize_R_32f(fftSpec, &bufferSize);
Ipp8u* ExternalFFTBuffer = (Ipp8u*)ippMalloc(bufferSize);
float* TempBuffer = (float*)_mm_malloc(BufferSize, 16);
float* SourceBuffer = (float*)_mm_malloc(BufferSize, 16);
float* UnpackBuffer = (float*)_mm_malloc(BufferSize * 2, 16);
.
.
.
// Load source buffer with data...
.
.
.
// Forward FFT:
ippResult = ippiFFTFwd_RToPack_32f_C1R((const Ipp32f*)SourceBuffer, BytesPerScanline, (Ipp32f*)TempBuffer, BytesPerScanline, (const IppiFFTSpec_R_32f*)fftSpec, ExternalFFTBuffer);
// Unpack to complex array:
ippResult = ippiPackToCplxExtend_32f32fc_C1R((const Ipp32f*)TempBuffer, sizeOb, BytesPerScanline, (Ipp32fc*)UnpackBuffer, BytesPerScanline*2);
.
.
.
_mm_free(TempBuffer); // Crash -> "CRT detected that the application wrote to memory after end of heap buffer"
_mm_free(UnpackBuffer); // Crash -> "CRT detected that the application wrote to memory after end of heap buffer"

This looks good to me. Have I misunderstood something in the docs?

0 Kudos
Vladimir_Dudnik
Employee
1,887 Views

Did you get my point about using the appropriate IPP memory deallocation function when you use IPP memory allocation?

Vladimir

0 Kudos
lkeene
Beginner
1,887 Views
Yes, sorry... I meant to say it is still crashing even when using "ippsMalloc_32f / ippsFree()".
0 Kudos
Vladimir_Dudnik
Employee
1,887 Views

Well, did you pay attention to the different sizes of the source and destination images for the ippiPackToCplxExtend function you use? According to the IPP documentation:

PackToCplxExtend

Converts an image in packed format to a complex data image.

Syntax

IppStatus ippiPackToCplxExtend_32s32sc_C1R(const Ipp32s* pSrc, IppiSize srcSize, int srcStep, Ipp32sc* pDst, int dstStep);

IppStatus ippiPackToCplxExtend_32f32fc_C1R(const Ipp32f* pSrc, IppiSize srcSize, int srcStep, Ipp32fc* pDst, int dstStep);

Parameters

pSrc

Pointer to the source image ROI.

srcSize

Size in pixels of the source image ROI.

srcStep

Distance in bytes between starts of consecutive lines in the source buffer.

pDst

Pointer to the destination image buffer.

dstStep

Distance in bytes between starts of consecutive lines in the destination image buffer.
Description

The function ippiPackToCplxExtend is declared in the ippi.h file. It operates with ROI (see Regions of Interest in Intel IPP).

This function converts the source image pSrc in RCPack2D format to complex data format and stores the results in pDst, which is a matrix with complete set of the Fourier coefficients. Note that if the pSrc in RCPack2D format is a real array of dimensions (NxM), then the pDst is a real array of dimensions (2xNxM). This should be taken into account when allocating memory for the function operation.

Vladimir
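
The documentation note about the destination being twice the size of the source can be checked with a little arithmetic. The sketch below is plain C, not IPP code: `Complex32f` is a hypothetical stand-in for `Ipp32fc`, `PackLayout` and `pack_to_cplx_layout` are made-up names, and rows are assumed to be tightly packed (no padding), as in the code posted earlier in the thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for Ipp32fc: one complex element = two floats. */
typedef struct {
    float re;
    float im;
} Complex32f;

/* Step and size values for a real source image and its complex destination. */
typedef struct {
    int    src_step;   /* bytes per source row (real floats)          */
    int    dst_step;   /* bytes per destination row (complex floats)  */
    size_t src_bytes;  /* total bytes in the source buffer            */
    size_t dst_bytes;  /* total bytes in the destination buffer       */
} PackLayout;

/* Assumes tightly packed rows; width and height are in pixels. */
static PackLayout pack_to_cplx_layout(int width, int height)
{
    PackLayout layout;
    layout.src_step  = width * (int)sizeof(float);
    layout.dst_step  = width * (int)sizeof(Complex32f);
    layout.src_bytes = (size_t)height * (size_t)layout.src_step;
    layout.dst_bytes = (size_t)height * (size_t)layout.dst_step;
    return layout;
}
```

For the 32 x 32 case in the thread, this gives a source step of 128 bytes, a destination step of 256 bytes, and buffers of 4096 and 8192 bytes respectively.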

0 Kudos
lkeene
Beginner
1,887 Views

Yes. Please note my posted code has:

TempBuffer = (float*)_mm_malloc(BufferSize, 16); // (M x N x NumBytesInFloat)

UnpackBuffer = (float*)_mm_malloc(BufferSize * 2, 16); // (M x N x 2 x NumBytesInFloat)

Anyway, I may have misunderstood the documentation regarding one crucial aspect. Consider the following prototype:

IppStatus ippsConj_32fc_I(Ipp32fc* pSrcDst, int len);

where the docs state "len = Number of elements in the vector." This was a little ambiguous. I interpreted it as the sum of the real and imaginary values in the vector, i.e. I have

int rows = 32;

int columns = 32;

int NumberOfValues = rows * columns;

int NumberOfComplexElements = NumberOfValues * 2;

ippResult = ippsConj_32fc_I((Ipp32fc*)UnpackBuffer, NumberOfComplexElements);

Could I have misunderstood something here?

0 Kudos
Vladimir_Dudnik
Employee
1,887 Views

Hello,

IPP functions working with complex data types consider elements as complex elements, so you should specify the number of complex elements for the ippsConj_32fc_I function as the length parameter (not the sum of real and imaginary parts).

Regards,
Vladimir
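
To make the element-count point concrete, here is a minimal plain-C sketch. It is not the IPP implementation: `Complex32f` is a hypothetical stand-in for `Ipp32fc`, and `conj_inplace` and `bytes_touched` are illustrative helpers that only mimic the documented convention that `len` counts complex elements.

```c
#include <stddef.h>

/* Hypothetical stand-in for Ipp32fc: one complex element = two floats. */
typedef struct {
    float re;
    float im;
} Complex32f;

/* Illustrative in-place conjugate. len is the number of COMPLEX elements,
   mirroring the convention described for ippsConj_32fc_I. */
static void conj_inplace(Complex32f *data, int len)
{
    for (int i = 0; i < len; ++i) {
        data[i].im = -data[i].im;
    }
}

/* Bytes that a call with the given len reads and writes. */
static size_t bytes_touched(int len)
{
    return (size_t)len * sizeof(Complex32f);
}
```

For a 32 x 32 image, the complex buffer holds rows * columns = 1024 elements, or 8192 bytes. Passing rows * columns * 2 = 2048 as the length would touch 16384 bytes, twice the allocation, which is consistent with the "wrote to memory after end of heap buffer" error reported when freeing.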

0 Kudos