Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

ippsFIRSRGetSize results in extremely large bufSize

Holm-Rasmussen__Bo
744 Views

Hi,

I am using IPPS 2018 Update 3 and 2019 Update 1, and both give the same result for the following call:

ippsFIRSRGetSize(TAPS_LEN, ipp32f, &specSize, &bufSize);

No matter what value TAPS_LEN has, bufSize is greater than 32K. That is an extremely large buffer for, e.g., a 4-tap FIR filter. Both specSize and bufSize are of type int, as the documentation says. The general-purpose IIR filter of the same order takes up much less memory.

Is this an error in IPPS, or what could the reason be?
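For comparison, a plain direct-form FIR needs only a delay line of tapsLen - 1 samples as working state, so a 4-tap filter can run with 12 bytes of state. The sketch below is a hypothetical helper (`SimpleFir` and `simple_fir_process` are illustrative names, not IPP API). It is not vectorized, so for long filters it will be much slower than ippsFIRSR.

```c
#include <assert.h>

/* Hypothetical helper, not part of IPP: direct-form FIR whose only working
 * state is a caller-owned delay line of (tapsLen - 1) floats. */
typedef struct {
    const float *taps; /* coefficients h[0..tapsLen-1] */
    int tapsLen;
    float *delay;      /* past inputs, newest first: delay[0] = x[n-1] */
} SimpleFir;

/* Computes y[n] = sum_k h[k] * x[n-k] for n samples.
 * Past inputs live in the delay line, so dst may be the same array as src. */
static void simple_fir_process(SimpleFir *f, const float *src, float *dst, int n)
{
    assert(f->tapsLen >= 1);
    for (int i = 0; i < n; ++i) {
        float x = src[i];                 /* read before dst[i] is written */
        float acc = f->taps[0] * x;
        for (int k = 1; k < f->tapsLen; ++k)
            acc += f->taps[k] * f->delay[k - 1];
        /* shift the delay line by one sample and insert the current input */
        for (int k = f->tapsLen - 2; k > 0; --k)
            f->delay[k] = f->delay[k - 1];
        if (f->tapsLen > 1)
            f->delay[0] = x;
        dst[i] = acc;
    }
}
```

Because processing is sample by sample, the working memory does not depend on the block length, only on the tap count.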

6 Replies
Andrey_B_Intel
Employee

Hi Bo,

Many IPP customers need the so-called in-place mode of functions (pDst == pSrc), where the source and destination vectors are the same for one reason or another. To handle this case correctly and store temporary data, FIRSR needs about 32K (the L1 cache size) of reserved buffer. The ippsFIRSRGetSize API has no information about whether the call will be out-of-place or in-place, so it requests the maximum buffer size.
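To illustrate why in-place mode needs a scratch buffer: a block-based FIR that, for speed, reads past inputs directly from the input array would, with pDst == pSrc, overwrite x[n] with y[n] before x[n] is used for y[n+1] ... y[n+tapsLen-1]. The hypothetical sketch below (not IPP's actual implementation; `block_fir` and the chunk length `CHUNK` are illustrative assumptions) copies each input chunk into a scratch buffer first, which makes aliasing safe. Note that the scratch size is dominated by the fixed chunk length rather than by the tap count.

```c
#include <string.h>

#define CHUNK 256 /* hypothetical chunk length in samples */

/* Block FIR that tolerates dst == src.
 * hist:    last (tapsLen - 1) inputs from the previous call, oldest first.
 * scratch: caller-provided workspace of at least (tapsLen - 1 + CHUNK) floats. */
static void block_fir(const float *taps, int tapsLen, float *hist,
                      const float *src, float *dst, int n, float *scratch)
{
    int m = tapsLen - 1;
    for (int start = 0; start < n; start += CHUNK) {
        int len = (n - start < CHUNK) ? n - start : CHUNK;
        /* scratch = [history | current chunk]; every read below comes from
         * scratch, so writing dst (possibly == src) cannot corrupt inputs */
        memcpy(scratch, hist, (size_t)m * sizeof(float));
        memcpy(scratch + m, src + start, (size_t)len * sizeof(float));
        for (int i = 0; i < len; ++i) {
            float acc = 0.0f;
            for (int k = 0; k < tapsLen; ++k)
                acc += taps[k] * scratch[m + i - k];
            dst[start + i] = acc;
        }
        /* the last m samples of scratch become the new history */
        memcpy(hist, scratch + len, (size_t)m * sizeof(float));
    }
}
```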

Thanks.

Blum__Troels
Beginner

Hi Andrey,

Is there a workaround for this? We are not using in-place processing, but we are working with a hard limit of < 32K per allocation. The reason is that we are working in an MS APO context, so we MUST use AERT_Allocate to allocate memory, and it is limited to 32K. In addition, a few bytes are wasted due to IPP's memory alignment requirements.

https://docs.microsoft.com/en-us/windows/desktop/api/baseaudioprocessingobject/nf-baseaudioprocessingobject-aert_allocate
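On the alignment overhead: if the allocator does not guarantee the required alignment, carving an aligned region of `need` bytes out of one allocation requires requesting up to alignment - 1 extra bytes. The sketch below assumes a 64-byte alignment requirement and a per-allocation cap of 32768 bytes (both are assumptions for illustration; check the IPP and AERT_Allocate documentation for the actual values) and uses hypothetical helper names. With these numbers at most 32768 - 63 = 32705 aligned bytes fit in a single allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALLOC_LIMIT 32768u /* assumed per-allocation cap (32K) */
#define ALIGNMENT   64u    /* assumed alignment requirement */

/* Bytes to request so that an ALIGNMENT-aligned block of `need` bytes is
 * guaranteed to fit, whatever address the allocator returns. */
static size_t padded_request(size_t need)
{
    return need + (ALIGNMENT - 1);
}

/* Nonzero if an aligned block of `need` bytes fits in one capped allocation. */
static int fits_under_limit(size_t need)
{
    return padded_request(need) <= ALLOC_LIMIT;
}

/* Rounds a raw allocator pointer up to the next ALIGNMENT boundary. */
static void *align_up(void *raw)
{
    assert((ALIGNMENT & (ALIGNMENT - 1)) == 0); /* must be a power of two */
    uintptr_t p = (uintptr_t)raw;
    return (void *)((p + (ALIGNMENT - 1)) & ~(uintptr_t)(ALIGNMENT - 1));
}
```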

Best regards

Troels Blum

Holm-Rasmussen__Bo

Hi Andrey, thank you for your answer.

Just to let you know, Troels Blum is my colleague, and I share his question.

//Bo

Igor_A_Intel
Employee

Hi Bo and Troels,

How many taps do you use?

Internally, in addition to supporting "in-place" mode, ippsFIR has at least 3 different algorithm implementations, chosen by filter order (the exact criterion also depends on the CPU architecture): for small filter orders (below ~32) it uses so-called "vertical" unrolling, for ~32 to ~64 so-called "horizontal" unrolling, and for higher filter orders an FFT-based algorithm (convolution theorem). I guess it's clear that the last one also requires more memory for internal buffers than the first two.
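The size-based dispatch described above can be sketched as follows. The thresholds are the approximate values from the post (in practice they depend on the CPU), and `pick_fir_strategy` and the overlap-save helpers are hypothetical; they do not describe IPP's internal code. The FFT path assumes textbook overlap-save with an FFT length of at least 2 * tapsLen (a common but arbitrary choice), which shows why its buffers grow with the filter order: each block needs N-point transforms, yet only N - tapsLen + 1 outputs per block are valid.

```c
#include <assert.h>

typedef enum {
    FIR_VERTICAL_UNROLL,   /* small orders, roughly < 32 taps */
    FIR_HORIZONTAL_UNROLL, /* roughly 32..64 taps */
    FIR_FFT                /* larger orders: FFT-based convolution */
} FirStrategy;

/* Hypothetical dispatch using the approximate thresholds from the post. */
static FirStrategy pick_fir_strategy(int tapsLen)
{
    if (tapsLen < 32)
        return FIR_VERTICAL_UNROLL;
    if (tapsLen <= 64)
        return FIR_HORIZONTAL_UNROLL;
    return FIR_FFT;
}

/* Overlap-save: smallest power-of-two FFT length >= 2 * tapsLen (assumed rule). */
static int overlap_save_fft_len(int tapsLen)
{
    assert(tapsLen > 0);
    int n = 1;
    while (n < 2 * tapsLen)
        n <<= 1;
    return n;
}

/* Valid output samples produced per FFT block of length fftLen. */
static int overlap_save_outputs_per_block(int fftLen, int tapsLen)
{
    return fftLen - tapsLen + 1;
}
```

For a 100-tap filter this rule gives an FFT length of 256 and 157 valid outputs per block.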

regards, Igor

Gennady_F_Intel
Moderator

Hello Bo and Troels,

If you are really interested in having this feature implemented, could you submit a feature request to the Intel Online Service Center?

Holm-Rasmussen__Bo
744 Views

Hi Igor and Gennady F.,

Thank you for your answers. We will consider our options and get back to you if it is relevant.

//Bo
