ippsFIRSRGetSize results in extremely large bufSize

Holm-Rasmussen__Bo · ‎01-18-2019

Hi,

I am using IPPS 2018 Update 3 and 2019 Update 1; both give the same result for the following call:

ippsFIRSRGetSize(TAPS_LEN, ipp32f, &specSize, &bufSize);

No matter what value TAPS_LEN has, bufSize is >32k. This is an extremely large buffer for, e.g., a 4-tap FIR filter. Both specSize and bufSize are of type int, as the documentation says. The general-purpose IIR filter of the same order takes up much less memory.

Is this an error in IPPS? Or what could the reason be?

Andrey_B_Intel · ‎01-19-2019

Hi Bo.

Many IPP customers need the so-called in-place mode of functions (pDst=pSrc), where the source and destination vectors are the same. To handle this case correctly and store temporary data, FIRSR needs about 32K (the L1 size) in the reserved buffer. The ippsFIRSRGetSize API has no information about whether not-in-place or in-place mode will be used, so it requests the maximum buffer size.
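To illustrate why in-place filtering needs temporary storage at all, here is a minimal plain-C sketch of a direct-form FIR filter that stays correct when dst == src. It is not IPP code and says nothing about how FIRSR is implemented internally; the function name fir_direct and the MAX_TAPS bound are hypothetical and exist only for this illustration. The point is that each input sample has to be saved before the output overwrites it, because later outputs still depend on it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TAPS 64  /* hypothetical upper bound, chosen only for this sketch */

/* Direct-form FIR: y[n] = sum_k taps[k] * x[n-k], with x[n] = 0 for n < 0.
 * Works when dst == src because the previous tapsLen-1 input samples are
 * copied into a small history array before they can be overwritten. */
void fir_direct(const float *src, float *dst, size_t len,
                const float *taps, size_t tapsLen)
{
    float hist[MAX_TAPS] = {0};  /* hist[0] = x[n-1], hist[1] = x[n-2], ... */
    assert(tapsLen >= 1 && tapsLen <= MAX_TAPS);

    for (size_t n = 0; n < len; ++n) {
        float x = src[n];        /* read the input before dst[n] is written */
        float acc = taps[0] * x;
        for (size_t k = 1; k < tapsLen; ++k)
            acc += taps[k] * hist[k - 1];

        /* Shift the history by one sample and store the current input. */
        for (size_t k = tapsLen - 1; k > 1; --k)
            hist[k - 1] = hist[k - 2];
        if (tapsLen > 1)
            hist[0] = x;

        dst[n] = acc;            /* may overwrite src[n] when dst == src */
    }
}
```

In this sketch the temporary storage is only tapsLen-1 samples. A library that processes data in larger blocks may reasonably reserve a larger scratch area, which is consistent with the explanation above but not derived from it.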

Thanks.

Blum__Troels · ‎01-23-2019

Hi Andrey,

Is there a workaround for this? We are not using in-place processing, but we are working with a hard limit of < 32K for mallocs. The reason is that we are working in an MS APO context, so we MUST use AERT_Allocate to allocate memory, which is limited to 32K. In addition, a few bytes are wasted due to IPP's memory alignment requirements.
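The alignment overhead mentioned above can be made concrete with a small plain-C sketch. It assumes a 64-byte alignment requirement and takes the 32K limit from this thread as exactly 32768 bytes; both values, and the helper names align_up and usable_after_alignment, are assumptions for illustration and are not taken from the IPP or AERT_Allocate documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_ALIGNMENT 64u     /* assumption: required buffer alignment */
#define ASSUMED_ALLOC_LIMIT 32768u /* assumption: the 32K limit from the thread */

/* Round ptr up to the next multiple of align (align must be a power of two). */
void *align_up(void *ptr, size_t align)
{
    uintptr_t p = (uintptr_t)ptr;
    assert(align != 0 && (align & (align - 1)) == 0);
    return (void *)((p + (align - 1)) & ~(uintptr_t)(align - 1));
}

/* Worst-case number of usable bytes left in a block of `limit` bytes
 * after aligning its start, when the start alignment is unknown. */
size_t usable_after_alignment(size_t limit, size_t align)
{
    assert(limit >= align);
    return limit - (align - 1);
}
```

Under these assumptions, usable_after_alignment(ASSUMED_ALLOC_LIMIT, ASSUMED_ALIGNMENT) gives 32705 bytes, so a bufSize above 32K cannot fit in a single such allocation regardless of how the alignment is handled.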

https://docs.microsoft.com/en-us/windows/desktop/api/baseaudioprocessingobject/nf-baseaudioprocessingobject-aert_allocate

Best regards

Troels Blum

Holm-Rasmussen__Bo · ‎01-28-2019

Hi Andrey, thank you for your answer.

Just to let you know, Troels Blum is my colleague, and I second his question.

//Bo

Igor_A_Intel · ‎01-28-2019

Hi Bo and Troels,

How many taps do you use?

Internally, in addition to supporting "inplace" mode, ippsFIR has at least 3 different algorithm implementations: for rather small filter orders, roughly <32 (the criterion also depends on the CPU architecture), it uses so-called "vertical" unrolling; for roughly 32 to 64, so-called "horizontal" unrolling; and for higher filter orders, an FFT-based (convolution theorem) algorithm. I guess it's clear that the last one also requires more memory for internal buffers than the first two.
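The selection described above can be pictured as a simple dispatch on the filter length. The sketch below is only an illustration of that idea: the thresholds are the approximate ones quoted in this post, the real criterion also depends on the CPU architecture, and the names fir_algo and choose_fir_algo are hypothetical, not part of IPP.

```c
#include <assert.h>

/* Hypothetical labels for the three implementation families named above. */
typedef enum {
    FIR_ALGO_VERTICAL,    /* "vertical" unrolling, small filter orders */
    FIR_ALGO_HORIZONTAL,  /* "horizontal" unrolling, medium filter orders */
    FIR_ALGO_FFT          /* FFT-based (convolution theorem), large orders */
} fir_algo;

/* Pick an algorithm family from the tap count, using the rough
 * thresholds from the post (approximate, not IPP's actual criterion). */
fir_algo choose_fir_algo(int tapsLen)
{
    assert(tapsLen > 0);
    if (tapsLen < 32)
        return FIR_ALGO_VERTICAL;
    if (tapsLen <= 64)
        return FIR_ALGO_HORIZONTAL;
    return FIR_ALGO_FFT;
}
```

With these assumed thresholds, a 4-tap filter like the one in the original question would fall into the "vertical" unrolling family.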

regards, Igor

Gennady_F_Intel · ‎01-30-2019

Hello Bo and Troels,

If you are really interested in having this feature implemented, could you submit a feature request to the Intel Online Service Center?

Holm-Rasmussen__Bo · ‎01-30-2019

Hi Igor and Gennady F.,

Thank you for your answers. We will consider our options and get back to you if it is relevant.

//Bo