I'm currently checking IPP 8, and it reports that many functions related to memory allocation are deprecated. I understand the reasoning, so I'm looking at the new way to get the memory allocated. Querying the state size for a single biquad stage (numBq = 1) gives:
ippsIIRGetStateSize_BiQuad_32f(1, &sz); => 4576!!!
ippsIIRGetStateSize_BiQuad_64f(1, &sz); => 8560!!!
ippsIIRGetStateSize64f_BiQuad_32f(1, &sz); => 16752!!!
I just don't understand it. I want a single biquad IIR filter, whose state is technically 2 numbers, hence 8 bytes for 32-bit floats. Even if it pads that out to fill the XMM registers and stores lots of values precomputed from the 6 coefficients a biquad technically has, we might get to, say, 64 bytes, and I'd even accept 256 on the assumption that it really stores a lot. But 4576 bytes??? That honestly looks like a bug.
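For reference, here is a minimal plain-C sketch (not IPP code, and not how IPP implements it) of the filter the question describes: one biquad in transposed Direct Form II. The names biquad_coeffs, biquad_state, biquad_make and biquad_process are hypothetical. Dividing through by a0 leaves 5 coefficients to store, and the running state is just the two delay elements, i.e. 8 bytes for 32-bit floats.

```c
#include <assert.h>

/* Coefficients normalized by a0, so only 5 of the 6 values are kept. */
typedef struct {
    float b0, b1, b2, a1, a2;
} biquad_coeffs;

/* The entire per-filter state: two delay elements (8 bytes with 32-bit floats). */
typedef struct {
    float z1, z2;
} biquad_state;

/* Builds normalized coefficients from b0..b2 and a0..a2 (a0 must be nonzero). */
static biquad_coeffs biquad_make(float b0, float b1, float b2,
                                 float a0, float a1, float a2)
{
    assert(a0 != 0.0f);
    biquad_coeffs c = { b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0 };
    return c;
}

/* Transposed Direct Form II:
   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2].
   Each sample is read before it is written, so src == dst (in place) is fine. */
static void biquad_process(const biquad_coeffs *c, biquad_state *s,
                           const float *src, float *dst, int len)
{
    for (int n = 0; n < len; ++n) {
        float x = src[n];
        float y = c->b0 * x + s->z1;
        s->z1 = c->b1 * x - c->a1 * y + s->z2;
        s->z2 = c->b2 * x - c->a2 * y;
        dst[n] = y;
    }
}
```

With b = {1, 0, 0} and a = {1, -0.5, 0}, an impulse produces 1, 0.5, 0.25, 0.125. This form is compact but processes one sample at a time; a vectorized library trades memory for speed.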
What am I missing?
ADDITIONAL QUESTION: Does this memory need to be allocated dynamically in a special buffer using ippMalloc or something similar? Or can we just allocate it with some other function? Is it possible to pre-estimate its maximum size, so we can avoid the allocations entirely by placing it in our own structures?
I guess I've already answered this in the "How to use IIR functions for variable filters with multiple states?" topic.
As for pre-estimation: the state size is the sum of four parts: the unrolled B taps; the recalculated and unrolled A taps; the delay line; and a work buffer, which is used in the in-place case, for separate processing by B and A in the out-of-place case, and for high-precision intermediate calculations with mixed data types (64f taps for a 32f vector). The size also varies with the architecture: for 32f taps the unrolling factor is 4 for SSE, 8 for AVX and 16 for AVX-512. Therefore there is no way to pre-estimate it precisely. The IPP library has to support all known Intel architectures with the best possible performance, so there will always be differences between the implementations for specific CPUs, and between implementations in different IPP versions. You should not rely on any pre-estimate; use the documented sequence instead: GetSize, allocate a memory buffer of the required size, then Init, then the processing function. Only this sequence guarantees correct execution and correct results.
Thanks again for the reply, Igor. I think we should move it all here:
This post more or less explains the whole thing: IPP assumes filters with many taps, while the most common biquads are just second order, and any filter can be built from cascaded biquad stages. Anyway, I explained it all in the other topic.
Just one more thing: when I checked the assembly in the debugger, I found that on a Haswell i7 it was indeed using AVX2, but on a Sandy Bridge i7, which supports AVX but not AVX2, the highest instruction set used was SSE2 (at least from a brief look). Again, don't get me wrong, I'm not complaining; I'm just checking whether there is something I should have enabled. (I did call ippInit, of course.)