Intel® Integrated Performance Primitives

ippsConv_32f slow for >64 buffer

ippsConv_32f seems to get magnitudes slower once one of the source buffers is large enough & the other buffer has reached a length of 64. Is there something special happening in this case (it sounds like an optimization that goes wrong, as the increase in CPU usage is really abrupt & quite bad), and is there a way to bypass this?

I know that at some point an FFT-based convolution would be faster, but I don't need to go so much above 64 & I don't wanna enter the complexity of an FFT-based one.
Using IPP 6.x btw.

Hi Gol,

Are you running the test on a multi-core machine? Could you tell us the CPU usage and rough performance data for the two cases?

It is true that ippsConv_32f changes the algorithm from direct convolution to FFT-based convolution at some point. For example:

if ((lenDst < 512) || (MIN(Src1Len, Src2Len) < 64))
    direct code
else
    FFT code
The switch is made because the FFT-based convolution is faster when the data length is large enough. But as you see, the "critical point" is a value found by empirical testing. Around the critical point, the performance advantage may fluctuate.
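To make the switching condition concrete, here is a minimal sketch in plain C. It assumes the condition quoted above is accurate for your IPP version (the real thresholds may differ) and uses the fact that the destination of a linear convolution has length Src1Len + Src2Len - 1. The names uses_direct_path and direct_conv_32f are hypothetical helpers written for illustration, not IPP functions; direct_conv_32f is a plain, unoptimized time-domain convolution and not IPP's internal code.

```c
#include <stddef.h>

/* Hypothetical helper: mirrors the condition quoted in this thread.
   Returns 1 if the direct path would be chosen, 0 for the FFT path.
   The thresholds (512 and 64) are taken from the post above and are
   an assumption about IPP internals. */
int uses_direct_path(int src1Len, int src2Len)
{
    int lenDst = src1Len + src2Len - 1;               /* output length */
    int minLen = (src1Len < src2Len) ? src1Len : src2Len;
    return (lenDst < 512) || (minLen < 64);
}

/* Hypothetical helper: plain direct (time-domain) linear convolution,
   y = x (*) h. The caller must provide room for xLen + hLen - 1 floats
   in y. */
void direct_conv_32f(const float *x, int xLen,
                     const float *h, int hLen,
                     float *y)
{
    int n = xLen + hLen - 1;
    int i, k;

    for (i = 0; i < n; ++i)
        y[i] = 0.0f;

    /* Each input sample adds a scaled copy of h into the output. */
    for (i = 0; i < xLen; ++i)
        for (k = 0; k < hLen; ++k)
            y[i + k] += x[i] * h[k];
}
```

With these thresholds, a long buffer convolved with a 63-tap buffer stays on the direct path, while the same buffer with 64 taps would cross over to the FFT path, which matches the abrupt change at 64 described in the question.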

Do you always compute the convolution with one large source buffer and the second one kept at 64, and so want to use the direct algorithm?
Just for your reference, direct convolution is also supported by MKL (IPP's sister library).
For example, for y = x (*) h, the MKL call looks like
vslConvSetStart(task, &iy0);
status = vslsConvExec1D(task, h,inch, x,incx, y,incy);

For more information, please see <<>>

Ying H

I test on an (Intel) quad, but my code (a synthesizer) is already multithreaded, voices being processed in parallel.
However, I can also measure the huge jump in CPU usage even when there's 1 single voice processed (meaning, no multithreading), by a factor of almost 5 (& it's pretty stable, not random spikes at all).

But: my bad, I had only tested within the compiler (Delphi) so far. Testing outside debugger, it's actually pretty good!
I know that Delphi's debugger does something lengthy when threads are created (already noticed this in the past), so I'm assuming that threads are being created when the jump occurs.
So, false alarm, sry.