Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

ippsConv_32f slow for >64 buffer

gol
Beginner
ippsConv_32f seems to get magnitudes slower once one of the source buffers is large enough and the other buffer has reached a length of 64. Is there something special happening in this case (it sounds like an optimization gone wrong, as the increase in CPU usage is really abrupt and quite bad), and is there a way to bypass it?

I know that at some point an FFT-based convolution would be faster, but I don't need to go so much above 64 & I don't wanna enter the complexity of an FFT-based one.
Using IPP 6.x btw.

Thanks
Ying_H_Intel
Employee
Hi Gol,

Are you running the test on a multi-core machine? Could you tell us the CPU usage and rough performance data for the two cases?

It is true that ippsConv_32f changes the algorithm from direct convolution to FFT-based convolution at some point. For example:

    if ((lenDst < 512) || (MIN(Src1Len, Src2Len) < 64))
        direct code
    else
        FFT code
This is because FFT-based convolution is faster when the data length is large enough. But as you can see, the "critical point" is a value determined by empirical testing. Around the critical point, the performance advantage may fluctuate.
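As a small illustration of the switch described above, the condition can be written as a predicate. This is only a sketch: conv_uses_direct_path is a hypothetical helper, not an IPP function; the thresholds 512 and 64 are the ones quoted in this post and may differ between IPP versions; and lenDst is assumed to be the full convolution length src1Len + src2Len - 1.

```c
#include <stdbool.h>

/* Thresholds as quoted in the post; they are empirical values and are
   an assumption for any particular IPP version. */
enum { DIRECT_MAX_DST_LEN = 512, DIRECT_MAX_SHORT_LEN = 64 };

/* Hypothetical helper (not part of IPP): returns true when the quoted
   condition would select the direct code path, false for the FFT path. */
static bool conv_uses_direct_path(int src1Len, int src2Len)
{
    /* Assumed: lenDst is the length of the full linear convolution. */
    int lenDst  = src1Len + src2Len - 1;
    int shorter = (src1Len < src2Len) ? src1Len : src2Len;

    return (lenDst < DIRECT_MAX_DST_LEN) || (shorter < DIRECT_MAX_SHORT_LEN);
}
```

Under these assumptions, a long buffer convolved with a 63-tap buffer stays on the direct path, while the same buffer with 64 taps crosses over to the FFT path, which would match the abrupt change reported at length 64.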

Do you always compute the convolution with one large source buffer while the second one stays at 64, so that you want to use the direct algorithm?
Just for your reference, direct convolution is also supported by MKL (IPP's sister library).
For example, for y = x (*) h, the MKL calls look like:
vslsConvNewTask1D(&task,VSL_CONV_MODE_DIRECT,nh,nx,ny);
vslConvSetStart(task, &iy0);
status = vslsConvExec1D(task, h,inch, x,incx, y,incy);

For more information, please see <<>>
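If pulling in another library is not desirable, the direct algorithm itself is short enough to write by hand. The following is a minimal, unoptimized sketch in plain C; conv_direct_32f is a hypothetical name, not an IPP or MKL function, and it will not match the speed of the vectorized library code.

```c
#include <stddef.h>

/* Hypothetical helper (not part of IPP or MKL): full linear convolution
   y = x (*) h computed directly in O(nx * nh) operations.
   The caller must provide room for nx + nh - 1 floats in y. */
static void conv_direct_32f(const float *x, size_t nx,
                            const float *h, size_t nh,
                            float *y)
{
    if (nx == 0 || nh == 0)
        return;                      /* nothing to compute */

    size_t ny = nx + nh - 1;
    for (size_t n = 0; n < ny; ++n)
        y[n] = 0.0f;                 /* clear the accumulator */

    /* Each input sample x[i] adds a scaled copy of h starting at y[i]. */
    for (size_t i = 0; i < nx; ++i)
        for (size_t k = 0; k < nh; ++k)
            y[i + k] += x[i] * h[k];
}
```

For example, convolving x = {1, 2, 3} with h = {1, 1} fills y with {1, 3, 5, 3}.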

Regards,
Ying H

gol
Beginner
I test on an Intel quad-core, but my code (a synthesizer) is already multithreaded, with voices being processed in parallel.
However, I can also measure the huge jump in CPU usage even when only a single voice is processed (meaning no multithreading), by a factor of almost 5 (and it's pretty stable, not random spikes at all).

But: my bad, I had only tested from within Delphi, under its debugger, so far. Testing outside the debugger, it's actually pretty good!
I know that Delphi's debugger does something lengthy when threads are created (I already noticed this in the past), so I'm assuming that threads are being created when the jump occurs.
So, false alarm, sry.


