Intel® Integrated Performance Primitives

Efficient multiple FFTs



I want to use the FFT to filter a large 2D image (~1000x1000). I want to filter the image several times, each time with a different kernel (~21x21).

To benefit from using the FFT, I can calculate the image's FFT once, then for each kernel calculate only the kernel's FFT, multiply the two, and apply the inverse FFT.

The problem is that the image is much larger than the kernel, so zero padding the kernel to the image size looks very wasteful. Is there an FFT implementation that takes advantage of the zeros and computes the larger FFT without the explicit zero padding?

Thank you in advance,


3 Replies

About zero padding in the time domain: if the source width is 1000 and the kernel width is 21, then the padded size required to avoid wrap-around from circular convolution is 1000+21-1 = 1020.
If you just zero pad to 1024x1024, then you can benefit from the FFT (which is fastest at powers of two, and requires powers of two).
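The size arithmetic in this reply can be checked with a few lines of plain Python. This is only a sketch: the helper names `linear_conv_size` and `next_power_of_two` are made up for illustration and are not IPP functions.

```python
def linear_conv_size(image_len, kernel_len):
    """Smallest transform length that avoids circular wrap-around."""
    return image_len + kernel_len - 1


def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


for image_len in (1000, 1024):
    needed = linear_conv_size(image_len, 21)
    print(image_len, needed, next_power_of_two(needed))
# prints:
# 1000 1020 1024
# 1024 1044 2048
```

For a 1000-wide image the power-of-two size costs almost nothing extra (1020 vs. 1024), while for a 1024-wide image it nearly doubles the length per dimension (1044 vs. 2048).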

Using the DFT instead: it is also fastest at powers of two, but it does not require powers of two.

However, if your ~1000 is actually 1024, then 1024+21-1 = 1044, which is not efficient for the FFT because it would have to be padded up to 2048.
I consider IPP's DFT to be very fast, so you could just zero pad the image to 1044, DFT it once, and then loop over the kernels:
- Make the kernel (in the time domain, then FFT/DFT it at size 1044)
- Multiply
- Inverse DFT
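The loop above can be sketched in one dimension as follows. This is a minimal illustration in plain Python, not IPP code: the small radix-2 `fft` stands in for whatever FFT/DFT routine you actually use, and all names here (`fft`, `pad`, `filter_many`, `direct_conv`) are hypothetical. The 2D case applies the same steps along rows and then columns.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 transform; len(x) must be a power of two. Unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def pad(x, n):
    """Zero pad x to length n."""
    return list(x) + [0.0] * (n - len(x))


def filter_many(signal, kernels, n):
    """Transform `signal` once, then apply every kernel in the frequency domain."""
    sig_f = fft(pad(signal, n))                   # computed once, reused below
    results = []
    for kernel in kernels:
        ker_f = fft(pad(kernel, n))               # per-kernel transform
        prod = [a * b for a, b in zip(sig_f, ker_f)]
        full = fft(prod, inverse=True)            # unscaled inverse
        length = len(signal) + len(kernel) - 1
        results.append([v.real / n for v in full[:length]])
    return results


def direct_conv(signal, kernel):
    """Reference time-domain (full) convolution."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0, 0.5, 2.0, -2.0, 1.0, 0.0, 4.0]
kernels = [[1.0, 0.0, -1.0], [0.25, 0.5, 0.25]]
# 12 + 3 - 1 = 14 samples are needed, so pad to the next power of two, 16.
outputs = filter_many(signal, kernels, n=16)
```

Each entry of `outputs` should match `direct_conv(signal, kernel)` to rounding error, which is a quick way to confirm the padding size is large enough.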

You might also consider not thinking in terms of a "kernel" at all, since that is a time-domain concept.
That said, IPP actually implements (time-domain) convolution using the DFT if the image dimension is above some threshold. Your 1000 is above that threshold...

So you should run a test using only time-domain convolutions to see whether that is fast enough. Make sure to allow ippiConvolve to use multiple threads.


If you would leave some feedback here, we'd all appreciate it...


Thank you for your response.

What I meant was that it seems redundant to zero pad the small kernel (or patch/template, whichever you prefer to call it) and do a DFT on the padded kernel, since that involves a lot of multiplications by zero. I wanted to know if IPP has a DFT function that can exploit the fact that I want a 1024-sized FFT of what is actually a 21-sized image.
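To make the idea concrete, here is one way the zeros could be exploited by hand, independent of any library. It is a sketch in plain Python and says nothing about whether IPP offers such a function. In a row-column 2D transform, the transform of an all-zero row is itself all zeros, so only the kernel's own rows need a row transform; the column pass still runs over every column. The names `fft`, `fft2_full`, and `fft2_padded_kernel` are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 forward transform; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def fft2_full(img):
    """Plain row-column 2D transform: 2n one-dimensional transforms."""
    rows = [fft(list(r)) for r in img]
    cols = [fft(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def fft2_padded_kernel(kernel, n):
    """2D transform of `kernel` zero padded to n x n, skipping the zero rows."""
    # Only the kernel's own rows are transformed (each padded to length n).
    rows = [fft(list(r) + [0.0] * (n - len(r))) for r in kernel]
    # The transform of an all-zero row is all zeros, so no work is needed.
    rows += [[0j] * n for _ in range(n - len(kernel))]
    cols = [fft(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


n = 8
kernel = [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]]
padded = [r + [0.0] * (n - 3) for r in kernel] + [[0.0] * n for _ in range(n - 3)]
reference = fft2_full(padded)
pruned = fft2_padded_kernel(kernel, n)
max_err = max(abs(pruned[u][v] - reference[u][v])
              for u in range(n) for v in range(n))
```

For a 21x21 kernel padded to 1024x1024 this needs 21 + 1024 = 1045 one-dimensional transforms instead of 2048, so roughly half the work. The remaining transforms still have mostly zero inputs; pruning those as well is possible but more involved and is not shown here.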