Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

about FFT

jg-kuk
Principiante
891 Vistas
Hi,
I'm implementing a filtering using FFT. ( but it's not in a conventional way)
Comparing the conventional method with my algorithm, I had a guestion about FFT in IPP.
According to my simulation results, N point complex FFT has the same pBuffer size asa 2N point real FFT. To be specific, two functions are ippsFFTFwd_CToC_32fc and ippsFFTFwd_RToPerm_32f. But in all lengths, N point complex FFT is faster than 2N point realFFT in spite of same pBuffer sizes.
I want to know the reason for that..

Message Edited by jg-kuk@ispl.snu.ac.kr on 04-22-200611:08 AM

0 kudos
5 Respuestas
Vladimir_Dudnik
Empleados
891 Vistas

Hello,

according our expert, amount of processed data in functions ippsFFTFwd_CToC_32fcand ippsFFTFwd_RToPerm is the same, but in the last function more calculations are needed. It is the reason why this function work slower for the same buffer size

Regards,
Vladimir

spiegalkuk
Principiante
891 Vistas
thanks.
Ithink there are more muls and adds in the 2N real FFT than N cplx FFT.
Are thereanother significant calculations make the N cplx FFT more faster than 2N real FFT except for mul and add?
and if it is allowed, could I know the methodforreal fft implementation in IPP? for instance, radix-2 butterfly structure, split-radix fft method and etc..
thank for your help.
Vladimir_Dudnik
Empleados
891 Vistas

Algorithm of 2N real FFT in ipps consist from to stages:
-calculation of N complex FFT (usually radix-4 used)
-calculation of recombination (like radix-2 iteration)

Vladimir

spiegalkuk
Principiante
891 Vistas
I'm sorry but, I can't understand that 2N point real FFT consist of N point cplx FFT..Doesn't it needtwo N point cplx FFTs?
and I can't follow that N point cplx FFT in IPPs is usually implemented by radix-4 algorithm..single radix-4 algorithm only operates when FFT length is powers of 4...
Am I missing something?..
Sorry to bother,last question..
Among 2N point real FFTand N point cplx FFT, which onedoesmakemore cashe miss under same condition(same cashe size)?
Thank you for your help.

Message Edited by SpiegalKuk on 05-12-200610:24 AM

Message Edited by SpiegalKuk on 05-12-200610:32 AM

Message Edited by SpiegalKuk on 05-12-200610:32 AM

Vladimir_Dudnik
Empleados
891 Vistas

Probably I was not clear enough, so will try to explain it again.

In case of N point complex FFT we use radix-4 for N=4^k and for N=2*4^k additional iteration of radix-2 required.

2N point real FFT implemented as N point complex FFT and recombination.

So, 2N point real FFT has more cache misses because it consist from two parts (every with own cache misses): N point complex FFT and recombination.

Regards,
Vladimir

Responder