Re: about FFT

jg-kuk · ‎04-22-2006

Hi,

I'm implementing a filtering using FFT. ( but it's not in a conventional way)

Comparing the conventional method with my algorithm, I had a guestion about FFT in IPP.

According to my simulation results, N point complex FFT has the same pBuffer size asa 2N point real FFT. To be specific, two functions are ippsFFTFwd_CToC_32fc and ippsFFTFwd_RToPerm_32f. But in all lengths, N point complex FFT is faster than 2N point realFFT in spite of same pBuffer sizes.

I want to know the reason for that..

Message Edited by jg-kuk@ispl.snu.ac.kr on 04-22-200611:08 AM

Vladimir_Dudnik · ‎05-02-2006

Hello,

according our expert, amount of processed data in functions ippsFFTFwd_CToC_32fcand ippsFFTFwd_RToPerm is the same, but in the last function more calculations are needed. It is the reason why this function work slower for the same buffer size

Regards,
Vladimir

spiegalkuk · ‎05-03-2006

thanks.

Ithink there are more muls and adds in the 2N real FFT than N cplx FFT.

Are thereanother significant calculations make the N cplx FFT more faster than 2N real FFT except for mul and add?

and if it is allowed, could I know the methodforreal fft implementation in IPP? for instance, radix-2 butterfly structure, split-radix fft method and etc..

thank for your help.

Vladimir_Dudnik · ‎05-03-2006

Algorithm of 2N real FFT in ipps consist from to stages:
-calculation of N complex FFT (usually radix-4 used)
-calculation of recombination (like radix-2 iteration)

Vladimir

spiegalkuk · ‎05-12-2006

I'm sorry but, I can't understand that 2N point real FFT consist of N point cplx FFT..Doesn't it needtwo N point cplx FFTs?

and I can't follow that N point cplx FFT in IPPs is usually implemented by radix-4 algorithm..single radix-4 algorithm only operates when FFT length is powers of 4...

Am I missing something?..

Sorry to bother,last question..

Among 2N point real FFTand N point cplx FFT, which onedoesmakemore cashe miss under same condition(same cashe size)?

Thank you for your help.

Message Edited by SpiegalKuk on 05-12-200610:24 AM

Message Edited by SpiegalKuk on 05-12-200610:32 AM

Vladimir_Dudnik · ‎05-12-2006

Probably I was not clear enough, so will try to explain it again.

In case of N point complex FFT we use radix-4 for N=4^k and for N=2*4^k additional iteration of radix-2 required.

2N point real FFT implemented as N point complex FFT and recombination.

So, 2N point real FFT has more cache misses because it consist from two parts (every with own cache misses): N point complex FFT and recombination.

Regards,
Vladimir