FFT in-place vs out-of-place performance

piet_de_weer · ‎05-11-2011

In my code I have a lot of FFT and inverse FFT (ippsFFTInv_CCSToR_32f) calls.

In a specific instance, I was doing the following:

Copy buffer A to buffer B with some changes
Inverse FFT from buffer B to buffer C (ippsFFTInv_CCSToR_32f, out of place)

Since this contains an unnecessary intermediate buffer (B), I thought I could improve the performance by removing that buffer:

Copy buffer A to buffer C with some changes
Inverse FFT in buffer C (ippsFFTInv_CCSToR_32f_I, in place)

To my surprise, I'm unable to measure any difference in performance between the two. That seems odd, unless the in-place FFT is either slower, or internally uses an extra buffer.

Does this mean that switching from out-of-place to in-place processing does NOT improve the performance? That might be interesting to know for certain processing steps...
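One way to compare the two variants more reliably is to time each complete pipeline (copy plus inverse FFT) several times and keep the fastest run. Below is a minimal sketch of such a harness. The names pipeline_fn, best_run_seconds and count_calls are hypothetical and not part of IPP; count_calls is only a stand-in for a real pipeline, and clock() measures CPU time, which is assumed to be adequate for a single-threaded test.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: one complete pipeline run (copy + inverse FFT). */
typedef void (*pipeline_fn)(void *ctx);

/* Run fn(ctx) `reps` times and return the fastest single run in seconds.
   Taking the minimum reduces noise from the scheduler and from cold caches.
   Returns -1.0 if reps <= 0. */
double best_run_seconds(pipeline_fn fn, void *ctx, int reps)
{
    double best = -1.0;
    assert(fn != NULL);
    for (int i = 0; i < reps; ++i) {
        clock_t t0 = clock();
        fn(ctx);
        clock_t t1 = clock();
        double s = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}

/* Stand-in pipeline that only counts how often it was called.
   A real test would call the out-of-place or the in-place FFT here. */
void count_calls(void *ctx)
{
    ++*(int *)ctx;
}
```

In a real comparison, one callback would wrap the copy into B followed by ippsFFTInv_CCSToR_32f, and a second one the copy into C followed by ippsFFTInv_CCSToR_32f_I. A single FFT may well be shorter than the resolution of clock(), so each callback should probably loop over the pipeline many times.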

levicki · ‎05-12-2011

You can check the memory allocation size of your application for in-place vs. out-of-place processing. If there is no difference, then the in-place version is using an additional buffer.

Ying_H_Intel · ‎05-31-2011

Hello Piet,

What is your FFT order?

Generally speaking, for small FFT orders (for example, a complex float FFT of order below roughly 19, depending on the platform's cache size), there is no performance difference between the in-place and out-of-place cases. For such orders, the memory buffer used by the FFT function is small (approximately equal to the vector length) and fits in cache. The FFT is calculated in that buffer and the result is then copied to the destination, so for in-cache cases it doesn't matter whether the result is copied to the src or to the dst vector.

For rather large orders (>19), the in-place version is faster because internally the FFT uses a smaller buffer (less than the input vector length).

A related discussion is also in <<>>

Best Regards,
Ying
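To put the order threshold into numbers, here is a rough sketch of the buffer-size arithmetic. It assumes 8 bytes per sample (one complex value made of two 32-bit floats) and counts only the user-visible src and dst vectors; the library's internal work buffer is deliberately left out because its exact size is not stated above. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes occupied by one vector of 2^order samples. */
uint64_t fft_buffer_bytes(unsigned order, size_t bytes_per_sample)
{
    assert(order < 64);  /* keep the shift below well defined */
    return ((uint64_t)1 << order) * bytes_per_sample;
}

/* User-visible working set: out-of-place touches src and dst,
   in-place touches a single vector. Internal buffers are ignored. */
uint64_t working_set_bytes(unsigned order, size_t bytes_per_sample, int in_place)
{
    uint64_t one = fft_buffer_bytes(order, bytes_per_sample);
    return in_place ? one : 2 * one;
}
```

For example, fft_buffer_bytes(19, 8) is 4194304 bytes (4 MiB) per vector, so an out-of-place transform of order 19 touches 8 MiB of user data, while an order-10 vector needs only 8 KiB. Whether a given size still fits in cache depends on the CPU.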