Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

FFT in-place vs out-of-place performance

piet_de_weer
Beginner
1,736 Views
In my code I have a lot of FFT and inverse FFT (ippsFFTInv_CCSToR_32f) calls.

In a specific instance, I was doing the following:

Copy buffer A to buffer B with some changes
Inverse FFT from buffer B to buffer C (ippsFFTInv_CCSToR_32f, out of place)

Since this contains an unnecessary intermediate buffer (B), I thought I could improve the performance by removing that buffer:

Copy buffer A to buffer C with some changes
Inverse FFT in buffer C (ippsFFTInv_CCSToR_32f_I, in place)

To my surprise, I'm unable to measure any difference in performance between the two. That seems odd, unless the in-place FFT is either slower, or internally uses an extra buffer.

Does this mean that switching from out-of-place to in-place processing does NOT improve the performance? That might be interesting to know for certain processing steps...
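
For anyone who wants to repeat the comparison, here is a minimal timing sketch in plain C. It is not IPP code: `variant_fn`, `best_seconds_per_call` and `count_call` are hypothetical names made up for illustration, and `count_call` is only a stand-in for a real pipeline (copy plus inverse FFT). It assumes that the fastest of several trials is a reasonable figure to compare, and that `clock()` has enough resolution once each trial repeats the call many times.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* One pipeline variant under test; ctx carries its buffers (hypothetical). */
typedef void (*variant_fn)(void *ctx);

/* Stand-in variant: counts how often it was called. A real test would run
   "copy + out-of-place FFT" or "copy + in-place FFT" here instead. */
void count_call(void *ctx)
{
    ++*(int *)ctx;
}

/* Runs fn(ctx) reps times in each of `trials` trials and returns the
   fastest average time per call, in seconds of CPU time from clock(). */
double best_seconds_per_call(variant_fn fn, void *ctx, int trials, int reps)
{
    assert(fn != NULL && trials > 0 && reps > 0);
    double best = -1.0;
    for (int t = 0; t < trials; ++t) {
        clock_t start = clock();
        for (int r = 0; r < reps; ++r)
            fn(ctx);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC / reps;
        if (best < 0.0 || elapsed < best)
            best = elapsed;  /* keep the least disturbed trial */
    }
    return best;
}
```

Both variants would be timed with the same `trials` and `reps` and on the same input data, so that the only difference between the runs is the extra buffer.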
2 Replies
levicki
Valued Contributor I
You can check the memory allocation size of your application for in-place vs. out-of-place processing. If there is no difference, then the in-place version is using an additional buffer.
Ying_H_Intel
Employee
Hello Piet,

What is your FFT order? Generally speaking, for small FFT orders (for example, a float complex FFT of order below about 19, depending on the platform's cache size) there is no performance difference between the in-place and out-of-place cases. For such orders the memory buffer used by the FFT function is small (roughly equal to the vector length) and can stay in cache. The FFT is calculated in that buffer and the result is then copied to the destination, so for in-cache cases it does not matter whether the result is copied to the src vector or to the dst vector.

For rather large orders (above 19) the in-place version is faster, because internally the FFT uses a buffer of smaller size (less than the input vector length).

A related discussion is also in <<>>

Best Regards,
Ying
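
To put rough numbers on the order threshold, here is a small sketch in plain C that estimates the data footprint of a float complex FFT. It is a simplified model and not something IPP provides: the function names are hypothetical, the FFT's internal work buffer is ignored, and the cache size is whatever the caller assumes for the platform.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in one float complex vector of length 2^order (re + im per element). */
size_t complex_f32_bytes(int order)
{
    assert(order >= 0 && order <= 27);  /* keeps the sums below within 32 bits */
    return ((size_t)1 << order) * 2 * sizeof(float);
}

/* In-place: only one vector has to stay in cache. Returns 1 if it fits. */
int in_place_fits(int order, size_t cache_bytes)
{
    return complex_f32_bytes(order) <= cache_bytes;
}

/* Out-of-place: src and dst vectors both have to stay in cache. */
int out_of_place_fits(int order, size_t cache_bytes)
{
    return 2 * complex_f32_bytes(order) <= cache_bytes;
}
```

With 4-byte floats, order 19 gives 4 MiB per vector. Under an assumed 8 MiB cache, src and dst together still just fit at order 19, while at order 20 (8 MiB per vector) only the in-place case fits. That is in line with the point that the threshold depends on cache size.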