I'm trying to compute in-place FFTs of 3-dimensional arrays on a cluster. With the MKL FFTW3 wrapper, creating an FFTW plan appears to allocate a buffer as large as the original array. Because available memory is limited, I would like to reduce the size of this buffer.
Is there any way to control the buffer size through the MKL FFTW3 wrapper? If not, I would also like to know whether it is possible at the level of the MKL FFT routines.
The original FFTW3 MPI library performs a true in-place transform when the output pointer is the same as the input pointer, i.e. no extra memory is used. The MKL FFTW3 MPI wrappers do not do this by default, for performance reasons: they do allocate an additional buffer and use an out-of-place transform.
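To put rough numbers on the doubling described above, here is a minimal sketch of the per-rank memory arithmetic. It assumes double-precision complex data and an even slab decomposition along the first dimension; the helper names slab_bytes and peak_bytes are hypothetical, and the sizes the library actually requests may be somewhat larger.

```c
#include <stdint.h>

/* Bytes held by the largest rank for an n0 x n1 x n2 double-complex array.
 * Assumption: slabs are split along n0 in blocks of ceil(n0 / nranks) planes. */
uint64_t slab_bytes(uint64_t n0, uint64_t n1, uint64_t n2, uint64_t nranks)
{
    uint64_t local_n0 = (n0 + nranks - 1) / nranks;   /* ceiling division */
    return local_n0 * n1 * n2 * 2u * sizeof(double);  /* re + im per element */
}

/* Peak per-rank footprint: the data alone for a true in-place transform,
 * or the data plus an equally sized buffer when extra workspace is used. */
uint64_t peak_bytes(uint64_t data_bytes, int extra_workspace)
{
    return extra_workspace ? 2u * data_bytes : data_bytes;
}
```

For example, a 1024 x 1024 x 1024 array over 64 ranks gives slab_bytes = 268435456 (256 MiB) per rank, and peak_bytes with the extra buffer gives 536870912 (512 MiB).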
If you use Intel MKL 11.3 or higher, you can redefine the WANT_FAST_INPLACE_CLUSTER_FFT macro to 0 (it is set to 1 by default) and recompile the MKL FFTW3 MPI wrappers. This tells the wrappers not to use extra workspace and to perform a true in-place transform, at the cost of performance. You can find the macro in $MKLROOT/include/fftw/fftw3-mpi_mkl.h (around line 53).
The DFTI interface also allows you to avoid the extra workspace: just configure the DFTI DM descriptor with the placement set to DFTI_INPLACE.