Hi Zhu,
By default, real-to-complex FFTs store the transformed data in the CCS format (see the default value for DFTI_PACKED_FORMAT, p. 2999 of the MKL Reference Manual).
This format is described in detail for 2D FFTs on p. 3004.
Notice that, for a 2D FFT of MxN real numbers, the transformed data consist of (M+2)x(N+2) real numbers.
Furthermore, these transformed data are stored in a not so straightforward way :-)
We need to allocate more memory (not 12 but 20 floats in the very first example in this thread).
By default all FFTs are in-place and only input strides have a default value.
If we ask for a not-in-place FFT, we must set the output strides ourselves.
See example dftc/real_2d_ccs_single_ex2.c, which is closest to your use case.
For the first example in this thread, the output strides must be {0, 5, 1}.
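To see what these strides mean for the buffer, here is a minimal sketch in plain C (no MKL calls). It relies on the documented stride convention that element (i, j) of a 2D transform lives at index strides[0] + i*strides[1] + j*strides[2]. The helper names strided_index and min_buffer_len are made up for illustration, and the 4 x 5 output shape is an assumption inferred from the 20 floats and the row stride of 5 mentioned above.

```c
#include <assert.h>

/* Index of element (i, j) in a buffer laid out with MKL-style strides
   {offset, row_stride, col_stride}. Hypothetical helper, not an MKL call. */
static long strided_index(const long strides[3], long i, long j)
{
    return strides[0] + i * strides[1] + j * strides[2];
}

/* Smallest buffer length (in elements) that can hold rows x cols elements
   with the given strides. Assumes non-negative strides. */
static long min_buffer_len(const long strides[3], long rows, long cols)
{
    assert(rows > 0 && cols > 0);
    /* The last element sits at the largest index; add 1 for the length. */
    return strided_index(strides, rows - 1, cols - 1) + 1;
}
```

With strides {0, 5, 1}, each output row starts 5 floats after the previous one, so min_buffer_len(strides, 4, 5) gives 20, which matches the 20 floats needed above. In the real program this array would be the one passed as DFTI_OUTPUT_STRIDES.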
Evgueni.