No 3D in-place real-to-complex-even MKL DFT?

mpbro · 12-11-2007

I have a program that runs a 3D in-place DFT (real to complex even) on a large test vector (750x750x750). The code originally called FFTW directly, but I successfully linked the MKL libraries and can now call them through the FFTW wrappers.

Initial results on a machine with 4GB of memory were very encouraging: the MKL DFT ran about 2x faster on 4 threads than the FFTW DFT.

However, when I ran the same jobs on a machine with 2GB of memory and ran "top", I noticed a disturbing thing: the MKL job wanted to use 3.6GB of memory, whereas the FFTW job only wanted 1.8GB. In other words, it seems as if the MKL wrapper is somehow making an internal copy of the vector.

Am I correct? If so, what, if anything, is the remedy? I can't afford to have two copies of the vector in my apps.

Thanks...

mpbro · 12-11-2007

The plot seems to thicken... (to me, at least)

If I run the MKL DFT (750x750x750) through the FFTW wrapper (sfftw r2c/c2r) with 4 threads, the process takes double the memory that it should (3.4GB). When I run on a single thread, it uses the expected amount of memory (1.7GB). Normal FFTW uses 1.7GB regardless of how many threads are used.

This is weird. There must be a special variable buried somewhere in the docs that can keep this from happening.

mpbro · 12-12-2007

OK, now I believe I have discovered the problem: you guys have not implemented the 3D, in-place, real-to-complex-even DFTI. Argh! I tried just calling the native Dfti... routines and got a DFTI_UNIMPLEMENTED error from DftiCommitDescriptor(). Moreover, there is a conspicuous lack of the desired example in the examples directory. So when you call the in-place FFTW wrapper, it must be calling the out-of-place DFTI routine.

Any thoughts on when you might get around to implementing the 3D, in-place, real-to-complex-even DFTI?