I have a program that runs a 3D in-place DFT (real-to-complex-even) on a large test vector (750x750x750). The code originally called FFTW directly, but I successfully linked the MKL libraries and can now call them through the FFTW wrappers.
Initial results on a machine with 4GB of memory were very successful: the MKL DFT ran about 2x faster on 4 threads than the FFTW DFT.
However, when I ran the same jobs on a machine with 2GB of memory and ran "top", I noticed a disturbing thing: the MKL job wanted to use 3.6GB of memory, whereas the FFTW job only wanted 1.8GB. In other words, it seems as if the MKL wrapper is somehow making an internal copy of the vector.
Am I correct? If so, what if anything is the remedy? I can't afford to have two copies of the vector for my apps.
Thanks...
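For reference, the expected footprint of a single buffer can be checked with a little arithmetic. The sketch below assumes single precision (4 bytes per element, as the sfftw prefix suggests) and FFTW's padded in-place real-to-complex layout, where the last dimension grows from n2 to 2*(n2/2 + 1). The function names are made up for illustration and are not part of FFTW or MKL.

```c
#include <assert.h>
#include <stddef.h>

/* Number of real elements in the padded buffer used by FFTW's in-place
 * r2c layout: the last dimension is padded from n2 to 2*(n2/2 + 1).
 * (Hypothetical helper, not an FFTW or MKL function.) */
size_t padded_real_count(size_t n0, size_t n1, size_t n2)
{
    assert(n0 > 0 && n1 > 0 && n2 > 0);
    return n0 * n1 * 2 * (n2 / 2 + 1);
}

/* Bytes needed for one such buffer. elem_size is assumed to be 4
 * (single precision); pass 8 for double precision. */
size_t padded_buffer_bytes(size_t n0, size_t n1, size_t n2, size_t elem_size)
{
    return padded_real_count(n0, n1, n2) * elem_size;
}
```

For 750x750x750 in single precision this comes to 1,692,000,000 bytes, roughly 1.7GB for one buffer and roughly 3.4GB for two, which is consistent with a second full-size copy being allocated somewhere.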
2 Replies
The plot seems to thicken... (to me, at least)
If I run the MKL DFT (750x750x750) through the FFTW wrapper (sfftw r2c/c2r) with 4 threads, the process takes double the memory that it should (3.4GB). When I run on a single thread, it uses the expected amount of memory (1.7GB). Normal FFTW uses 1.7GB regardless of how many threads are used.
This is weird. There must be a special variable buried somewhere in the docs that can keep this from happening.
OK, now I believe I have discovered the problem: you guys have not implemented the 3D, in-place, real-to-complex-even DFTI. Argh! I tried calling the native Dfti... routines directly and got a DFTI_UNIMPLEMENTED error from DftiCommitDescriptor(). Moreover, the desired example is conspicuously missing from the examples directory. When you call the in-place FFTW wrapper, it must be calling the out-of-place DFTI routine.
Any thoughts on when you might get around to implementing the 3D, in-place, real-to-complex-even DFTI?
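To make concrete what "in-place" requires here, this is a minimal sketch of the index arithmetic for FFTW's padded in-place r2c layout, assuming C row-major ordering (a Fortran caller would pad the first dimension instead). The helper names are hypothetical; this says nothing about how MKL lays out data internally.

```c
#include <assert.h>
#include <stddef.h>

/* Padded length of the last real dimension for an in-place r2c transform. */
size_t padded_last_dim(size_t n2)
{
    return 2 * (n2 / 2 + 1);
}

/* Offset, in real elements, of input element (i, j, k) in the padded
 * row-major buffer. The padding slots k >= n2 are never addressed. */
size_t real_offset(size_t i, size_t j, size_t k, size_t n1, size_t n2)
{
    assert(j < n1 && k < n2);
    return (i * n1 + j) * padded_last_dim(n2) + k;
}

/* Offset, in complex elements, of output element (i, j, k) in the same
 * buffer; only k = 0 .. n2/2 is stored for the last dimension. */
size_t complex_offset(size_t i, size_t j, size_t k, size_t n1, size_t n2)
{
    assert(j < n1 && k <= n2 / 2);
    return (i * n1 + j) * (n2 / 2 + 1) + k;
}
```

Each row of 752 reals occupies the same bytes as a row of 376 complex values, so input and output share one allocation. A wrapper that falls back to an out-of-place transform has to allocate a second buffer of that size.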