As I mentioned in another post, I'm running the FFTW wrappers for MKL's DFT. I tested a 500x500x500 real-to-complex-even DFT by running both the native FFTW DFT and the MKL DFT (forward and inverse). In FFTW, I forced multi-threading on my 4-CPU node by calling sfftw_init_threads() and sfftw_plan_with_nthreads(). MKL, on the other hand, seems to use 4 threads by default. This is not a complaint; that's pretty smart behavior. But could someone confirm that this is indeed the case?
FWIW, on this problem, the 4-thread FFTW result was about 2x slower than the 4-thread MKL result. Very impressive!
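For a sense of scale, here is a rough sketch of the data footprint of one such transform. It assumes single precision (suggested by the sfftw_ prefix), an out-of-place transform, and the usual half-spectrum output with n/2 + 1 complex points along one dimension. None of these details are confirmed in the post, and the function name is only illustrative.

```python
# Rough memory estimate for one 500x500x500 real-to-complex transform.
# Assumptions (not stated in the post): single precision, out-of-place,
# half-spectrum output with n2 // 2 + 1 complex points along one dimension.

def r2c_buffer_bytes(n0, n1, n2, real_bytes=4):
    """Return (input_bytes, output_bytes) for an out-of-place 3D r2c DFT."""
    real_elems = n0 * n1 * n2
    complex_elems = n0 * n1 * (n2 // 2 + 1)
    input_bytes = real_elems * real_bytes
    # Each complex value stores a real and an imaginary part.
    output_bytes = complex_elems * 2 * real_bytes
    return input_bytes, output_bytes


in_b, out_b = r2c_buffer_bytes(500, 500, 500)
print(f"input : {in_b / 1e6:.0f} MB")
print(f"output: {out_b / 1e6:.0f} MB")
print(f"total : {(in_b + out_b) / 1e6:.0f} MB")
```

Under these assumptions the two buffers alone come to roughly 1 GB, before counting any workspace that either library allocates internally.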
5 Replies
Section 6-8 of the user's guide just answered this question. This is cool, unless the extra threads increase peak memory usage and cause the system to bog down!
The value of MKL_DYNAMIC is by default set to TRUE, regardless of OMP_DYNAMIC, whose default value may be FALSE. MKL_DYNAMIC being TRUE means that Intel MKL will always try to pick what it considers the best number of threads, up to the maximum specified by the user.
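If the default thread count turns out to be a problem, one way to experiment is to cap it from the environment before launching the program. The sketch below only builds the environment; it assumes the MKL version in use honors the MKL_NUM_THREADS and MKL_DYNAMIC environment variables, and both the helper name and the ./dft_benchmark executable are hypothetical.

```python
import os


def mkl_thread_env(max_threads, dynamic=True, base_env=None):
    """Return a copy of the environment with MKL threading controls set.

    max_threads is the upper bound MKL may use; with dynamic=True MKL may
    still choose fewer threads than that bound.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_NUM_THREADS"] = str(max_threads)
    env["MKL_DYNAMIC"] = "TRUE" if dynamic else "FALSE"
    return env


# Hypothetical usage: run a benchmark binary restricted to one MKL thread.
# import subprocess
# subprocess.run(["./dft_benchmark"], env=mkl_thread_env(1), check=True)
```

Running the same benchmark with 1, 2, and 4 threads this way would show whether peak memory actually grows with the thread count.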
I guess you may be using a fairly old MKL. In MKL 9 there were serial (non-threaded) libraries as well as the primary threaded ones. In MKL 10 you select the threaded library by linking with -lmkl_intel_thread, or the non-threaded one with -lmkl_sequential.
Gerry,
Thanks for the link--the results are impressive.
So far, I've found the following for a large (500x500x500) forward/inverse DFT pair:
Run-time ratios:
FFTW 1 thread/FFTW 4 thread ~ 3.8x
FFTW 4 thread/MKL 4 thread ~ 2x
MKL 1 thread/MKL 4 thread ~ 2.5x
So for this 3D problem, FFTW multi-threaded scales very nicely. However, the factor of 2 between FFTW and MKL is incredibly tantalizing for me. If I could just lick this memory problem...
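The three ratios above can be combined to get parallel efficiency and the implied single-thread comparison. This is just arithmetic on the reported numbers; the variable and function names are illustrative.

```python
# Run-time ratios reported in the post (4 threads on a 4-CPU node).
fftw1_over_fftw4 = 3.8   # FFTW 1 thread / FFTW 4 threads
fftw4_over_mkl4 = 2.0    # FFTW 4 threads / MKL 4 threads
mkl1_over_mkl4 = 2.5     # MKL 1 thread / MKL 4 threads
threads = 4


def parallel_efficiency(speedup, n_threads):
    """Fraction of ideal linear speedup that was achieved."""
    return speedup / n_threads


fftw_eff = parallel_efficiency(fftw1_over_fftw4, threads)
mkl_eff = parallel_efficiency(mkl1_over_mkl4, threads)

# Chain the ratios: (FFTW1/FFTW4) * (FFTW4/MKL4) / (MKL1/MKL4) = FFTW1/MKL1.
fftw1_over_mkl1 = fftw1_over_fftw4 * fftw4_over_mkl4 / mkl1_over_mkl4

# MKL on one thread compared with FFTW on four threads.
mkl1_over_fftw4 = mkl1_over_mkl4 / fftw4_over_mkl4

print(f"FFTW parallel efficiency: {fftw_eff:.1%}")
print(f"MKL parallel efficiency : {mkl_eff:.1%}")
print(f"FFTW 1 thread / MKL 1 thread : {fftw1_over_mkl1:.2f}x")
print(f"MKL 1 thread / FFTW 4 threads: {mkl1_over_fftw4:.2f}x")
```

By these numbers FFTW scales better (about 95% efficiency versus about 62% for MKL), but MKL starts from a single-thread time roughly 3x faster, and single-threaded MKL is only about 1.25x slower than 4-thread FFTW.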
Hi Tim,
Thanks for the response. I'm actually using MKL v10, and I am linking with -lmkl_intel_thread.