I am new to MKL and am trying out the 10.0.011 FFT routines with gcc as my compiler. My PC is an Intel Core 2, and MKL correctly detects a maximum of 2 threads. My test code itself is not threaded.
I've run FFTs ranging from 8192 to 262144 points. When the batch size is 1 and I use mkl_set_num_threads to change the number of threads, I see no change in performance. I've tried settings of 1, 2, and 4 threads.
If I change the batch size to 2, 4, 8, or 16, I see better performance with 2 threads. That doesn't surprise me, since there are only 2 cores on my PC. However, when I monitor CPU usage with gnome-system-monitor, I only see one core at a time running at or close to 100%. The other core only very occasionally shows high usage.
First, should a batch size of 1 also be threaded by MKL? From the manual, I assumed that the only case in which a 1D FFT with batch size 1 would not be threaded is when its size is not a power of 2.
Second, why is one of my CPU cores barely being used? Do I need to link against the OpenMP libraries to get MKL to use both cores? I do not set any thread-related environment variables, since I call mkl_set_num_threads directly from my code.
Thank you,
skb
Hello,
A one-dimensional FFT with DFTI_NUMBER_OF_TRANSFORMS set to 1 will run on both cores of your system unless its length is not a power of 2, the transform is single-precision or in-place, or the stride is non-unit. Of course, the application must also be linked to the OpenMP library (-lguide). The benefit of running on two cores is greater for 2D and higher-dimensional problems.
Thanks
Dima
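As a minimal sketch, the conditions listed above for MKL 10.0 can be restated as a predicate that an application might use to decide whether a single 1D transform is likely to be threaded. The struct `fft1d_config` and the function `mkl10_single_1d_fft_may_thread` are hypothetical names, not part of MKL; the function only encodes the rules from this reply and does not query the library. The separate requirement of linking the OpenMP runtime is not modeled, and later MKL releases may behave differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a single (batch size 1) 1D transform. */
typedef struct {
    size_t length;         /* transform length N */
    bool double_precision; /* DFTI_DOUBLE rather than DFTI_SINGLE */
    bool in_place;         /* DFTI_PLACEMENT set to DFTI_INPLACE */
    long stride;           /* distance between consecutive elements */
} fft1d_config;

/* True when n is a nonzero power of 2. */
static bool is_power_of_two(size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Restates the MKL 10.0 rules from this thread: a lone 1D FFT may use
   several cores only if it is a power-of-2, double-precision,
   out-of-place, unit-stride transform. Does not call MKL. */
bool mkl10_single_1d_fft_may_thread(const fft1d_config *c) {
    return is_power_of_two(c->length)
        && c->double_precision
        && !c->in_place
        && c->stride == 1;
}
```

Under these rules, the original poster's case (power-of-2 lengths, batch size 1) would still run on one core if the transforms were single-precision or in-place.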
Thanks,
Bonnie
Hi,
I am using MKL (version 8.1) on Mac OS X and trying a simple multi-threaded FFT example. My Mac has 2 quad-core processors. I have a couple of questions:
(1) To use multi-threading for the FFT routine (DftiComputeForward) only, do I need to compile with -openmp?
(2) If I set the environment variable OMP_NUM_THREADS=8, do I still need to set DFTI_NUMBER_OF_USER_THREADS?
Thanks
Bonnie,
You can make use of the other CPUs by doing non-1D transforms, by doing transforms in batches (see the DFTI_NUMBER_OF_TRANSFORMS configuration parameter), or by threading in a coarser way, at the application level. For the latter, refer to the DFTI_NUMBER_OF_USER_THREADS configuration parameter. Switching to double precision will also let 1D out-of-place power-of-2 transforms use the other CPUs.
Thanks,
Dima
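The batched approach can be sketched as follows, assuming K transforms of length N stored back to back with unit stride. In MKL, this layout is described by setting DFTI_NUMBER_OF_TRANSFORMS to K and DFTI_INPUT_DISTANCE (plus DFTI_OUTPUT_DISTANCE for out-of-place transforms) to N before committing the descriptor, as shown in the comment. The helpers `batch_offset` and `batch_buffer_len` are hypothetical; they only compute where each transform lives in the buffer.

```c
#include <stddef.h>

/* Hypothetical helpers describing a batch of K 1D transforms, each of
   length N, laid out as MKL's DFTI expects when configured with e.g.
     DftiSetValue(h, DFTI_NUMBER_OF_TRANSFORMS, (MKL_LONG)K);
     DftiSetValue(h, DFTI_INPUT_DISTANCE, (MKL_LONG)N);
     DftiSetValue(h, DFTI_OUTPUT_DISTANCE, (MKL_LONG)N);  (out-of-place)
   so that a single DftiComputeForward call processes all K transforms. */

/* Index of element j of transform k, given the distance between the
   starts of consecutive transforms and the element stride. */
size_t batch_offset(size_t k, size_t j, size_t distance, size_t stride) {
    return k * distance + j * stride;
}

/* Number of elements a buffer must hold for K transforms of length N. */
size_t batch_buffer_len(size_t K, size_t N, size_t distance, size_t stride) {
    if (K == 0 || N == 0)
        return 0;
    /* one past the last element of the last transform */
    return batch_offset(K - 1, N - 1, distance, stride) + 1;
}
```

For example, a batch of 16 transforms of 8192 points with distance 8192 needs a buffer of 16 * 8192 = 131072 elements.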
amath,
Setting the environment variable OMP_NUM_THREADS=8 should be enough for many transforms to be done in parallel when you call DftiComputeForward. The configuration parameter DFTI_NUMBER_OF_USER_THREADS refers to a different kind of parallelization: you need it if you parallelize your application yourself and want several application threads to share the same descriptor in their calls to DftiComputeForward.
Thanks,
Dima
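For the application-level approach, each application thread needs its own share of the work. A minimal sketch, assuming K independent transforms split among T threads: the hypothetical helper `split_transforms` assigns each thread a contiguous range, and the comment shows how each thread might then call DftiComputeForward on one descriptor shared by all threads, which is the case DFTI_NUMBER_OF_USER_THREADS is described as covering above. The threading mechanism itself (OpenMP, pthreads) is left to the application.

```c
#include <stddef.h>

/* Hypothetical helper: split K independent transforms among T application
   threads (T > 0) as evenly as possible. Thread t (0 <= t < T) gets the
   contiguous range [*first, *first + *count).

   Each thread might then run, on one descriptor committed after
   DftiSetValue(h, DFTI_NUMBER_OF_USER_THREADS, (MKL_LONG)T):
       for (size_t k = first; k < first + count; ++k)
           DftiComputeForward(h, x + k * N);
   where x points at K back-to-back transforms of length N. */
void split_transforms(size_t K, size_t T, size_t t,
                      size_t *first, size_t *count) {
    size_t base = K / T;  /* every thread gets at least this many */
    size_t extra = K % T; /* the first `extra` threads get one more */
    *count = base + (t < extra ? 1 : 0);
    *first = t * base + (t < extra ? t : extra);
}
```

Giving the first K % T threads one extra transform keeps the per-thread counts within one of each other, so no core sits idle while another finishes a much larger share.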
Thanks,
Bonnie
-Todd
REAL_1D_CSS_DOUBLE_EX1.OUT
REAL_1D_CSS_DOUBLE_EX2.OUT
I tried setting the environment variables MKL_NUM_THREADS, MKL_DOMAIN_NUM_THREADS, and OMP_NUM_THREADS to 16 and there was no change. I also toggled the MKL_DYNAMIC and OMP_DYNAMIC variables from 0 to 1.
I put some quick pthreads threading in my code, and it correctly spread across CPUs without setting or changing any environment variable.
Is there some system installation parameter or account parameter that is set wrong?
Still baffled,
Bonnie
So there is no way to run a single-precision 1D FFT on two cores? Even mkl_set_num_threads(2) won't guarantee that the DFT runs on two cores?
Thanks
Kavi
Hello Kavi,
The short answer is yes: there is currently no way to do that.
As Todd recommended, could you please submit a simple test case to Premier Support?
That really is the best and fastest way to resolve this kind of issue.
Gennady
