Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

DFTI - number of user threads.

t_clark
Beginner
550 Views
Hi all,

I'm trying to use DFTs in a threaded application: many executions (10,000+), each with the same transform size. I'm therefore using a common descriptor to avoid committing 10,000 times.

The use of a common descriptor is described in Intel's article here (case #4)...
http://software.intel.com/en-us/articles/different-parallelization-techniques-and-intel-mkl-fft/

...but I have a question based on the number of threads allowed.

The code (case #4) on the example page is either trivial or lazy - the actual number of OMP threads is the same as the maximum number of OMP threads, which in turn is the same as the number of FFTs executed.

So, my FFTs are contained in an OMP parallel DO loop which executes its contents 10,000 times on (say) an 8 core machine.

Maybe I want to reserve a core or two for other functions. My OMP_MAX_NUM_THREADS environment variable will be 8, my OMP_NUM_THREADS will be 6.

The question is: "what value should DFTI_NUMBER_OF_USER_THREADS parameter take?"

Does it have to be 10,000 (one for each DFT execution), or does it have to be 6 (one for each simultaneously running thread)?

Alternatively, will it still work if I set it to 8 (the maximum physically allowed) whilst the actual number of threads which will execute is 6?

Thanks for any insight you can give!

Kind regards

Tom Clark
6 Replies
VipinKumar_E_Intel
Did you have a chance to look at the MKL reference manual on DFT threading? It has been updated more recently than the article, which we will also be updating soon.

http://software.intel.com/sites/products/documentation/hpc/mkl/updates/10.3.5/mklman/appendices/mkl_appC_DFTMT.htm#appC-exC-22

http://software.intel.com/sites/products/documentation/hpc/mkl/updates/10.3.5/mklman/fft/fft_NumberOfThreads.htm

--Vipin

t_clark
Beginner
Hi Vipin, thanks for responding.

Yes, I took a close look at both of those sources before posting.

In the first one, the number of OMP threads (integer nth in that code) is the same as the number of executions of the FFT within the DO loop ( Do ith = 1,nth). So it can't answer my question.

The second link is mostly parsed from the article I cited earlier (or vice-versa ;) and is ambiguous about whether DFTI_NUMBER_OF_USER_THREADS must be set to the number of threads which can be executed simultaneously, or the total number of threads spawned during a parallel region.

Giving a more basic example... is the following pseudocode flawed?...

Thanks, and kind regards

Tom

nWorkers = omp_get_max_threads() ! =8 for my dual quad core system
nFFTs = 10000 ! typically - not always exactly

[... create descriptors ...]
status = DftiSetValue (descriptorHandle, DFTI_NUMBER_OF_USER_THREADS, nWorkers)
[... commit descriptors ...]

!$OMP PARALLEL DO SHARED(descriptorHandle, nFFTs, nWorkers) PRIVATE(someData)
DO fftCtr = 1,nFFTs ! <------- NOTE --- DIFFERENT TO nWorkers

call getSomeData(someData,fftCtr)

call fftTheData(descriptorHandle,someData)

ENDDO
!$OMP END PARALLEL DO


SUBROUTINE fftTheData(descriptorHandle, someData)
[... declarations]
status = DftiComputeForward (descriptorHandle, someData)
[... do stuff with the data and return]
END SUBROUTINE fftTheData

*Edit: corrected a bug in the pseudocode.
barragan_villanueva_
Valued Contributor I
Hi,

To limit the number of threads for the FFT domain, please use the MKL service function
mkl_domain_set_num_threads(, MKL_FFT)
or set the environment variable accordingly:
MKL_DOMAIN_NUM_THREADS=MKL_FFT=
See the MKL documentation for details.
t_clark
Beginner
Hi,

Thanks again but that still isn't my point - the purpose of that is for setting the number of threads that the MKL libraries use internally (i.e. for each FFT to do, how many threads are used to compute it).

In my case, I'd set it to 1, but I'm linking against the sequential library anyway, so each FFT is forced to stay within its own single thread.

Lacking documentation, I've just been trying things out. For anyone else trying to answer this question, I think the answer is to set DFTI_NUMBER_OF_USER_THREADS to the same value as omp_get_max_threads().

I figure that the descriptors contain reserved data so that, at any instant in time, each running thread has access to a private area of data. Thus I don't need to set DFTI_NUMBER_OF_USER_THREADS to 10000 (the total number of threads which will execute), but only to 8 (the max number of threads which can execute simultaneously).

However, I'm still really unsure about this, because I don't know what happens at the end (e.g. if I execute 9 FFTs with DFTI_NUMBER_OF_USER_THREADS set to 8, will the 9th one work reliably?).

If anyone knows the answer to this, I'd really appreciate confirmation - at the moment I'm just hoping for the best.

Cheers,

Tom
Evgueni_P_Intel
Employee
Hi t_clark,

If we go back to the original question "what value should DFTI_NUMBER_OF_USER_THREADS parameter take?", DFTI_NUMBER_OF_USER_THREADS should be set to the number of OMP threads that your application uses to parallelize the OMP DO loop.

Another possibility for you would be to limit the number of threads for MKL FFTs as Victor suggests above, and do so-called multiple FFTs: set DFTI_NUMBER_OF_TRANSFORMS, and MKL will do the parallelization.

Regarding your last question: yes, MKL guarantees the correctness of the result if you do 9 FFTs in 8 threads.
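
The distinction between "number of FFT calls" and "number of threads making those calls" can be checked without MKL at all. The following is a minimal sketch in C with OpenMP, not MKL code: count_worker_threads is a hypothetical helper introduced only for illustration, and the loop body merely marks the spot where a call such as DftiComputeForward on the shared descriptor would sit. It assumes the loop is run with a fixed team size (here via the num_threads clause) and records which thread IDs actually execute iterations.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>   /* only needed when compiled with OpenMP enabled */
#endif

/* Hypothetical helper (not part of MKL or OpenMP).
 * Runs n_iters loop iterations on a team of at most team_size threads and
 * returns how many distinct threads executed at least one iteration.
 * Returns -1 if the bookkeeping array cannot be allocated. */
int count_worker_threads(int n_iters, int team_size)
{
    if (team_size < 1)
        team_size = 1;

    /* One flag per possible thread ID in the team. */
    int *used = calloc((size_t)team_size, sizeof *used);
    if (used == NULL)
        return -1;

#ifdef _OPENMP
    #pragma omp parallel for num_threads(team_size) schedule(dynamic)
#endif
    for (int i = 0; i < n_iters; ++i) {
        int tid = 0;            /* serial fallback: a single "thread" */
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Thread IDs stay within the team, however many iterations run. */
        assert(tid >= 0 && tid < team_size);

        /* This is where the per-iteration work (e.g. one FFT on a shared,
         * committed descriptor) would go in the real application. */
        used[tid] = 1;          /* each thread writes only its own slot */
    }

    int distinct = 0;
    for (int t = 0; t < team_size; ++t)
        distinct += used[t];

    free(used);
    return distinct;
}
```

Built with OpenMP enabled (for example -fopenmp with GCC), count_worker_threads(10000, 6) returns at most 6 and count_worker_threads(9, 8) returns at most 8: extra iterations are queued onto the same threads rather than creating new ones. That team size, not the iteration count, is the quantity the reply above says to pass as DFTI_NUMBER_OF_USER_THREADS.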

E.
t_clark
Beginner
Evgueni,

Thanks, that's answered my question completely. In my case, using OMP rather than setting the number of transforms > 1 is the best bet, as it's not just the FFT that I'm parallelising; there's other work within the loop.

Now confident that my code is valid.

Thanks all, and kind regards

Tom