Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Best way to parallelize DFT calls?

OP1
New Contributor III
557 Views

Assume that I have an array X(4096,256). Each column of X contains a time series on which I need to perform a DFT.
My question is: is it better to do a loop (1 to 256) in which each time series (column) is processed by one call to the DFT subroutine (which will perform the calculations in parallel for that one series); or is is better to parallelize the loop to havemultiple threads, each addressing one DFT at a time?
Whatwould be the most efficient?

Thanks,

Olivier

0 Kudos
3 Replies
Dmitry_B_Intel
Employee
557 Views
Olivier,

Probably the best way is to use DFTI_NUMBER_OF_TRANSFORMS configuration parameter and do all the transforms by a single call to a DftiCompute function.

Thanks
Dima

0 Kudos
OP1
New Contributor III
557 Views
Thanks Dima,

Yes, this will work efficiently when I perform the FFTs on a continuous group of columns (say, columns 1 to 256, or 53 to 125 etc). In many cases however my subroutine receives a list of column numbers on which the DFTs need to be performed (like columns 1-34-56-87...110). This list is not necessarily made of continuous integer intervals.

I am thinking that maybe the best way to implement this is to automatically test (at the beginning of my code) which solution works faster - since I suppose all this will depend on the number of processors, memory available etc. Kind of a self-tuning approach, so to speak.

Olivier

0 Kudos
Dmitry_B_Intel
Employee
557 Views
Olivier,
Then returning to your original questionwhether it is better to parallelize the loop the (obvious) answer is yes. This is actually what DFT does when it knows it have to do multiple transforms in a row. When the multiple transforms are not uniformly spaced (1st, 38th etc) then the following technique might be useful. You must know in advance how many threads at maximum you can devote to compute the DFTs, let this number be MAXNT. Then you create one descriptor, and set DFTI_NUMBER_OF_USER_THREADS to MAXNT. After you commit the descriptor, you can use it in parallel by the DftiCompute functions each doing its own (1st, 38th etc) transform. Just to be clear let me outline the code:
DftiCreateDescriptor(hand,...)
MAXNT = omp_get_max_threads()
DftiSetValue(hand,DFTI_NUMBER_OF_USER_THREADS,MAXNT)
...
DftiCommitDescriptor(hand)
...
$omp parallel do num_threads(MAXNT)
do i = ...
DftiComputeForward(hand, x(:,list(i)) )
enddo

Thanks
Dima
0 Kudos
Reply