- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Assume that I have an array X(4096,256). Each column of X contains a time series on which I need to perform a DFT.
My question is: is it better to do a loop (1 to 256) in which each time series (column) is processed by one call to the DFT subroutine (which will perform the calculations in parallel for that one series); or is is better to parallelize the loop to havemultiple threads, each addressing one DFT at a time?
Whatwould be the most efficient?
Thanks,
Olivier
Link Copied
3 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Olivier,
Probably the best way is to use DFTI_NUMBER_OF_TRANSFORMS configuration parameter and do all the transforms by a single call to a DftiCompute function.
Thanks
Dima
Probably the best way is to use DFTI_NUMBER_OF_TRANSFORMS configuration parameter and do all the transforms by a single call to a DftiCompute function.
Thanks
Dima
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Dima,
Yes, this will work efficiently when I perform the FFTs on a continuous group of columns (say, columns 1 to 256, or 53 to 125 etc). In many cases however my subroutine receives a list of column numbers on which the DFTs need to be performed (like columns 1-34-56-87...110). This list is not necessarily made of continuous integer intervals.
I am thinking that maybe the best way to implement this is to automatically test (at the beginning of my code) which solution works faster - since I suppose all this will depend on the number of processors, memory available etc. Kind of a self-tuning approach, so to speak.
Olivier
Yes, this will work efficiently when I perform the FFTs on a continuous group of columns (say, columns 1 to 256, or 53 to 125 etc). In many cases however my subroutine receives a list of column numbers on which the DFTs need to be performed (like columns 1-34-56-87...110). This list is not necessarily made of continuous integer intervals.
I am thinking that maybe the best way to implement this is to automatically test (at the beginning of my code) which solution works faster - since I suppose all this will depend on the number of processors, memory available etc. Kind of a self-tuning approach, so to speak.
Olivier
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Olivier,
Then returning to your original questionwhether it is better to parallelize the loop the (obvious) answer is yes. This is actually what DFT does when it knows it have to do multiple transforms in a row. When the multiple transforms are not uniformly spaced (1st, 38th etc) then the following technique might be useful. You must know in advance how many threads at maximum you can devote to compute the DFTs, let this number be MAXNT. Then you create one descriptor, and set DFTI_NUMBER_OF_USER_THREADS to MAXNT. After you commit the descriptor, you can use it in parallel by the DftiCompute functions each doing its own (1st, 38th etc) transform. Just to be clear let me outline the code:
DftiCreateDescriptor(hand,...)
MAXNT = omp_get_max_threads()
DftiSetValue(hand,DFTI_NUMBER_OF_USER_THREADS,MAXNT)
...
DftiCommitDescriptor(hand)
...
$omp parallel do num_threads(MAXNT)
do i = ...
DftiComputeForward(hand, x(:,list(i)) )
enddo
Thanks
Dima

Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page