Hi there,
I have a question about configuring the DFTI descriptor.
My data is a 1K x 1K array of complex numbers in row-major order. For each row of 1K elements, I would like to compute size-16 FFTs with stride 64. That is, I do not want one size-1024 FFT per row, only size-16 FFTs.
For example, one size-16 FFT takes elements 0, 64, 128, 192, ..., 960; the next size-16 FFT takes elements 1, 65, 129, ..., 961; and so on.
The same computation is applied to all 1K rows.
I had a look at the reference manual, but I am not sure whether a descriptor can express this.
Specifically, I don't know how to set arguments like:
1) num_of_transforms, 2) stride, 3) dist.
Thanks!
Jing
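The access pattern described in the question can be written as a single index formula. The sketch below is plain C++ with no MKL calls; the names `kRowLen`, `kFftLen`, `kStride`, `kPerRow`, and `element_index` are made up for illustration. It assumes the 64 transforms of a row start at consecutive elements, which is what the example indices 0, 64, ... and 1, 65, ... imply.

```cpp
#include <cstddef>

// Hypothetical names, chosen for this illustration only.
constexpr std::size_t kRowLen = 1024; // elements per row (row-major)
constexpr std::size_t kFftLen = 16;   // points per transform
constexpr std::size_t kStride = 64;   // gap between points of one transform
constexpr std::size_t kPerRow = 64;   // transforms per row (kRowLen / kFftLen)

// Flat index of point k (0..kFftLen-1) of transform t (0..kPerRow-1)
// in the given row of the 1K x 1K array.
constexpr std::size_t element_index(std::size_t row, std::size_t t, std::size_t k) {
    return row * kRowLen + t + k * kStride;
}
```

With these values, the last point of the last transform in a row lands on the final element of that row, so the 64 transforms cover each row exactly once.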
Hi Jing,
The following lines should guide you to the desired computation:
[cpp]
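MKL_LONG size = 16;
MKL_LONG strides[] = { 0, 64 };  // offset of the first element, then the step between elements
MKL_LONG distance = 1;           // the 64 FFTs of a row start at elements 0, 1, ..., 63
MKL_LONG ntransforms = 64;
DftiCreateDescriptor(&h, ..., 1, size);                     // 1D FFT of size 16
DftiSetValue(h, DFTI_INPUT_STRIDES, strides);               // with stride 64
DftiSetValue(h, DFTI_INPUT_DISTANCE, distance);             // spacing between consecutive FFTs
DftiSetValue(..., DFTI_NUMBER_OF_TRANSFORMS, ntransforms);  // compute the 64 FFTs of one row
DftiCommitDescriptor(...);
for (rowno = 0; rowno < 1024; ++rowno)
    DftiComputeForward(h, &data[rowno * rowsize]);          // one call per row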
[/cpp]
Thanks
Dima
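To check the output of a descriptor configured this way, a slow reference implementation can help. The following is a minimal sketch in plain C++, not MKL code: `reference_batched_dft` and its parameters are hypothetical names, double precision is an assumption, and the forward transform is taken to use the negative-exponent convention. The `stride` and `distance` parameters mirror `DFTI_INPUT_STRIDES` and `DFTI_INPUT_DISTANCE`; unlike an in-place descriptor, the results are written to a separate contiguous buffer.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>; // precision is an assumption

// Naive O(n^2) forward DFT over a strided batch, out of place.
// Point k of transform t is read from in[t * distance + k * stride].
// Output m of transform t is stored at out[t * n + m].
std::vector<cplx> reference_batched_dft(const cplx* in, std::size_t n,
                                        std::size_t stride, std::size_t distance,
                                        std::size_t howmany) {
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n * howmany);
    for (std::size_t t = 0; t < howmany; ++t) {
        for (std::size_t m = 0; m < n; ++m) {
            cplx acc(0.0, 0.0);
            for (std::size_t k = 0; k < n; ++k) {
                const double angle = -2.0 * pi * double(m * k) / double(n);
                acc += in[t * distance + k * stride] * std::polar(1.0, angle);
            }
            out[t * n + m] = acc;
        }
    }
    return out;
}
```

For one row of the problem in this thread, the call would be `reference_batched_dft(row, 16, 64, 1, 64)`, where `row` points to the first element of that row.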
Hi Dima,
Thanks for your reply. I had thought of that, but expected the performance of the for loop to be really bad. I just ran the code following your guideline, and the performance is far worse than computing 1024*64 size-16 FFTs on contiguous data (unit stride). Since the FLOP count is relatively small, I had hoped that batched execution could exploit memory and cache as well with strides (0, 64) as it does with strides (0, 1).
Do you have any suggestions to tune the performance?
Thanks!!
Jing
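The contiguous layout used as the fast baseline above can be produced from a strided row by a gather step. The sketch below shows only that copy in plain C++; `gather_row` and its parameters are hypothetical names, and double precision is an assumption. It is an illustration of the two layouts, not a tuning recommendation from the thread: whether the extra copy costs less than the strided access would have to be measured.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>; // precision is an assumption

// Copy one row so that the n points of each transform become adjacent:
// out[t * n + k] = row[t + k * stride], for t in [0, stride) and k in [0, n).
// The row must hold n * stride elements (16 * 64 = 1024 in this thread).
std::vector<cplx> gather_row(const cplx* row, std::size_t n, std::size_t stride) {
    std::vector<cplx> out(n * stride);
    for (std::size_t k = 0; k < n; ++k) {
        // The inner loop reads the row contiguously and scatters into out.
        for (std::size_t t = 0; t < stride; ++t) {
            out[t * n + k] = row[t + k * stride];
        }
    }
    return out;
}
```

After the gather, the 64 transforms of a row sit back to back, each occupying 16 consecutive elements, which matches a unit-stride batch with a distance of 16.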