Intel® oneAPI Math Kernel Library

MKL DFT descriptor generation question

hello_world
Beginner

Hi there,

I have a question about the DFTI descriptor.

So the problem is a 1K x 1K array of complex numbers, stored row-major. For each 1K-element row, I would like to compute size-16 FFTs with stride 64. That is, I do not want to compute a size-1024 FFT, only size-16 FFTs.

For example, one size-16 FFT uses elements 0, 64, 128, 192, ..., 960, another uses elements 1, 65, 129, ..., 961, and so on.

And the same computation is applied to all 1K rows.
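The three parameters asked about can be read off the index pattern: element k of transform t in row r sits at r*1024 + t*1 + k*64. Within a row, the stride is therefore 64, consecutive transforms are 1 element apart (the distance), and there are 64 transforms. Below is a small sketch of that mapping in C, assuming the 1024 x 1024 row-major layout described above. `elem_index` and the constants are illustrative names, not MKL identifiers.

```c
#include <assert.h>

/* Layout from the question (illustrative names, not MKL identifiers):
   a 1024 x 1024 complex array, row-major, with 64 interleaved
   size-16 transforms in each row. */
enum { ROW_LEN = 1024, FFT_LEN = 16, STRIDE = 64, DIST = 1, NTRANSFORMS = 64 };

/* Flat index of element k of transform t in row r. */
static long elem_index(long r, long t, long k)
{
    assert(t >= 0 && t < NTRANSFORMS && k >= 0 && k < FFT_LEN);
    return r * ROW_LEN   /* start of row r                             */
         + t * DIST      /* transform t starts t elements into the row */
         + k * STRIDE;   /* successive elements are 64 apart           */
}
```

For example, `elem_index(0, 1, 15)` is 961, the last element of the second transform in the first row. `elem_index(0, 63, 15)` is 1023, so the 64 transforms exactly tile the row.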

I had a look at the reference manual but am not sure whether a descriptor can be configured to do that.

Specifically, I don't know how to set these arguments:

1) the number of transforms
2) the stride
3) the distance

Thanks!

Jing

SergeyKostrov
Valued Contributor II
Please take a look at the MKL examples for the DftiComputeForward and DftiComputeBackward functions. There is also a thread about normalization issues with these functions: http://software.intel.com/en-us/forums/topic/402439.
Dmitry_B_Intel
Employee

Hi Jing,

The following lines should guide you to the desired computation:

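[cpp]

MKL_LONG size = 16;               // length of each FFT
MKL_LONG strides[] = { 0, 64 };   // {offset of first element, stride between elements}
MKL_LONG ntransforms = 64;        // 64 interleaved size-16 FFTs per 1024-element row
MKL_LONG distance = 1;            // transform t starts at element t of the row

DftiCreateDescriptor(&h, ..., 1, size); // = I would like to compute size-16 FFT
DftiSetValue(h, DFTI_INPUT_STRIDES, strides); // = with stride 64
DftiSetValue(..., DFTI_NUMBER_OF_TRANSFORMS, ntransforms); // compute 64 FFTs of one row
DftiSetValue(..., DFTI_INPUT_DISTANCE, distance); // consecutive FFTs are 1 element apart
DftiCommitDescriptor(...);

// rowsize = 1024: each call applies all 64 FFTs to one row, in place
for (rowno = 0; rowno < 1024; ++rowno) DftiComputeForward(h, &data[rowno*rowsize]);

[/cpp]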

Thanks
Dima

hello_world
Beginner

Hi Dima,

Thanks for your reply. I had thought of that, but expected the for loop to perform poorly. I just ran the code following your guideline, and it is much slower than computing 1024*64 size-16 FFTs on contiguous data (strides {0, 1}). Since the FLOP count is relatively small, I had hoped that batched execution with strides {0, 64} could use memory and cache nearly as well as it does with strides {0, 1}.

Do you have any suggestions to tune the performance? 
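One possible approach, not proposed in this thread, is to repack each row so that every size-16 FFT becomes contiguous, then run the whole array as one batch of stride-1 transforms and unpack afterwards if the original layout is needed. Viewed as a 16 x 64 matrix (element k of transform t at t + 64*k), the repack is a transpose to 64 x 16 (element k of transform t at 16*t + k). The sketch below shows only the repacking. `pack_row` and `unpack_row` are hypothetical helpers, not MKL functions, and double-precision `double complex` data is assumed.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

enum { FFT_LEN = 16, NTRANS = 64, ROW_LEN = FFT_LEN * NTRANS };

/* Strided row (element k of transform t at src[t + 64*k]) ->
   contiguous row (element k of transform t at dst[16*t + k]). */
static void pack_row(const double complex *src, double complex *dst)
{
    assert(src != dst);                  /* out-of-place only */
    for (size_t k = 0; k < FFT_LEN; ++k)
        for (size_t t = 0; t < NTRANS; ++t)
            dst[t * FFT_LEN + k] = src[t + k * NTRANS];
}

/* Inverse of pack_row: restores the original strided layout. */
static void unpack_row(const double complex *src, double complex *dst)
{
    assert(src != dst);                  /* out-of-place only */
    for (size_t t = 0; t < NTRANS; ++t)
        for (size_t k = 0; k < FFT_LEN; ++k)
            dst[t + k * NTRANS] = src[t * FFT_LEN + k];
}
```

With every row packed into a scratch buffer this way, the contiguous case described above corresponds to strides {0, 1}, a distance of 16 and 65536 (1024*64) transforms, which a single DftiComputeForward call could cover. Each pack and unpack is an extra pass over the full array, so whether this beats the strided descriptor would have to be measured.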

Thanks!!

Jing

In reply to Dmitry Baksheev (Intel).
