<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Dima, in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFT-descriptor-generation-question/m-p/969845#M16501</link>
    <description>&lt;P&gt;Hi Dima,&lt;/P&gt;
&lt;P&gt;Thanks for your reply. I had thought of that, but expected the performance of a for loop to be really bad. I just ran the code according to your guideline, and the performance is much worse than computing 1024*64 size-16 FFTs over consecutive memory (unit stride). Since the FLOP count is relatively small, I thought the batched execution might be able to exploit memory and cache as well for stride (0, 64) as it does for stride (0, 1).&lt;/P&gt;
&lt;P&gt;Do you have any suggestions to tune the performance?&lt;/P&gt;
&lt;P&gt;Thanks!!&lt;/P&gt;
&lt;P&gt;Jing&lt;/P&gt;
&lt;BLOCKQUOTE&gt;Dmitry Baksheev (Intel) wrote:&lt;BR /&gt;
&lt;P&gt;Hi Jing,&lt;/P&gt;
&lt;P&gt;The following lines should guide you to the desired computation:&lt;/P&gt;
&lt;P&gt;MKL_LONG size = 16;&lt;BR /&gt;MKL_LONG strides[] = { 0, 64 };&lt;BR /&gt;MKL_LONG ntransforms = 64;&lt;/P&gt;
&lt;P&gt;DftiCreateDescriptor(&amp;amp;h, ..., 1, size); // = I would like to compute size-16 FFT&lt;BR /&gt;DftiSetValue(h, DFTI_INPUT_STRIDES, strides ); // = with stride 64&lt;BR /&gt;DftiSetValue(..., DFTI_NUMBER_OF_TRANSFORMS, ntransforms ); // compute 64 ffts of one row&lt;BR /&gt;DftiCommitDescriptor(...);&lt;/P&gt;
&lt;P&gt;for (rowno=0;rowno&amp;lt;1024;++rowno) DftiComputeForward(h,&amp;amp;data[rowno*rowsize]);&lt;/P&gt;
&lt;P&gt;Thanks&lt;BR /&gt;Dima&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;</description>
    <pubDate>Wed, 07 Aug 2013 02:20:28 GMT</pubDate>
    <dc:creator>hello_world</dc:creator>
    <dc:date>2013-08-07T02:20:28Z</dc:date>
    <item>
      <title>MKL DFT descriptor generation question</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFT-descriptor-generation-question/m-p/969842#M16498</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;
&lt;P&gt;I have a question about the DFTI descriptor.&lt;/P&gt;
&lt;P&gt;The problem is a 1Kx1K array of complex numbers, stored row-major. For each row of 1K elements, I would like to compute size-16 FFTs with stride 64. That is, I do not want to compute a size-1024 FFT, only size-16 FFTs.&lt;/P&gt;
&lt;P&gt;For example, one size-16 FFT takes elements 0, 64, 128, 192, ..., 960; another takes elements 1, 65, 129, ..., 961; and so on.&lt;/P&gt;
&lt;P&gt;The same computation is applied to all 1K rows.&lt;/P&gt;
&lt;P&gt;I had a look at the reference manual but am not sure whether the descriptor can express that.&lt;/P&gt;
&lt;P&gt;Specifically, I don't know how to set arguments such as:&lt;/P&gt;
&lt;P&gt;1) num_of_transforms, 2) stride, 3) dist.&lt;/P&gt;
&lt;P&gt;Thanks!&lt;/P&gt;
&lt;P&gt;Jing&lt;/P&gt;</description>
      <pubDate>Tue, 06 Aug 2013 17:05:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFT-descriptor-generation-question/m-p/969842#M16498</guid>
      <dc:creator>hello_world</dc:creator>
      <dc:date>2013-08-06T17:05:52Z</dc:date>
    </item>
    <item>
      <title>Please take a look at MKL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFT-descriptor-generation-question/m-p/969843#M16499</link>
      <description>Please take a look at the MKL examples for the &lt;STRONG&gt;DftiComputeForward&lt;/STRONG&gt; and &lt;STRONG&gt;DftiComputeBackward&lt;/STRONG&gt; functions. There is also a thread about normalization issues with these functions: &lt;A href="http://software.intel.com/en-us/forums/topic/402439" target="_blank"&gt;http://software.intel.com/en-us/forums/topic/402439&lt;/A&gt;.</description>
      <pubDate>Wed, 07 Aug 2013 01:10:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFT-descriptor-generation-question/m-p/969843#M16499</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-08-07T01:10:00Z</dc:date>
    </item>
    <item>
      <title>Hi Jing,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFT-descriptor-generation-question/m-p/969844#M16500</link>
      <description>&lt;P&gt;Hi Jing,&lt;/P&gt;
&lt;P&gt;The following lines should guide you to the desired computation:&lt;/P&gt;
&lt;P&gt;[cpp]&lt;/P&gt;
&lt;P&gt;MKL_LONG size = 16;&lt;BR /&gt;MKL_LONG strides[] = { 0, 64 };&lt;BR /&gt;MKL_LONG ntransforms = 64;&lt;/P&gt;
&lt;P&gt;DftiCreateDescriptor(&amp;amp;h, ..., 1, size); // = I would like to compute size-16 FFT&lt;BR /&gt;DftiSetValue(h, DFTI_INPUT_STRIDES, strides ); // = with stride 64&lt;BR /&gt;DftiSetValue(..., DFTI_NUMBER_OF_TRANSFORMS, ntransforms ); // compute 64 ffts of one row&lt;BR /&gt;DftiCommitDescriptor(...);&lt;/P&gt;
&lt;P&gt;for (rowno=0;rowno&amp;lt;1024;++rowno) DftiComputeForward(h,&amp;amp;data[rowno*rowsize]);&lt;/P&gt;
&lt;P&gt;[/cpp]&lt;/P&gt;
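&lt;P&gt;To make the memory layout concrete, here is a minimal sketch in plain C (no MKL calls) of which array element each sample of each transform touches. The helper name element_offset and the constants are hypothetical, introduced only for illustration. The value 1 for the spacing between consecutive transforms is an assumption read off the layout in the question; the DFTI interface calls this the distance (DFTI_INPUT_DISTANCE), which the snippet above does not set explicitly.&lt;/P&gt;

```c
/* Layout assumed from the question: 1024 x 1024 complex elements, row-major. */
enum {
    ROW_LENGTH = 1024, /* complex elements per row                          */
    STRIDE     = 64,   /* spacing between samples of one size-16 transform  */
    DISTANCE   = 1     /* assumed spacing between the starts of consecutive
                          transforms within a row                           */
};

/* Offset, in complex elements from the start of the array, of sample k
   (0..15) of transform t (0..63) in the given row (0..1023). */
long element_offset(long row, long t, long k)
{
    return row * ROW_LENGTH + t * DISTANCE + k * STRIDE;
}
```

&lt;P&gt;For instance, element_offset(0, 1, 2) is 129, matching the 1, 65, 129 pattern in the question, and the last sample of the last transform in row 0 lands on element 1023, so the 64 transforms cover each row exactly once.&lt;/P&gt;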
&lt;P&gt;Thanks&lt;BR /&gt;Dima&lt;/P&gt;</description>
      <pubDate>Wed, 07 Aug 2013 01:52:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFT-descriptor-generation-question/m-p/969844#M16500</guid>
      <dc:creator>Dmitry_B_Intel</dc:creator>
      <dc:date>2013-08-07T01:52:00Z</dc:date>
    </item>
    <item>
      <title>Hi Dima,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFT-descriptor-generation-question/m-p/969845#M16501</link>
      <description>&lt;P&gt;Hi Dima,&lt;/P&gt;
&lt;P&gt;Thanks for your reply. I had thought of that, but expected the performance of a for loop to be really bad. I just ran the code according to your guideline, and the performance is much worse than computing 1024*64 size-16 FFTs over consecutive memory (unit stride). Since the FLOP count is relatively small, I thought the batched execution might be able to exploit memory and cache as well for stride (0, 64) as it does for stride (0, 1).&lt;/P&gt;
&lt;P&gt;Do you have any suggestions to tune the performance?&lt;/P&gt;
&lt;P&gt;Thanks!!&lt;/P&gt;
&lt;P&gt;Jing&lt;/P&gt;
&lt;BLOCKQUOTE&gt;Dmitry Baksheev (Intel) wrote:&lt;BR /&gt;
&lt;P&gt;Hi Jing,&lt;/P&gt;
&lt;P&gt;The following lines should guide you to the desired computation:&lt;/P&gt;
&lt;P&gt;MKL_LONG size = 16;&lt;BR /&gt;MKL_LONG strides[] = { 0, 64 };&lt;BR /&gt;MKL_LONG ntransforms = 64;&lt;/P&gt;
&lt;P&gt;DftiCreateDescriptor(&amp;amp;h, ..., 1, size); // = I would like to compute size-16 FFT&lt;BR /&gt;DftiSetValue(h, DFTI_INPUT_STRIDES, strides ); // = with stride 64&lt;BR /&gt;DftiSetValue(..., DFTI_NUMBER_OF_TRANSFORMS, ntransforms ); // compute 64 ffts of one row&lt;BR /&gt;DftiCommitDescriptor(...);&lt;/P&gt;
&lt;P&gt;for (rowno=0;rowno&amp;lt;1024;++rowno) DftiComputeForward(h,&amp;amp;data[rowno*rowsize]);&lt;/P&gt;
&lt;P&gt;Thanks&lt;BR /&gt;Dima&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Wed, 07 Aug 2013 02:20:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-DFT-descriptor-generation-question/m-p/969845#M16501</guid>
      <dc:creator>hello_world</dc:creator>
      <dc:date>2013-08-07T02:20:28Z</dc:date>
    </item>
  </channel>
</rss>

