<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Major slow down for inverse 2D FFT in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Major-slow-down-for-inverse-2D-FFT/m-p/791762#M2272</link>
    <description>&lt;P&gt;That was the cause.&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Now I just need to figure out how this may impact the rest of the calculations in my system.&lt;BR /&gt;&lt;BR /&gt;Jim K.&lt;/P&gt;</description>
    <pubDate>Wed, 14 Mar 2012 18:31:39 GMT</pubDate>
    <dc:creator>James_K_2</dc:creator>
    <dc:date>2012-03-14T18:31:39Z</dc:date>
    <item>
      <title>Major slow down for inverse 2D FFT</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Major-slow-down-for-inverse-2D-FFT/m-p/791758#M2268</link>
      <description>&lt;P&gt;I have been tracking down the cause of a major slowdown related to FFT-based filtering with the MKL libraries.&lt;BR /&gt;&lt;BR /&gt;The filtering is done on a large 2D data set (6427x6427) stored as floats (not doubles).&lt;BR /&gt;&lt;BR /&gt;What I am seeing:&lt;/P&gt;&lt;P&gt; Forward FFT of the data takes about 1.3 sec.&lt;/P&gt;&lt;P&gt; Multiply the output coefficients of the FFT by a weighting function.&lt;/P&gt;&lt;P&gt; Inverse FFT of the data takes about 6 sec (FFT only, not including the time to apply the weights).&lt;/P&gt;&lt;P&gt;A 6-second inverse FFT makes no sense, so during testing I tried the following (these are the results I think are relevant, not all of the testing I did).&lt;BR /&gt;&lt;BR /&gt;After many experiments I get:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Not applying the weights to the FFT output coefficients:&lt;OL&gt;&lt;LI&gt;Forward FFT of the data takes about 1.3 sec.&lt;/LI&gt;&lt;LI&gt;Inverse FFT of the data takes about 1.3 sec. This is what I would expect.&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;Going through the motions of applying the weights, just not applying them (i.e. everything except the final multiply):&lt;OL&gt;&lt;LI&gt;Forward FFT of the data takes about 1.3 sec.&lt;/LI&gt;&lt;LI&gt;Inverse FFT of the data takes about 1.3 sec. This is what I would expect.&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;Adding two lines to #2 above that multiply the FFT coefficients by the weighting function:&lt;OL&gt;&lt;LI&gt;Forward FFT of the data takes about 1.3 sec.&lt;/LI&gt;&lt;LI&gt;Inverse FFT of the data takes about 6 sec. 
Why would the simple act of multiplying the coefficients by the weights change the time from 1.3 sec to 6 sec?&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;When using&lt;EM&gt;&lt;SPAN style="text-decoration: underline;"&gt; the same code &lt;/SPAN&gt;&lt;/EM&gt;on a much smaller (1024x1024) data set, the forward and inverse FFT times, with weight multiplication, are the same.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Calls used: DftiComputeForward and DftiComputeBackward&lt;BR /&gt;Computer: Win7 64-bit&lt;BR /&gt;Intel Xeon 5160 3GHz (2 processors, 4 CPUs)&lt;BR /&gt;Memory 32GB&lt;/P&gt;</description>
      <pubDate>Tue, 13 Mar 2012 16:20:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Major-slow-down-for-inverse-2D-FFT/m-p/791758#M2268</guid>
      <dc:creator>James_K_2</dc:creator>
      <dc:date>2012-03-13T16:20:12Z</dc:date>
    </item>
    <item>
      <title>Major slow down for inverse 2D FFT</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Major-slow-down-for-inverse-2D-FFT/m-p/791759#M2269</link>
      <description>Hi J,&lt;BR /&gt;&lt;BR /&gt;I suppose you do a complex-to-complex transform.&lt;BR /&gt;Did you multiply by the coefficients in a separate pass over the data, or by setting DFTI_BACKWARD_SCALE?&lt;BR /&gt;If you swap the calls to Forward and Backward (that is, Backward first, then multiplication by the coefficients, and then Forward), does the forward transform slow down instead?&lt;BR /&gt;Maybe, after multiplication by the coefficients, some values become too small (denormalized), and this slows down the computation.&lt;BR /&gt;What version of MKL do you use?&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;Dima</description>
      <pubDate>Wed, 14 Mar 2012 06:35:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Major-slow-down-for-inverse-2D-FFT/m-p/791759#M2269</guid>
      <dc:creator>Dmitry_B_Intel</dc:creator>
      <dc:date>2012-03-14T06:35:09Z</dc:date>
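      <content:encoded><![CDATA[
One way to test Dima's denormal hypothesis directly is to count denormal values in the weighted coefficients just before DftiComputeBackward. The sketch below is a minimal illustration, assuming the coefficients sit in an ordinary float array with complex values interleaved as real/imaginary pairs. count_subnormals is a hypothetical helper, not an MKL function; it relies only on the standard C99 fpclassify macro.

```c
#include <assert.h>
#include <math.h>    /* fpclassify, FP_SUBNORMAL */
#include <stddef.h>  /* size_t */

/* Hypothetical helper (not part of MKL): counts how many floats in the
   array are subnormal (denormalized). For interleaved complex data,
   pass 2 * (number of complex elements) as count. */
size_t count_subnormals(const float *data, size_t count)
{
    assert(data != NULL || count == 0);

    size_t hits = 0;
    for (size_t i = 0; i < count; ++i) {
        if (fpclassify(data[i]) == FP_SUBNORMAL)
            ++hits;
    }
    return hits;
}
```

A large count in the 6427x6427 case compared with the 1024x1024 case would be consistent with the denormal explanation, though on its own it does not prove that denormals cause the slowdown.
]]></content:encoded>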
    </item>
    <item>
      <title>Major slow down for inverse 2D FFT</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Major-slow-down-for-inverse-2D-FFT/m-p/791760#M2270</link>
      <description>&lt;P&gt;Yes, I do the multiplication in a separate pass, not included in the timing, because I am implementing a type of band-pass filter, so it's not a single value.&lt;/P&gt;&lt;P&gt;The version of MKL is 10.3 patch 4. I am in the process of updating to the latest 10.3.&lt;/P&gt;&lt;P&gt;Some additional information since I made the post:&lt;/P&gt;&lt;P&gt;Almost all the values are 0, ranging to -1. I have done more testing and it's not the multiplication itself; it's related to the values being used. When I use the same code but (at the last second) reset all weights to 1, I don't get the slowdown.&lt;/P&gt;&lt;P&gt;I'm a little confused why having all values close to zero would cause an issue. If I wanted to filter out all but a few frequencies, I would set almost all values to zero, and that would run into the same issue. I could understand a slight slowdown (e.g. 1.3 sec to 1.5 sec), but not this very large one.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Thanks very much,&lt;BR /&gt;&lt;BR /&gt;Jim K.&lt;/P&gt;</description>
      <pubDate>Wed, 14 Mar 2012 11:35:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Major-slow-down-for-inverse-2D-FFT/m-p/791760#M2270</guid>
      <dc:creator>James_K_2</dc:creator>
      <dc:date>2012-03-14T11:35:59Z</dc:date>
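      <content:encoded><![CDATA[
The separate weighting pass described above can be sketched as follows. This assumes the forward-transform output is stored as C99 float complex values (Dima supposed a complex-to-complex transform) with one real weight per coefficient. apply_weights is a hypothetical name, not the poster's actual code. The sketch is meant to show the arithmetic: when a small coefficient meets a small but nonzero weight, the product can fall below FLT_MIN (about 1.18e-38) and is then stored as a denormal.

```c
#include <assert.h>
#include <complex.h>  /* float complex */
#include <stddef.h>   /* size_t */

/* Hypothetical sketch of the weighting pass: scales each complex FFT
   coefficient by its real band-pass weight, in place. */
void apply_weights(float complex *coeffs, const float *weights, size_t n)
{
    assert((coeffs != NULL && weights != NULL) || n == 0);

    for (size_t i = 0; i < n; ++i)
        coeffs[i] *= weights[i];  /* scales both real and imaginary parts */
}
```

For example, a coefficient of 1e-20 multiplied by a weight of 1e-20 gives about 1e-40, which is below FLT_MIN and is therefore a denormal; a weight of exactly 0 gives 0, which is not.
]]></content:encoded>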
    </item>
    <item>
      <title>Major slow down for inverse 2D FFT</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Major-slow-down-for-inverse-2D-FFT/m-p/791761#M2271</link>
      <description>&lt;P&gt;Jim,&lt;BR /&gt;Floating-point operations on very small (denormalized) numbers are expensive.&lt;BR /&gt;You can easily check whether this is the issue by setting the flush-to-zero bit before DftiComputeBackward:&lt;BR /&gt;&lt;BR /&gt;_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);&lt;BR /&gt;&lt;BR /&gt;The macro is defined in&lt;BR /&gt;#include &amp;lt;xmmintrin.h&amp;gt;&lt;BR /&gt;&lt;BR /&gt;Alternatively, if you use the Intel compiler, flush-to-zero can be set by adding the -Qftz flag when compiling the main() function.&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;Dima&lt;/P&gt;</description>
      <pubDate>Wed, 14 Mar 2012 13:58:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Major-slow-down-for-inverse-2D-FFT/m-p/791761#M2271</guid>
      <dc:creator>Dmitry_B_Intel</dc:creator>
      <dc:date>2012-03-14T13:58:08Z</dc:date>
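      <content:encoded><![CDATA[
To see what the flush-to-zero suggestion above actually changes, here is a small self-contained demonstration. It assumes an x86-64 build where float arithmetic runs on SSE, so the MXCSR register controls it. runtime_mul and tiny_product are hypothetical demo helpers; only the _MM_GET_FLUSH_ZERO_MODE / _MM_SET_FLUSH_ZERO_MODE macros and the _MM_FLUSH_ZERO_ON/OFF constants come from xmmintrin.h. With FTZ off, 1e-20f * 1e-20f produces a denormal; with FTZ on, the same multiply returns 0.

```c
#include <assert.h>
#include <xmmintrin.h>  /* _MM_GET/SET_FLUSH_ZERO_MODE, _MM_FLUSH_ZERO_ON/OFF */

/* Multiplies two floats at run time. The volatile copies stop the
   compiler from folding the product at compile time, so the multiply
   really executes under the current MXCSR settings. */
static float runtime_mul(float a, float b)
{
    volatile float va = a;
    volatile float vb = b;
    volatile float r = va * vb;
    return r;
}

/* Hypothetical demo: computes 1e-20f * 1e-20f (about 1e-40, below
   FLT_MIN) with FTZ set to ftz_mode, then restores the previous mode.
   With _MM_FLUSH_ZERO_OFF the result is a denormal; with
   _MM_FLUSH_ZERO_ON it is flushed to 0.0f. */
float tiny_product(unsigned int ftz_mode)
{
    assert(ftz_mode == _MM_FLUSH_ZERO_ON || ftz_mode == _MM_FLUSH_ZERO_OFF);

    unsigned int saved = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(ftz_mode);  /* real program: just before DftiComputeBackward */
    float r = runtime_mul(1.0e-20f, 1.0e-20f);
    _MM_SET_FLUSH_ZERO_MODE(saved);
    return r;
}
```

In the real program the mode would be set around the backward transform rather than around a single multiply.
]]></content:encoded>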
    </item>
    <item>
      <title>Major slow down for inverse 2D FFT</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Major-slow-down-for-inverse-2D-FFT/m-p/791762#M2272</link>
      <description>&lt;P&gt;That was the cause.&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Now I just need to figure out how this may impact the rest of the calculations in my system.&lt;BR /&gt;&lt;BR /&gt;Jim K.&lt;/P&gt;</description>
      <pubDate>Wed, 14 Mar 2012 18:31:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Major-slow-down-for-inverse-2D-FFT/m-p/791762#M2272</guid>
      <dc:creator>James_K_2</dc:creator>
      <dc:date>2012-03-14T18:31:39Z</dc:date>
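      <content:encoded><![CDATA[
One way to limit the effect on the rest of the calculations is to enable flush-to-zero only around the backward transform and restore the previous MXCSR afterwards. The sketch below assumes x86 with SSE3, because the denormals-are-zero (DAZ) macros live in pmmintrin.h. run_with_ftz_daz, mode_probe and probe_modes are hypothetical names, and the callback stands in for a wrapper around DftiComputeBackward. DAZ goes beyond the FTZ-only suggestion in this thread: FTZ flushes denormal results, while DAZ also treats denormal inputs as zero. MXCSR is per-thread, so whether worker threads started by a threaded library see the setting depends on that library; it is worth verifying rather than assuming.

```c
#include <assert.h>
#include <pmmintrin.h>  /* _MM_SET/GET_DENORMALS_ZERO_MODE (SSE3) */
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, flush-to-zero macros */

/* Hypothetical helper: runs work(ctx) with flush-to-zero (FTZ) and
   denormals-are-zero (DAZ) enabled, then restores the caller's MXCSR,
   so code outside this call keeps full IEEE denormal handling.
   Only the calling thread's MXCSR is changed. */
void run_with_ftz_daz(void (*work)(void *), void *ctx)
{
    assert(work != NULL);

    unsigned int saved_csr = _mm_getcsr();
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

    work(ctx);  /* e.g. a wrapper that calls DftiComputeBackward */

    /* Restore rounding, masks, FTZ/DAZ and flags exactly as they were;
       exception flags raised inside work() are discarded. */
    _mm_setcsr(saved_csr);
}

/* Example callback: records the FTZ and DAZ modes that work() sees. */
struct mode_probe {
    unsigned int ftz;
    unsigned int daz;
};

void probe_modes(void *ctx)
{
    struct mode_probe *p = ctx;
    p->ftz = _MM_GET_FLUSH_ZERO_MODE();
    p->daz = _MM_GET_DENORMALS_ZERO_MODE();
}
```

Saving and restoring the whole register, rather than just toggling FTZ back off, keeps any mode the caller had already chosen intact.
]]></content:encoded>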
    </item>
    <item>
      <title>Major slow down for inverse 2D FFT</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Major-slow-down-for-inverse-2D-FFT/m-p/791763#M2273</link>
      <description>&lt;DIV id="tiny_quote"&gt;&lt;DIV style="margin-left: 2px; margin-right: 2px;"&gt;Quoting &lt;A jquery1331817082265="58" rel="/en-us/services/profile/quick_profile.php?is_paid=&amp;amp;user_id=176067" href="https://community.intel.com/en-us/profile/176067/" class="basic"&gt;jkramer@zygo.com&lt;/A&gt;&lt;/DIV&gt;&lt;DIV style="background-color: #e5e5e5; margin-left: 2px; margin-right: 2px; border: 1px inset; padding: 5px;"&gt;&lt;I&gt;...I'm a little confused why having all values close to zero would cause an issue...&lt;/I&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&lt;BR /&gt;Take a look at astate of &lt;STRONG&gt;FP&lt;/STRONG&gt; operations( see &lt;STRONG&gt;_control87&lt;/STRONG&gt; CRT-function or &lt;STRONG&gt;SSE&lt;/STRONG&gt; macro &lt;STRONG&gt;_MM_GET_EXCEPTION_STATE&lt;/STRONG&gt;).&lt;BR /&gt;There is a possibility that some&lt;STRONG&gt;FP&lt;/STRONG&gt; exceptions, like Inexact Value( see &lt;STRONG&gt;_EM_INEXACT&lt;/STRONG&gt; ), are causing the performanceproblem.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Mar 2012 13:25:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Major-slow-down-for-inverse-2D-FFT/m-p/791763#M2273</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-03-15T13:25:14Z</dc:date>
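      <content:encoded><![CDATA[
A minimal sketch of following up on the suggestion above from C, assuming SSE float math on x86-64: clear the sticky MXCSR exception flags with _MM_SET_EXCEPTION_STATE, do the work, and then test the Denormal flag (_MM_EXCEPT_DENORM) in the value returned by _MM_GET_EXCEPTION_STATE. The _control87 route is specific to the Microsoft CRT and is not shown. denormal_operand_seen is a hypothetical demo helper that wraps a single multiply; in the real program the clear would go before DftiComputeBackward and the check after it. The flags are per-thread, so only operations on the calling thread are reported, and the DE flag is only set while DAZ is off.

```c
#include <assert.h>
#include <pmmintrin.h>  /* _MM_GET_DENORMALS_ZERO_MODE (SSE3) */
#include <xmmintrin.h>  /* _MM_GET/SET_EXCEPTION_STATE, _MM_EXCEPT_DENORM */

/* Hypothetical diagnostic: clears the sticky MXCSR exception flags,
   performs one SSE multiply, and reports whether the Denormal flag (DE)
   was raised. The volatile copies keep the multiply from being folded
   at compile time, so it executes between the clear and the read. */
int denormal_operand_seen(float a, float b)
{
    /* DE is only reported while DAZ is off; with DAZ on, denormal
       inputs are silently treated as zero. */
    assert(_MM_GET_DENORMALS_ZERO_MODE() == _MM_DENORMALS_ZERO_OFF);

    volatile float va = a;
    volatile float vb = b;
    volatile float product;

    _MM_SET_EXCEPTION_STATE(0);  /* clear all sticky exception flags */
    product = va * vb;
    return (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_DENORM) != 0;
}
```

For example, denormal_operand_seen(1.0e-40f, 3.0f) reports a denormal operand, while denormal_operand_seen(1.0f, 3.0f) does not.
]]></content:encoded>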
    </item>
  </channel>
</rss>

