<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Sergey! in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938437#M14303</link>
    <description>&lt;P&gt;Hi Sergey!&lt;/P&gt;
&lt;P&gt;Do you think that raising the thread's priority in the main function, just before the call to the tested function, has any significance? Or do I need to raise the thread's priority inside the tested function?&lt;/P&gt;
&lt;P&gt;Thanks in advance,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 12 Feb 2013 05:24:59 GMT</pubDate>
    <dc:creator>Bernard</dc:creator>
    <dc:date>2013-02-12T05:24:59Z</dc:date>
    <item>
      <title>Comparing FFT Performance MKL6 vs MKL11</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938426#M14292</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;
&lt;P&gt;I'm currently evaluating MKL11 to decide whether it should replace the older MKL6 that has been used until now. I wrote a little console application to compare the FFT performance (for the moment just the computation time, not the numerical accuracy), but the results rather surprised me: MKL11 seems to be slower than MKL6.&lt;/P&gt;
&lt;P&gt;The program runs 1100 FFTs with different lengths and measures the time. The attached plots show avg/min/max plots of 1100 loops (green). The red curves excluded the first 100 loops from the logging - no big difference there. The time is for each FFT calculation.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Plotting both average curves shows that MKL6 needs approximately half the time.&lt;/P&gt;
&lt;P&gt;I was a bit surprised by these results. Does anyone have experience with the FFT performance? Another thing that keeps me wondering is the outliers in the MKL11 results, which don't occur nearly as often with MKL6.&lt;/P&gt;
&lt;P&gt;My testing code:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;int FFT_Kernel_float(unsigned int Nfft, void* pIn, void* pOut)&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;{&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp; int status;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp; DFTI_DESCRIPTOR_HANDLE hand = 0;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp; status = DftiCreateDescriptor(&amp;amp;hand, DFTI_SINGLE, DFTI_REAL, 1, Nfft);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp; status = DftiSetValue(hand, DFTI_PLACEMENT, DFTI_NOT_INPLACE);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp; status = DftiCommitDescriptor(hand);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp; status = DftiComputeForward(hand, pIn, pOut);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp; DftiFreeDescriptor(&amp;amp;hand);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp; return status;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;}&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;for (exp=exponent_start;exp&amp;lt;=exponent_stop;exp++) //2^4 to 2^20&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp; {&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Nfft = (unsigned int) pow(2.0,exp);&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cxfTimesig.alloc(Nfft);&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cxfTimeaxis.alloc(Nfft);&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cxfFreqsig.alloc(Nfft);&lt;/STRONG&gt;&lt;BR /&gt;
&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (i=0;i&amp;lt;Nfft;i++)&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cxfTimeaxis[i] = ((float) i + 1.0) / fs;&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cxfTimesig[i] = ((float)rnd.Get()/UINT_MAX)*2-1; //random signal&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/STRONG&gt;&lt;BR /&gt;
&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Time_all_min = 1e6;&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Time_all_max = 0;&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Time_firstexcl_min = 1e6;&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Time_firstexcl_max = 0;&lt;/STRONG&gt;&lt;BR /&gt;
&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hpfcAllLoops.Start(); //start time for all loops&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; for (i=0;i&amp;lt;loops;i++) //loops = 1100&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (i==exclude_first_from_avg-1)&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hpfcFirstExcluded.Start(); //start timer for loops after first excluded loops&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hpfcIndividual.Start(); //start timer for single execution&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; status = FFT_Kernel_float(Nfft,cxfTimesig.ptr(), cxfFreqsig.ptr());&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Time_individual = hpfcIndividual.Time();&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if (i&amp;gt;=exclude_first_from_avg-1) //exclude_first_from_avg = 100&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Time_firstexcl_max = max(Time_firstexcl_max,Time_individual);&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Time_firstexcl_min = min(Time_firstexcl_min,Time_individual);&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Time_all_max = max(Time_all_max,Time_individual);&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Time_all_min = min(Time_all_min,Time_individual);&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/STRONG&gt;&lt;BR /&gt;
&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Time_all_tot = hpfcAllLoops.Time();&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Time_firstexcl_tot = hpfcFirstExcluded.Time();&lt;/STRONG&gt;&lt;BR /&gt;
&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Time_firstexcl_avg = Time_firstexcl_tot / (double) (loops - exclude_first_from_avg);&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Time_all_avg = Time_all_tot / (double) loops;&lt;/STRONG&gt;&lt;BR /&gt;
&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; //log data here&lt;/STRONG&gt;&lt;BR /&gt;
&lt;STRONG&gt;&amp;nbsp;&amp;nbsp; }&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Any opinions or experiences on this issue? Am I comparing apples and oranges?&lt;/P&gt;
&lt;P&gt;Marian&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;(Win7, Intel i5-2500, C++, Visual Studio 2008)&lt;/P&gt;</description>
      <pubDate>Thu, 07 Feb 2013 12:38:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938426#M14292</guid>
      <dc:creator>Marian_L_</dc:creator>
      <dc:date>2013-02-07T12:38:25Z</dc:date>
    </item>
    <item>
      <title>Hello Marian, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938427#M14293</link>
      <description>&lt;P&gt;Hello Marian,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;these are the expected results, because you measure the time of the whole routine, which includes creating and initializing the descriptor. Within these parts of the FFT calculation we allocate memory for internal data and initialize it, and these procedures are not highly optimized. But usually this part of the FFT is done only once.&lt;/P&gt;
&lt;P&gt;Please check the computation phase of FFT (in this case -- &lt;STRONG&gt;DftiComputeForward&lt;/STRONG&gt;)&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;hpfcIndividual.Start(); //start timer for single execution&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; status = &lt;STRONG&gt;DftiComputeForward&lt;STRONG&gt;(hand, pIn, pOut);&lt;/STRONG&gt;&lt;/STRONG&gt;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;STRONG&gt;Time_individual = hpfcIndividual.Time();&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;and let us know the results you will see.&lt;/P&gt;
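&lt;P&gt;As a rough sketch of this kind of measurement (generic C++, not MKL code): the helper below runs a setup step once, times only the compute call, and keeps every sample so that min/max/avg are computed after the loop. The names TimingStats and time_compute_only and the three callbacks are assumptions made for illustration. With MKL, the callbacks would wrap descriptor creation and commit, the DftiComputeForward call, and DftiFreeDescriptor.&lt;/P&gt;
<![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary statistics for one benchmark run, in seconds.
struct TimingStats {
    double min_s;
    double max_s;
    double avg_s;
};

// Times `compute` alone. `setup` and `teardown` run once each, outside the
// timed region. All three callbacks are placeholders for the caller's code.
template <typename Setup, typename Compute, typename Teardown>
TimingStats time_compute_only(Setup setup, Compute compute, Teardown teardown,
                              std::size_t iterations, std::size_t warmup) {
    assert(iterations > warmup);
    using clock = std::chrono::steady_clock;

    setup();  // e.g. create and commit the FFT descriptor once

    std::vector<double> samples;
    samples.reserve(iterations);  // allocate before timing starts
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto t0 = clock::now();
        compute();  // e.g. the forward transform only
        const auto t1 = clock::now();
        samples.push_back(std::chrono::duration<double>(t1 - t0).count());
    }

    teardown();  // e.g. free the descriptor

    // Post-process after the loop, skipping the first `warmup` samples.
    const auto first = samples.begin() + static_cast<std::ptrdiff_t>(warmup);
    const auto mm = std::minmax_element(first, samples.end());
    const double sum = std::accumulate(first, samples.end(), 0.0);
    return {*mm.first, *mm.second, sum / static_cast<double>(iterations - warmup)};
}
```
]]>
&lt;P&gt;With the values from the original test this would be called with iterations = 1100 and warmup = 100.&lt;/P&gt;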
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 07 Feb 2013 19:00:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938427#M14293</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2013-02-07T19:00:52Z</dc:date>
    </item>
    <item>
      <title>Hello Gennady,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938428#M14294</link>
      <description>&lt;P&gt;Hello Gennady,&lt;/P&gt;
&lt;P&gt;thanks for your response. Indeed, the test you proposed yields a faster computation time of the MKL11.&lt;/P&gt;
&lt;P&gt;Why is the allocation time slower now? I'm currently using a wrapper function (like in the code) for the FFT, do you recommend keeping the handle?&lt;/P&gt;
&lt;P&gt;Do you have an explanation for the higher number of outliers in MKL11's results? The ratio seems relatively constant (compared to the outliers in MKL6's results).&lt;/P&gt;
&lt;P&gt;Thanks!&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 08 Feb 2013 06:37:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938428#M14294</guid>
      <dc:creator>Marian_L_</dc:creator>
      <dc:date>2013-02-08T06:37:07Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...Any opinions or</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938429#M14295</link>
      <description>&amp;gt;&amp;gt;...Any opinions or experiences on this issue?..

Hi Marian, Thanks for these graphs and here are a couple of notes:

&lt;STRONG&gt;1&lt;/STRONG&gt;. Regarding a signal generation &lt;STRONG&gt;for&lt;/STRONG&gt; loop:

&amp;gt;&amp;gt;...
&amp;gt;&amp;gt;for (i=0;i&amp;lt;Nfft;i++)
&amp;gt;&amp;gt;{
&amp;gt;&amp;gt;cxfTimeaxis[i] = ((float) i + 1.0) / fs;
&amp;gt;&amp;gt;cxfTimesig[i]  = ((float)&lt;STRONG&gt;rnd.Get()&lt;/STRONG&gt;/UINT_MAX)*2-1; //random signal
&amp;gt;&amp;gt;}
&amp;gt;&amp;gt;...

For &lt;STRONG&gt;precise&lt;/STRONG&gt; performance evaluations of different versions of an API it is a good idea to use identical ( &lt;STRONG&gt;not&lt;/STRONG&gt; random ) input data sets ( a signal in your case ). Then your tests can be rated as &lt;STRONG&gt;deterministic&lt;/STRONG&gt; and &lt;STRONG&gt;reproducible&lt;/STRONG&gt;.

&lt;STRONG&gt;2&lt;/STRONG&gt;. You have some computational overhead in:

for (i=0;i&amp;lt;loops;i++) ... min and &lt;STRONG&gt;max&lt;/STRONG&gt; macros, and I think you can save all measured times in a temporary array for post-processing at the end.

&lt;STRONG&gt;3&lt;/STRONG&gt;. When I do critical performance evaluations I always raise the priority of the process to 'Above Normal' or 'High' in order to reduce process context switches during calculations, and the numbers are always better in that case. Do not forget that you are running tests in a multi-tasking environment, and the trick improves the &lt;STRONG&gt;determinism&lt;/STRONG&gt; of results. Let me know if you need a simple example of how to do it.</description>
      <pubDate>Fri, 08 Feb 2013 14:43:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938429#M14295</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-02-08T14:43:00Z</dc:date>
    </item>
    <item>
      <title>Hi Marian</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938430#M14296</link>
      <description>&lt;P&gt;Hi Marian&lt;/P&gt;
&lt;P&gt;Can you post the MKL FFT results for 4096 point FFT?&lt;/P&gt;</description>
      <pubDate>Sun, 10 Feb 2013 09:24:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938430#M14296</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-02-10T09:24:08Z</dc:date>
    </item>
    <item>
      <title>Hi all,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938431#M14297</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;thanks for the comments.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@iliyapolak: for Nfft=4096, the results for the second test (random signal with only DftiComputeForward)&lt;/P&gt;
&lt;P&gt;(MKL11) avg:5.756583E-006&amp;nbsp;&amp;nbsp;&amp;nbsp; min:5.286975E-006&amp;nbsp;&amp;nbsp;&amp;nbsp; max:1.586093E-005 (seconds)&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;@Sergey Kostrov: Thank you very much for that input, I'll consider it next time&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Yes, you're right, that would improve determinism. Although, the number of loops proves the general trend. If I find the time, I'll do the test with a non-random signal.&lt;/LI&gt;
&lt;LI&gt;Correct, but the overhead is in both tests (version 6 and 11).&lt;/LI&gt;
&lt;LI&gt;The general trend is clear to me now. If it is not too much work I'd appreciate the example, though.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;What I still don't know is why the complete wrapper function (including the allocations) is slower in the new version. Currently the whole software project uses such a wrapper function.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Feb 2013 08:48:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938431#M14297</guid>
      <dc:creator>Marian_L_</dc:creator>
      <dc:date>2013-02-11T08:48:00Z</dc:date>
    </item>
    <item>
      <title>@iliyapolak: for Nfft=4096,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938432#M14298</link>
      <description>&lt;P&gt;@iliyapolak: for Nfft=4096, the results for the second test (random signal with only DftiComputeForward)&lt;/P&gt;
&lt;P&gt;(MKL11) avg:5.756583E-006&amp;nbsp;&amp;nbsp;&amp;nbsp; min:5.286975E-006&amp;nbsp;&amp;nbsp;&amp;nbsp; max:1.586093E-005 (seconds)&lt;/P&gt;
&lt;P&gt;Thanks for posting.&lt;/P&gt;
&lt;P&gt;My result for a 4096-point FFT of a sine function was approximately 212145 nanoseconds.&lt;/P&gt;</description>
      <pubDate>Mon, 11 Feb 2013 09:23:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938432#M14298</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-02-11T09:23:18Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...the trick improves</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938433#M14299</link>
      <description>&amp;gt;&amp;gt;...the trick improves determinism of results. Let me know if you need a simple example of how to do it...

&lt;STRONG&gt;[ Using Win32 API ]&lt;/STRONG&gt;
...
::&lt;STRONG&gt;SetPriorityClass&lt;/STRONG&gt;( ::&lt;STRONG&gt;GetCurrentProcess()&lt;/STRONG&gt;, HIGH_PRIORITY_CLASS ); // Set a process priority to 'High'
...
...some processing...
...
::&lt;STRONG&gt;SetPriorityClass&lt;/STRONG&gt;( ::&lt;STRONG&gt;GetCurrentProcess&lt;/STRONG&gt;(), NORMAL_PRIORITY_CLASS ); // Restore the process priority to 'Normal'
...
Note: Take into account that:

- OpenMP threads use 'Normal' priority and could be preempted by threads with higher priorities
- Virtual Memory Manager also could be preempted and all memory allocations have to be done before a change of process priority
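
The same idea can be wrapped in a small scope guard so the priority is always restored. This is only a sketch: the names ScopedHighPriority, work_under_test and run_with_raised_priority are made up for illustration, and on non-Windows builds the guard is deliberately a no-op so the code still compiles. Allocations and initializations are assumed to be done before the guard is created.

<![CDATA[
```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// RAII helper: raises the process priority class on construction and
// restores the previous class on destruction. No-op outside Windows.
class ScopedHighPriority {
public:
    ScopedHighPriority() : raised_(false) {
#ifdef _WIN32
        previous_ = ::GetPriorityClass(::GetCurrentProcess());
        raised_ = previous_ != 0 &&
                  ::SetPriorityClass(::GetCurrentProcess(), HIGH_PRIORITY_CLASS) != 0;
#endif
    }
    ~ScopedHighPriority() {
#ifdef _WIN32
        if (raised_) ::SetPriorityClass(::GetCurrentProcess(), previous_);
#endif
    }
    // True only if the priority class was actually changed.
    bool raised() const { return raised_; }

    ScopedHighPriority(const ScopedHighPriority&) = delete;
    ScopedHighPriority& operator=(const ScopedHighPriority&) = delete;

private:
    bool raised_;
#ifdef _WIN32
    DWORD previous_;
#endif
};

// Hypothetical stand-in for the code under test.
inline double work_under_test() {
    double acc = 0.0;
    for (int i = 1; i <= 1000; ++i) acc += 1.0 / i;
    return acc;
}

// Raise the priority only around the measured part.
inline double run_with_raised_priority() {
    ScopedHighPriority guard;  // priority raised here (Windows only)
    return work_under_test();  // restored when guard goes out of scope
}
```
]]>
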

In any case verifications have to be done in order to see if there is a positive result.</description>
      <pubDate>Mon, 11 Feb 2013 14:03:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938433#M14299</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-02-11T14:03:18Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;Virtual Memory Manager</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938434#M14300</link>
      <description>&amp;gt;&amp;gt;&amp;gt;Virtual Memory Manager also could be preempted and all memory allocations have to be done before a change of process priority&amp;gt;&amp;gt;&amp;gt;

The memory manager executes at DPC level, and its code cannot be paged out or preempted by a user-mode thread. Only kernel-mode code running at DPC level or above can preempt Mm. A thread's priority is not the same as IRQL.</description>
      <pubDate>Mon, 11 Feb 2013 16:48:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938434#M14300</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-02-11T16:48:42Z</dc:date>
    </item>
    <item>
      <title>I tend to raise the executing</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938435#M14301</link>
      <description>&lt;P&gt;I tend to raise the executing thread priority inside main() , where the tested function is called.Here is example of FFT on 4096 points data set&lt;/P&gt;

&lt;P&gt;// FFT test-case Intel Compiler.cpp : Defines the entry point for the console application.&lt;BR /&gt;
	//&lt;/P&gt;

&lt;P&gt;#include "stdafx.h"&lt;BR /&gt;
	#include &amp;lt;stdio.h&amp;gt;&lt;BR /&gt;
	#include &amp;lt;stdlib.h&amp;gt;&lt;BR /&gt;
	#include &amp;lt;time.h&amp;gt;&lt;BR /&gt;
	#include &amp;lt;math.h&amp;gt;&lt;BR /&gt;
	#include &amp;lt;Windows.h&amp;gt;&lt;BR /&gt;
	#include &amp;lt;tchar.h&amp;gt;&lt;BR /&gt;
	#define MAXITER10K 10000UL&lt;BR /&gt;
	#define DATA_SET_64&amp;nbsp; 64&lt;BR /&gt;
	#define DATA_SET_128&amp;nbsp; 128&lt;BR /&gt;
	#define DATA_SET_256&amp;nbsp; 256&lt;BR /&gt;
	#define DATA_SET_512&amp;nbsp; 512&lt;BR /&gt;
	#define DATA_SET_1024&amp;nbsp; 1024&lt;BR /&gt;
	#define DATA_SET_2048&amp;nbsp; 2048&lt;BR /&gt;
	#define DATA_SET_4096&amp;nbsp; 4096&lt;BR /&gt;
	#define DATA_SET_8192&amp;nbsp; 8192&lt;BR /&gt;
	#define DATA_SET_16384&amp;nbsp; 16384&lt;BR /&gt;
	#define DATA_SET_32768&amp;nbsp;&amp;nbsp; 32768&lt;BR /&gt;
	#define SWAP(a,b) temp=(a);(a)=(b);(b)=temp&lt;BR /&gt;
	void fourier1(double data[],unsigned long nn, int isign);&lt;/P&gt;

&lt;P&gt;int _tmain(int argc, _TCHAR* argv[])&lt;BR /&gt;
	{&lt;BR /&gt;
	&amp;nbsp;DWORD error,thPriority;&lt;BR /&gt;
	&amp;nbsp;BOOL bOK,bOK2;&lt;BR /&gt;
	&amp;nbsp;double a = 0;&lt;BR /&gt;
	&amp;nbsp;int i,q;&lt;BR /&gt;
	&amp;nbsp;LONGLONG start,end;&lt;BR /&gt;
	&amp;nbsp;const unsigned long MaxIter = 1e+6;&lt;BR /&gt;
	&amp;nbsp;double test[DATA_SET_4096];&lt;BR /&gt;
	&amp;nbsp;thPriority = GetThreadPriority(GetCurrentThread());&lt;BR /&gt;
	&amp;nbsp;printf("current thread priority is 0x%x\n",thPriority);&lt;BR /&gt;
	&amp;nbsp;bOK = SetThreadPriority(GetCurrentThread(),THREAD_PRIORITY_TIME_CRITICAL);&lt;BR /&gt;
	&amp;nbsp;if(!bOK)&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;printf("failed to boost current thread priority (%d) \n",GetLastError());&lt;BR /&gt;
	&amp;nbsp;else&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;printf("current thread priority after boost is 0x%x \n",GetThreadPriority(GetCurrentThread()));&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&lt;BR /&gt;
	&amp;nbsp;srand((unsigned)time(NULL));&lt;/P&gt;

&lt;P&gt;&amp;nbsp;for( i = 0;i &amp;lt; DATA_SET_4096;i++){&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;a = (double)RAND_MAX/DATA_SET_4096;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;test[i] = (double)rand()&amp;nbsp; /a;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&lt;BR /&gt;
	&amp;nbsp;}&lt;/P&gt;

&lt;P&gt;&amp;nbsp;//start = GetTickCount64();&lt;BR /&gt;
	&amp;nbsp;//for(q = 0;q &amp;lt; MAXITER10K ;q++){&lt;/P&gt;

&lt;P&gt;&amp;nbsp;fourier1(test,DATA_SET_2048,1);&lt;BR /&gt;
	&amp;nbsp;bOK2 = SetThreadPriority(GetCurrentThread(),THREAD_PRIORITY_NORMAL);&lt;BR /&gt;
	&amp;nbsp;if(!bOK2)&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;printf("failed to lower priority to normal (%d) \n",GetLastError());&lt;BR /&gt;
	&amp;nbsp;else&lt;BR /&gt;
	&amp;nbsp;printf("thread priority returned to normal 0x%x \n",GetThreadPriority(GetCurrentThread()));&lt;BR /&gt;
	&amp;nbsp;//}&lt;BR /&gt;
	&amp;nbsp;//end = GetTickCount64();&lt;/P&gt;

&lt;P&gt;&amp;nbsp;/*printf("Intel compiler testcase start value is %ld msec\n",start);&lt;BR /&gt;
	&amp;nbsp;printf("Intel compiler testcase end value is&amp;nbsp;&amp;nbsp; %ld msec\n",end);&lt;BR /&gt;
	&amp;nbsp;printf("Intel compiler resulting overhead is %ld msec\n",(end-start));*/&lt;/P&gt;

&lt;P&gt;&amp;nbsp;for( i = 0;i &amp;lt; DATA_SET_4096;i++)printf("FFT test-case 1, fourier transform of _j0() = %.17f\n",test[i]);&lt;BR /&gt;
	&amp;nbsp;return 0;&lt;BR /&gt;
	}&lt;/P&gt;

&lt;P&gt;&lt;BR /&gt;
	void fourier1(double data[],unsigned long nn,int isign){&lt;BR /&gt;
	&amp;nbsp;unsigned int start_lo,start_hi,time_lo,time_hi;&lt;BR /&gt;
	&amp;nbsp;unsigned long n,mmax,m,j,istep,i;&lt;BR /&gt;
	&amp;nbsp;double wtemp,wr,wpr,wpi,wi,theta,temp,tempi;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;n = nn&amp;lt;&amp;lt;1;&lt;BR /&gt;
	&amp;nbsp;j = 1;&lt;/P&gt;

&lt;P&gt;&amp;nbsp; _asm{&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; xor eax,eax&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; xor edx,edx&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cpuid&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; rdtsc&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; mov start_lo,eax&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; mov start_hi,edx&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/P&gt;

&lt;P&gt;&amp;nbsp;for( i = 1;i &amp;lt; n;i += 2){&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;if(j &amp;lt; i){&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;SWAP(data[j],data[i]);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;SWAP(data[j+1],data[i+1]);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;}&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;m = n &amp;gt;&amp;gt; 1;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;while(m &amp;gt;= 2 &amp;amp;&amp;amp; j &amp;gt; m){&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;j -= m;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;m &amp;gt;&amp;gt;= 1;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;}&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; j+=m;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;}&lt;/P&gt;

&lt;P&gt;&amp;nbsp;mmax = 2;&lt;BR /&gt;
	&amp;nbsp;while(n &amp;gt; mmax){&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;istep=mmax &amp;lt;&amp;lt; 1;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;theta = isign*(6.28318530717959/mmax);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;wtemp = sin(0.5*theta);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;wpr = -2.0*wtemp*wtemp;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;wpi = sin(theta);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;wr = 1.0;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;wi = 0.0;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;for(m = 1;m &amp;lt; mmax;m+=2){&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;for(i = m;i&amp;lt;=n;i+=istep){&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;j = i+mmax;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;temp = wr*data[j]-wi*data[j+1];&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;tempi = wr*data[j+1]+wi*data[j];&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;data[j] = data[i] - temp;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;data[j+1] = data[i+1] - tempi;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;data[i] += temp;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;data[i+1] += tempi;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;wr = (wtemp=wr)*wpr-wi*wpi+wr;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;wi = wi*wpr+wtemp*wpi+wi;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;}&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;mmax = istep;&lt;BR /&gt;
	&amp;nbsp;}&lt;/P&gt;

&lt;P&gt;&amp;nbsp;_asm{&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; xor eax,eax&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; xor edx,edx&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cpuid&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; rdtsc&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sub eax,start_lo&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sub edx,start_hi&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; mov time_lo,eax&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; mov time_hi,edx&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("%15s \n","[running time of fourier() function start_lo value ]:");&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("%10d \n",start_lo);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("%15s \n","[running time of fourier() function start_hi value ]:");&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("10d \n",start_hi);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("%15s \n","[ running time of fourier function end_lo value ]:");&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("[ LoDelta ] = %10d \n",time_lo);&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("%15s \n","[running time of fourier() function end_hi value ]:");&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; printf("[HiDelta ] = %10d \n",time_hi);&lt;/P&gt;

&lt;P&gt;}&lt;/P&gt;</description>
      <pubDate>Mon, 11 Feb 2013 17:17:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938435#M14301</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-02-11T17:17:00Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...Memory manager is</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938436#M14302</link>
      <description>&amp;gt;&amp;gt;...Memory manager is executing at DPC level ant its code can not be paged out nor preempted by the user mode
&amp;gt;&amp;gt;thread...

My &lt;STRONG&gt;statement is based on results of real tests&lt;/STRONG&gt; completed on a computer system with Windows 2000 Professional OS ( 32-bit ) that simulates embedded system with just 128MB of memory.</description>
      <pubDate>Tue, 12 Feb 2013 00:50:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938436#M14302</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-02-12T00:50:46Z</dc:date>
    </item>
    <item>
      <title>Hi Sergey!</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938437#M14303</link>
      <description>&lt;P&gt;Hi Sergey!&lt;/P&gt;
&lt;P&gt;Do you think that raising the thread's priority in the main function, just before the call to the tested function, has any significance? Or do I need to raise the thread's priority inside the tested function?&lt;/P&gt;
&lt;P&gt;Thanks in advance,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 12 Feb 2013 05:24:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938437#M14303</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-02-12T05:24:59Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;My statement is based on</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938438#M14304</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;My &lt;STRONG&gt;statement is based on results of real tests&lt;/STRONG&gt; completed on a computer system with Windows 2000 Professional OS ( 32-bit ) that simulates an embedded system with just 128MB of memory.&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;I did not run any of my thread-priority tests on a system with only 128MB of RAM. Put simply, my arguments are based on the "Windows Internals" book.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Feb 2013 05:28:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938438#M14304</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-02-12T05:28:29Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...Do you think that</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938439#M14305</link>
      <description>&amp;gt;&amp;gt;...Do you think that raising the thread's priority in the main function just before the call to the tested function has any significance?

I think yes, if you need results that are as accurate as possible.

&amp;gt;&amp;gt;Do I need to raise the thread's priority inside the tested function?

That depends on what you're trying to do, Iliya. Personally, I try to raise it once &lt;STRONG&gt;all&lt;/STRONG&gt; initializations are completed and the main processing is ready to start. I'll try to provide test results of some processing with 'Normal' and 'Real-Time' priorities.</description>
      <pubDate>Tue, 12 Feb 2013 05:43:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938439#M14305</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-02-12T05:43:00Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;That depends on what you</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938440#M14306</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;That depends on what you're trying to do, Iliya. Personally, I try to do it when &lt;STRONG&gt;all&lt;/STRONG&gt; initializations are completed and main processing is ready to be started. I'll try to provide test results of some processing with 'Normal' and 'Real-Time' priorities&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;Thanks for the answer. Later I will test my FFT function with the calls to the thread priority API made right after initialization and before the main calculation loop. I will post the results.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Feb 2013 06:27:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938440#M14306</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-02-12T06:27:22Z</dc:date>
    </item>
    <item>
      <title>Hi Sergey!</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938441#M14307</link>
      <description>Hi Sergey!

I tested an FFT function that performs a Fourier transform of a 4096-point sine data set. Thread priority was set to Time_Critical in main(). The time was measured with a serialized RDTSC instruction. The result is ~369856,6371681416 nanoseconds.
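
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch in Python. It is not the code that produced the numbers in this post. The radix-2 fft() below is a hypothetical stand-in for the tested function, time.perf_counter_ns() is swapped in for the serialized RDTSC read, and the sketch does not touch thread priority. The names time_fft, cycles, runs and warmup are made up for illustration.

```python
import cmath
import math
import time


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        # Twiddle factor applied to the odd-indexed half.
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def time_fft(n=4096, cycles=8, runs=20, warmup=3):
    """Return (avg, min, max) wall time in nanoseconds per FFT call."""
    # All initialization is done before the timed region starts.
    signal = [math.sin(2 * math.pi * cycles * i / n) for i in range(n)]
    # Warm-up calls are executed but not recorded.
    for _ in range(warmup):
        fft(signal)
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fft(signal)
        samples.append(time.perf_counter_ns() - start)
    return sum(samples) / len(samples), min(samples), max(samples)


if __name__ == "__main__":
    avg, lo, hi = time_fft()
    print(f"avg={avg:.0f} ns  min={lo} ns  max={hi} ns")
```

Reporting min and max next to the average makes outliers visible. The absolute times will be far larger than the figures in this post, because interpreter overhead dominates a pure-Python FFT.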

I then placed the function calls that raise the thread priority to Time_Critical inside the FFT function, but there was no performance gain. The result was ~377989,3805309735 nanoseconds. At the time of testing, the DPC and ISR load on the CPU was almost 0 (as reported by Process Explorer). I need to investigate this further.</description>
      <pubDate>Tue, 12 Feb 2013 17:12:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Comparing-FFT-Performance-MKL6-vs-MKL11/m-p/938441#M14307</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-02-12T17:12:13Z</dc:date>
    </item>
  </channel>
</rss>

