<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic MKL FFT crashes when multi-threaded and for non-power 2 size in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947283#M14979</link>
    <description>&lt;P&gt;BUG:&lt;BR /&gt;MKL FFT crashes (segmentation fault) for certain FFT sizes (for example 2496, when using complex numbers).&lt;BR /&gt;&lt;BR /&gt;The crash is observed with cpp_studio_xe_2013_update1_intel64.tgz&lt;BR /&gt;when compiled with icc and with gcc.&lt;BR /&gt;The crash is not observed when compiled with icc and -mkl=sequential.&lt;/P&gt;</description>
    <pubDate>Mon, 12 Nov 2012 07:57:47 GMT</pubDate>
    <dc:creator>dirkjan</dc:creator>
    <dc:date>2012-11-12T07:57:47Z</dc:date>
    <item>
      <title>MKL FFT crashes when multi-threaded and for non-power 2 size</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947283#M14979</link>
      <description>&lt;P&gt;BUG:&lt;BR /&gt;
MKL FFT crashes (segmentation fault) for certain FFT sizes (for example 2496, when using complex numbers).&lt;BR /&gt;
&lt;BR /&gt;
The crash is observed with cpp_studio_xe_2013_update1_intel64.tgz&lt;BR /&gt;
when compiled with icc and with gcc.&lt;BR /&gt;
The crash is not observed when compiled with icc and -mkl=sequential.&lt;/P&gt;
&lt;P&gt;I am running it on an Intel® Xeon® Processor E5-2670 (8 cores per CPU).&lt;BR /&gt;
&lt;BR /&gt;
for (unsigned nrOfSamples = 1; nrOfSamples &amp;lt; 10000; ++nrOfSamples)&lt;BR /&gt;
&amp;nbsp;&amp;nbsp; {&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; std::cout &amp;lt;&amp;lt; "nrOfSamples " &amp;lt;&amp;lt; nrOfSamples &amp;lt;&amp;lt; std::endl;&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fflush(NULL);&lt;BR /&gt;
&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MKL_LONG status;&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; DFTI_DESCRIPTOR_HANDLE _fft;&lt;BR /&gt;
&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Create the MKL FFT descriptor&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; status = DftiCreateDescriptor(&amp;amp;_fft, DFTI_SINGLE, DFTI_COMPLEX, 1, nrOfSamples);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; checkStatus(status);&lt;BR /&gt;
&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // The FFT is now fully specified&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; status = DftiCommitDescriptor(_fft);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; checkStatus(status);&lt;BR /&gt;
&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Allocate the buffer (deliberately oversized, so the in-place FFT cannot run past the allocated memory)&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; std::complex&amp;lt;float&amp;gt; *x = new std::complex&amp;lt;float&amp;gt;[nrOfSamples*100];&lt;BR /&gt;
&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Calculate forward FFT&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; status = DftiComputeForward(_fft, x);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; checkStatus(status);&lt;BR /&gt;
&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; // cleanup&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; delete[] x;&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; status = DftiFreeDescriptor(&amp;amp;_fft);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; checkStatus(status);&lt;BR /&gt;
&amp;nbsp;&amp;nbsp; }&lt;BR /&gt;
&lt;BR /&gt;
-------------------------------------------------------------------&lt;BR /&gt;
&lt;BR /&gt;
Installed: cpp_studio_xe_2013_update1_intel64.tgz&lt;BR /&gt;
OS: openSUSE 12.2&lt;BR /&gt;
-------------------------------------------------------------------&lt;BR /&gt;
ICC compiler: crash observed&lt;BR /&gt;
&lt;BR /&gt;
icc link options: -L$(MKLROOT)/lib/intel64 -lmkl_rt -lpthread -lm&lt;BR /&gt;
Compile option -mkl=parallel: crash (signal name: SIGSEGV, signal meaning: segmentation fault)&lt;BR /&gt;
&lt;BR /&gt;
Note: with compile option -mkl=sequential, no crash is observed&lt;BR /&gt;
&lt;BR /&gt;
-------------------------------------------------------------------&lt;BR /&gt;
GCC compiler 4.7.1: crash also observed&lt;BR /&gt;
-------------------------------------------------------------------&lt;/P&gt;</description>
      <pubDate>Mon, 12 Nov 2012 07:57:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947283#M14979</guid>
      <dc:creator>dirkjan</dc:creator>
      <dc:date>2012-11-12T07:57:47Z</dc:date>
    </item>
    <item>
      <title>Yes, this example crashes.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947284#M14980</link>
      <description>Yes, this example crashes. We will check more carefully what is going wrong with this code.</description>
      <pubDate>Mon, 12 Nov 2012 16:54:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947284#M14980</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2012-11-12T16:54:10Z</dc:date>
    </item>
    <item>
      <title>What we have discovered - the</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947285#M14981</link>
      <description>What we have discovered: the problem is caused by the AVX code path. As a temporary workaround, please try turning off the AVX branch by setting, for example, MKL_CBWR=SSE4_2.
I checked this approach on Win7 and it works on my side.
--Gennady</description>
      <pubDate>Tue, 13 Nov 2012 05:23:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947285#M14981</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2012-11-13T05:23:17Z</dc:date>
    </item>
    <item>
      <title>Gennady,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947286#M14982</link>
      <description>Gennady,

Thanks for the quick response.
Setting SSE4.2 worked.

Now I could run more tests, and the next example crashes for DFTI_COMPLEX_COMPLEX (but not for DFTI_COMPLEX_REAL).
The crash typically happens at nrOfTransforms 3, nrOfSamples 2658:

    for (unsigned nrOfTransforms = 1; nrOfTransforms &amp;lt;= 5; ++nrOfTransforms)
    {
        for (unsigned nrOfSamples = 1; nrOfSamples &amp;lt;= 10000; ++nrOfSamples)
        {
            std::cout &amp;lt;&amp;lt; "Test 3c, Forward FFT Real-2-complex out-of-place nrOfTransforms " &amp;lt;&amp;lt; nrOfTransforms &amp;lt;&amp;lt; ", nrOfSamples " &amp;lt;&amp;lt; nrOfSamples &amp;lt;&amp;lt; std::endl;

            MKL_LONG status;
            DFTI_DESCRIPTOR_HANDLE _fft;

            // Allocate the buffers (deliberately oversized, so the FFT cannot run past the allocated memory)
            float               *x_in  = new float[nrOfSamples*nrOfTransforms*10];
            std::complex&amp;lt;float&amp;gt; *x_out = new std::complex&amp;lt;float&amp;gt;[nrOfSamples*nrOfTransforms*10];

            status = DftiCreateDescriptor(&amp;amp;_fft, DFTI_SINGLE, DFTI_REAL, 1, nrOfSamples);
            checkStatus(status);

            status = DftiSetValue(_fft, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
            checkStatus(status);

            // Specify the number of transforms
            status = DftiSetValue(_fft, DFTI_NUMBER_OF_TRANSFORMS, nrOfTransforms);
            checkStatus(status);

            //status = DftiSetValue(_fft, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_REAL);
            status = DftiSetValue(_fft, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
            checkStatus(status);

            // The FFT is now fully specified
            status = DftiCommitDescriptor(_fft);
            checkStatus(status);

            // Calculate forward FFT
            status = DftiComputeForward(_fft, x_in, x_out);
            checkStatus(status);

            // cleanup (each array needs its own delete[])
            delete[] x_in;
            delete[] x_out;
            status = DftiFreeDescriptor(&amp;amp;_fft);
            checkStatus(status);
        }
    }</description>
      <pubDate>Tue, 13 Nov 2012 11:09:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947286#M14982</guid>
      <dc:creator>dirkjan</dc:creator>
      <dc:date>2012-11-13T11:09:47Z</dc:date>
    </item>
    <item>
      <title>To specify how the multiple</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947287#M14983</link>
      <description>To specify how the multiple input and output vectors are laid out, you should do something like this before committing the descriptor:
DftiSetValue(_fft, DFTI_INPUT_DISTANCE, nrOfSamples);
DftiSetValue(_fft, DFTI_OUTPUT_DISTANCE, nrOfSamples/2+1);

This would tell the compute function that 
1) real input element n of vector k is located in x_in[ n + nrOfSamples*k]  (here n=0...nrOfSamples-1)
2) complex output element n of vector k is located in x_out[ n + (nrOfSamples/2+1)*k] (here n=0...nrOfSamples/2)
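
A minimal sketch of that layout arithmetic in plain C++, with no MKL calls, assuming the two distances above. The helper names (`inputDistance`, `outputIndex`, and so on) are hypothetical; only the index formulas come from the description above.

```cpp
#include "cassert"   // standard header; the quoted form also searches the system include path
#include "cstddef"

// Hypothetical helpers: they only mirror the index formulas, they do not call MKL.

// Distance between consecutive real input vectors (the DFTI_INPUT_DISTANCE value).
std::size_t inputDistance(std::size_t nrOfSamples)
{
    return nrOfSamples;
}

// Distance between consecutive complex output vectors (the DFTI_OUTPUT_DISTANCE value).
std::size_t outputDistance(std::size_t nrOfSamples)
{
    return nrOfSamples / 2 + 1;
}

// Position in x_in of real input element n of vector k (n = 0...nrOfSamples-1).
std::size_t inputIndex(std::size_t n, std::size_t k, std::size_t nrOfSamples)
{
    assert(nrOfSamples > n);
    return n + inputDistance(nrOfSamples) * k;
}

// Position in x_out of complex output element n of vector k (n = 0...nrOfSamples/2).
std::size_t outputIndex(std::size_t n, std::size_t k, std::size_t nrOfSamples)
{
    assert(nrOfSamples / 2 >= n);
    return n + outputDistance(nrOfSamples) * k;
}

// Smallest number of floats x_in must hold for nrOfTransforms vectors.
std::size_t minInputElements(std::size_t nrOfSamples, std::size_t nrOfTransforms)
{
    return inputDistance(nrOfSamples) * nrOfTransforms;
}

// Smallest number of complex values x_out must hold for nrOfTransforms vectors.
std::size_t minOutputElements(std::size_t nrOfSamples, std::size_t nrOfTransforms)
{
    return outputDistance(nrOfSamples) * nrOfTransforms;
}
```

For the case reported in this thread (nrOfTransforms 3, nrOfSamples 2658), these helpers give an output distance of 1330, so x_out needs at least 3990 complex elements and x_in at least 7974 floats. The buffers in the example are oversized by a factor of 10, so they satisfy both bounds.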

Thanks
Dima</description>
      <pubDate>Tue, 13 Nov 2012 12:37:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947287#M14983</guid>
      <dc:creator>Dmitry_B_Intel</dc:creator>
      <dc:date>2012-11-13T12:37:54Z</dc:date>
    </item>
    <item>
      <title>Dima,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947288#M14984</link>
      <description>Dima,

You are correct that one should specify the input/output distance.

Nonetheless, the example code still crashes at the same position...

Dirk-Jan</description>
      <pubDate>Tue, 13 Nov 2012 12:48:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947288#M14984</guid>
      <dc:creator>dirkjan</dc:creator>
      <dc:date>2012-11-13T12:48:21Z</dc:date>
    </item>
    <item>
      <title>Dirk-Jan,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947289#M14985</link>
      <description>Dirk-Jan,
I have reproduced the problem, and the only workaround I can suggest is a sequential FFT.
MKL 11.0.1 has a DFTI_THREAD_LIMIT configuration setting, which should be set to 1 before calling DftiCommitDescriptor.
Thanks
Dima</description>
      <pubDate>Wed, 14 Nov 2012 03:38:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947289#M14985</guid>
      <dc:creator>Dmitry_B_Intel</dc:creator>
      <dc:date>2012-11-14T03:38:00Z</dc:date>
    </item>
    <item>
      <title>Any idea when a fix is</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947290#M14986</link>
      <description>&lt;P&gt;Any idea when a fix is planned, and for which version?&lt;/P&gt;
&lt;P&gt;Dirk-Jan&lt;/P&gt;</description>
      <pubDate>Mon, 26 Aug 2013 06:15:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947290#M14986</guid>
      <dc:creator>dirkjan</dc:creator>
      <dc:date>2013-08-26T06:15:24Z</dc:date>
    </item>
    <item>
      <title>Dirk-Jan, please check the</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947291#M14987</link>
      <description>&lt;P&gt;Dirk-Jan, please check the example with the latest 11.0 update 5. I don't see the problem now.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Aug 2013 06:49:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-FFT-crashes-when-multi-threaded-and-for-non-power-2-size/m-p/947291#M14987</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2013-08-26T06:49:33Z</dc:date>
    </item>
  </channel>
</rss>

