Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

MKL FFT crashes when multi-threaded and for non-power-of-2 sizes

dirkjan
Beginner
698 Views

   BUG:
MKL FFT crashes (segmentation fault) for certain FFT sizes, for example 2496, when using complex numbers.

The crash is observed with cpp_studio_xe_2013_update1_intel64.tgz
when compiled with icc and with gcc.
The crash is not observed when compiled with icc and -mkl=sequential.

I am running it on an Intel® Xeon® Processor E5-2670 (8 cores per CPU).

for (unsigned nrOfSamples = 1; nrOfSamples < 10000; ++nrOfSamples)
   {
        std::cout << "nrOfSamples " << nrOfSamples << std::endl;
        fflush(NULL);


        MKL_LONG status;
        DFTI_DESCRIPTOR_HANDLE _fft;

        // Create the MKL FFT descriptor
        status = DftiCreateDescriptor(&_fft, DFTI_SINGLE, DFTI_COMPLEX,1, nrOfSamples);
        checkStatus(status);

        // The FFT is now fully specified
        status = DftiCommitDescriptor(_fft);
        checkStatus(status);

        // allocate buffer (deliberately oversized, to be sure the in-place FFT does not go beyond the allocated memory)
        std::complex<float> *x = new std::complex<float>[nrOfSamples*100];

        // Calculate forward FFT
        status = DftiComputeForward(_fft, x);
        checkStatus(status);

        // cleanup
        delete[] x;
        status = DftiFreeDescriptor(&_fft);
        checkStatus(status);
    }

-------------------------------------------------------------------

installed : cpp_studio_xe_2013_update1_intel64.tgz
OS : opensuse 12.2
-------------------------------------------------------------------
ICC compiler:crash observed

icc link options : -L$(MKLROOT)/lib/intel64 -lmkl_rt -lpthread -lm
compile options -mkl=parallel : crash ( Signal name : SIGSEGV, Signal meaning : Segmentation fault)


Note : compile options -mkl=sequential : no crash observed

-------------------------------------------------------------------
GCC compiler 4.7.1 : crash also observed
-------------------------------------------------------------------

Gennady_F_Intel
Moderator
698 Views
Yes, this example crashes. We will check more carefully what is going wrong with this code.
Gennady_F_Intel
Moderator
698 Views
What we have discovered: the problem is caused by the AVX code path. As a temporary workaround, please try turning off the AVX branch by setting, for example, MKL_CBWR=SSE4_2. I checked this approach on Win7 and it works on my side. --Gennady
dirkjan
Beginner
698 Views
Gennady,

Thanks for the quick response. Setting MKL_CBWR=SSE4_2 worked, so I could run more tests. Now the next example crashes with DFTI_COMPLEX_COMPLEX storage (not with DFTI_COMPLEX_REAL). The crash typically happens at nrOfTransforms 3, nrOfSamples 2658:

for (unsigned nrOfTransforms = 1; nrOfTransforms <= 5; ++nrOfTransforms)
{
    for (unsigned nrOfSamples = 1; nrOfSamples <= 10000; ++nrOfSamples)
    {
        std::cout << "Test 3c, Forward FFT Real-2-complex out-of-place nrOfTransforms " << nrOfTransforms << ", nrOfSamples " << nrOfSamples << std::endl;

        MKL_LONG status;
        DFTI_DESCRIPTOR_HANDLE _fft;

        // allocate buffers (deliberately oversized, to be sure the FFT does not go beyond the allocated memory)
        float *x_in = new float[nrOfSamples*nrOfTransforms*10];
        std::complex<float> *x_out = new std::complex<float>[nrOfSamples*nrOfTransforms*10];

        status = DftiCreateDescriptor(&_fft, DFTI_SINGLE, DFTI_REAL, 1, nrOfSamples);
        checkStatus(status);

        status = DftiSetValue(_fft, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
        checkStatus(status);

        // Specify the number of transforms
        status = DftiSetValue(_fft, DFTI_NUMBER_OF_TRANSFORMS, nrOfTransforms);
        checkStatus(status);

        //status = DftiSetValue(_fft, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_REAL);
        status = DftiSetValue(_fft, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
        checkStatus(status);

        // The FFT is now fully specified
        status = DftiCommitDescriptor(_fft);
        checkStatus(status);

        // Calculate forward FFT
        status = DftiComputeForward(_fft, x_in, x_out);
        checkStatus(status);

        // cleanup (two separate statements: "delete[] x_in, x_out;" would free only x_in)
        delete[] x_in;
        delete[] x_out;
        status = DftiFreeDescriptor(&_fft);
        checkStatus(status);
    }
}
Dmitry_B_Intel
Employee
698 Views
To specify how the multiple input and output vectors are laid out, you should do something like this before committing the descriptor:

    DftiSetValue(_fft, DFTI_INPUT_DISTANCE, nrOfSamples);
    DftiSetValue(_fft, DFTI_OUTPUT_DISTANCE, nrOfSamples/2+1);

This tells the compute function that
1) real input element n of vector k is located in x_in[n + nrOfSamples*k] (here n = 0...nrOfSamples-1)
2) complex output element n of vector k is located in x_out[n + (nrOfSamples/2+1)*k] (here n = 0...nrOfSamples/2)

Thanks
Dima
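The indexing rule in this reply can be written out as a small helper, which also gives the minimum buffer sizes for a batch. This is only a sketch: `BatchLayout` and its member names are hypothetical and not part of MKL. Only the two index formulas come from the reply, and the code does not call MKL at all.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an MKL type): mirrors the layout implied by
// DFTI_INPUT_DISTANCE = nrOfSamples and DFTI_OUTPUT_DISTANCE = nrOfSamples/2+1
// for a batch of 1D real-to-complex, out-of-place transforms.
struct BatchLayout {
    std::size_t nrOfSamples;     // length of each real input vector
    std::size_t nrOfTransforms;  // number of vectors in the batch

    // Distance between consecutive input vectors, in float elements.
    std::size_t inputDistance() const { return nrOfSamples; }

    // Distance between consecutive output vectors, in complex elements.
    std::size_t outputDistance() const { return nrOfSamples / 2 + 1; }

    // Position of real input element n of vector k inside x_in.
    std::size_t inputIndex(std::size_t n, std::size_t k) const {
        assert(n < nrOfSamples && k < nrOfTransforms);
        return n + inputDistance() * k;
    }

    // Position of complex output element n of vector k inside x_out.
    std::size_t outputIndex(std::size_t n, std::size_t k) const {
        assert(n <= nrOfSamples / 2 && k < nrOfTransforms);
        return n + outputDistance() * k;
    }

    // Minimum number of elements each buffer must hold for this layout.
    std::size_t inputSize() const { return inputDistance() * nrOfTransforms; }
    std::size_t outputSize() const { return outputDistance() * nrOfTransforms; }
};
```

For the failing case reported above (nrOfTransforms 3, nrOfSamples 2658), `BatchLayout{2658, 3}` gives an output distance of 1330 complex elements, so the buffers allocated in the test program are more than large enough for this layout.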
dirkjan
Beginner
698 Views
Dima, you are correct that one should specify the input/output distance. Nonetheless, the example code still crashes at the same position... Dirk-Jan
Dmitry_B_Intel
Employee
698 Views
Dirk-Jan, I have reproduced the problem and can suggest nothing but a sequential FFT. In MKL 11.0.1 there is a DFTI_THREAD_LIMIT configuration setting, which should be set to 1 before calling DftiCommitDescriptor. Thanks Dima
dirkjan
Beginner
698 Views

Any idea when a fix is planned, and for which version?

Dirk-Jan

Gennady_F_Intel
Moderator
698 Views

Dirk-Jan, please check the example with the latest release, 11.0 update 5. I no longer see the problem.
