Hi Sidney,

Sidney_Cadot · ‎10-13-2015

Hello,

I am pretty sure I found a bug in the DFTI (FFT) routine of MKL.

When doing a 2D complex-to-real FFT, for some array sizes, the FFT routine writes precisely 2 real values too many. In those cases, the FFT result is incorrect.

This happens both for single- en double-precision invocations.

For single-precision invocations, this happens precisely when: num_rows >= 30 and num_cols >= 22 and num_cols % 16 == 14.

For double-precision invocations, this happens precisely when: num_rows >= 17 and num_cols >= 22 and num_cols % 8 == 6.

A program that demonstrates the issue is shown below.

Kind regards, Sidney Cadot

//////////////////////////
// InvestigateMklBug.cc //
//////////////////////////

#include <iostream>
#include <complex>
#include <cassert>
#include <vector>

#include <mkl.h>

void testcase(unsigned  num_rows, unsigned num_cols)
{
    // We will do a complex-to-real 2D FFT.
    //
    // The real array is    (num_rows x nul_cols).
    // The complex_array is (num_rows x nul_cols_complex), where nul_cols_complex == num_cols / 2 + 1.
    //
    // It turns out that, for some values of num_rows/num_cols, the FFT writes beyond the last entry
    // of the 'real_array'.
    //
    // We investigate this by allocating 'real_array' with a few elements (EXTRA_ENTRIES) too many.
    //
    // Prior to the FFT, we initialize 'real_array' with a GUARD_VALUE.
    //
    // After the FFT, we check the number of GUARD_VALUEs still present.
    //
    // If this is less than the number of EXTRA_ENTRIES, elements were overwritten that shouldn't have.

    const unsigned num_cols_complex = num_cols / 2 + 1;

    // setup DFTI descriptor

    DFTI_DESCRIPTOR_HANDLE descriptor;

    const MKL_LONG dimensions[2] = {num_rows, num_cols};
    MKL_LONG status = DftiCreateDescriptor(&descriptor, DFTI_SINGLE, DFTI_REAL, 2, dimensions);
    assert(status == DFTI_NO_ERROR);

    status = DftiSetValue(descriptor, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
    assert(status == DFTI_NO_ERROR);

    // The manual recommends setting this to DFTI_COMPLEX_COMPLEX.
    status = DftiSetValue(descriptor, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
    assert(status == DFTI_NO_ERROR);

    const MKL_LONG input_strides[3] = {0, num_cols_complex, 1};
    status = DftiSetValue(descriptor, DFTI_INPUT_STRIDES, input_strides);
    assert(status == DFTI_NO_ERROR);

    const MKL_LONG output_strides[3] = {0, num_cols, 1};
    status = DftiSetValue(descriptor, DFTI_OUTPUT_STRIDES, output_strides);
    assert(status == DFTI_NO_ERROR);

    const MKL_LONG thread_limit = 1;
    status = DftiSetValue(descriptor, DFTI_THREAD_LIMIT, thread_limit);
    assert(status == DFTI_NO_ERROR);

    status = DftiCommitDescriptor(descriptor);
    assert(status == DFTI_NO_ERROR);

    // Do inverse-fft 

    const unsigned EXTRA_ENTRIES = 64;
    const float    GUARD_VALUE   = 999.0;

    std::vector<std::complex<float>> complex_array(num_rows * num_cols_complex);
    std::vector             <float > real_array   (num_rows * num_cols + EXTRA_ENTRIES, GUARD_VALUE);

    // Execute the FFT and free the descriptor.

    status = DftiComputeBackward(descriptor, complex_array.data(), real_array.data());
    assert(status == DFTI_NO_ERROR);

    status = DftiFreeDescriptor(&descriptor);
    assert(status == DFTI_NO_ERROR);

    // Investigate whether the FFT wrote more entries than expected.

    unsigned count_guards = 0;
    for (unsigned i = 0; i < real_array.size(); ++i)
    {
        if (real_array == GUARD_VALUE)
        {
            ++count_guards;
        }
    }

    assert(count_guards <= EXTRA_ENTRIES);

    bool problem_detected = (count_guards != EXTRA_ENTRIES); // They should be the same.

    if (problem_detected)
    {
        std::cout << "num_rows " << num_rows << " num_cols " << num_cols << " num_cols_complex " << num_cols_complex << " --> wrote " << (EXTRA_ENTRIES - count_guards) << " real_array entries too many." << std::endl;
    }
}

int main()
{
    for (unsigned num_rows = 1; num_rows <= 200; ++num_rows)
    {
        for (unsigned num_cols = 1; num_cols <= 200; ++num_cols)
        {
            testcase(num_rows, num_cols);
        }
    }
    return 0;
}

Evgueni_P_Intel · ‎10-13-2015

Hi Sidney,

This is a known issue in MKL 11.3.0.109.

The following knowledge base article explains how to obtain Intel MKL package with this and other issues fixed.

https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/595045

Evgueni

Sidney_Cadot · ‎10-13-2015

Hi Evgueni

Well that's pretty much my diagnosis right there.

Thanks for the pointer, I will ask for the hot-fix.

Regards, Sidney

Bug in MKL 11.3.0.109 FFT