kup__benny
Beginner
190 Views

Rare crashes on MKL

I have implemented an image-processing algorithm in C++ that uses (among other things) the FFTW wrappers in the Intel MKL library (version 2018.3.210).

I am working on an x64 machine with an Intel Xeon E5-1650 v3 3.5 GHz processor running Windows 7.

I used MS Visual Studio 2015 as my IDE for development and debugging. The application is multi-threaded via the C++11 <thread> library.

When running the application over and over again I see that in about 1% of the runs it crashes.

When I attached the IDE and examined the crash dumps, I saw that the crashes always occur at the call:

thePlan = fftw_plan_many_dft(.....);

with the exception "Unhandled exception at someaddress (mkl_avx2.dll) in MyApp.exe: 0xC0000005: Access violation reading location 0x0000000000000"

or

the exception "Unhandled exception at someaddress (mkl_avx2.dll) in MyApp.exe: 0xC0000005: Access violation reading location 0x0000000000018"

1. I have checked that all inputs to the call are valid: the pointers were allocated with fftw_malloc(), and the other inputs have legitimate sizes and types.

2. I have run the application under Windows Application Verifier with the IDE attached and got no warnings or errors.

3. I have run the application with Windows global flags (all available heap-corruption checks enabled) with the IDE attached and got no exceptions.

What else can I do to debug these crashes?
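
One more thing that may be worth ruling out, given that the application is multi-threaded and the crash is always inside a planning call: in upstream FFTW the planner routines are not thread-safe (only fftw_execute and its new-array variants are), so plan creation has to be serialized by the caller. Whether the same restriction applies to the MKL FFTW wrappers in this version is an assumption here, not something established in this thread. A minimal sketch of serializing plan creation, where planner_mutex and make_plan_serialized are hypothetical names and not part of FFTW or MKL:

```cpp
#include <mutex>
#include <utility>

// Hypothetical helper, not part of FFTW or MKL: one process-wide mutex
// shared by every thread that creates or destroys plans.
inline std::mutex& planner_mutex() {
    static std::mutex m;  // function-local static, initialised once
    return m;
}

// Runs any plan-creation callable while holding the mutex, so that no two
// threads are inside a planner routine at the same time.
template <typename MakePlan>
auto make_plan_serialized(MakePlan&& make_plan) -> decltype(make_plan()) {
    std::lock_guard<std::mutex> lock(planner_mutex());
    return std::forward<MakePlan>(make_plan)();
}
```

A call site would then wrap the existing call in a lambda, for example make_plan_serialized([&] { return fftw_plan_many_dft(.....); }). If the crashes persist with planning serialized, the planner's thread-safety can be crossed off the list.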

7 Replies
mecej4
Black Belt
190 Views

The part that you left out -- the "someaddress" in the MKL DLL -- is crucial. Please provide a full traceback, the values of the arguments to the MKL routines, and/or a reproducer if possible.

I suspect that an array element is being used, with an index of 1 or 7, but the pointer to the base of the array is zero.
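
To make that reasoning concrete: a fault address very close to zero usually decomposes into a null base pointer plus a small byte offset. The sketch below only does the arithmetic; the element sizes are assumptions chosen for illustration and say nothing about MKL's internal data layout.

```cpp
#include <cstddef>
#include <cstdint>

// Address that a load of element `index` would touch for an array starting
// at `base`, with elements of `elem_size` bytes.
constexpr std::uintptr_t element_address(std::uintptr_t base,
                                         std::size_t index,
                                         std::size_t elem_size) {
    return base + index * elem_size;
}

// With a null base, a fault at 0x18 (24 bytes) matches, for example,
// index 1 of 24-byte elements or index 3 of 8-byte elements.
static_assert(element_address(0, 1, 24) == 0x18, "index 1, 24-byte elements");
static_assert(element_address(0, 3, 8) == 0x18, "index 3, 8-byte elements");
```

The exact faulting instruction address inside the DLL is what would tell these cases apart.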

kup__benny
Beginner
190 Views

Examples of typical crashes:

Example #1 

Unhandled exception at 0x000007EF4C03069F (mkl_avx2.dll) in MyApp.exe 0xC0000005 Access violation reading location 0x0000000000000018

the call:

thePlan = fftw_plan_many_dft(2,TrDims,SameSizeFFTLength,xyw,TrDims,1,nEl,XY,TrDims,1,nEl,FFTW_FORWARD,FFTW_ESTIMATE);

where:

nEl = 8190;


TrDims[0] = 65;TrDims[1] = 126;//note that TrDims[0] * TrDims[1] = nEl !!!


SameSizeFFTLength = 2;

TrSize = nEl * sizeof(fftw_complex);

xyw  = (fftw_complex *)fftw_malloc(TrSize * SameSizeFFTLength);// initialize with relevant values with a for loop

XY  = (fftw_complex *)fftw_malloc(TrSize * SameSizeFFTLength);

 

Example #2

Unhandled exception at 0x000007FE4C7E069F (mkl_avx2.dll) in MyApp.exe 0xC0000005 Access violation reading location 0x0000000000000018

the call:

thePlan = fftw_plan_many_dft(2,TrDims,SameSizeFFTLength,xyw,TrDims,1,nEl,XY,TrDims,1,nEl,FFTW_FORWARD,FFTW_ESTIMATE);

where:

nEl = 8190;


TrDims[0] = 65;TrDims[1] = 126;//note that TrDims[0] * TrDims[1] = nEl !!!


SameSizeFFTLength = 2;

TrSize = nEl * sizeof(fftw_complex);

xyw  = (fftw_complex *)fftw_malloc(TrSize * SameSizeFFTLength);// initialize with relevant values with a for loop

XY  = (fftw_complex *)fftw_malloc(TrSize * SameSizeFFTLength);

 

Example #3

Unhandled exception at 0x000007FE429E3387 (mkl_avx2.dll) in MyApp.exe 0xC0000005 Access violation reading location 0x0000000000000000

the call:

thePlan = fftw_plan_many_dft(2,TrDims,SameSizeFFTLength,xyw,TrDims,1,nEl,XY,TrDims,1,nEl,FFTW_FORWARD,FFTW_ESTIMATE);

where:

nEl = 576;


TrDims[0] = 24;TrDims[1] = 24;//note that TrDims[0] * TrDims[1] = nEl !!!

SameSizeFFTLength = 1670;

TrSize = nEl * sizeof(fftw_complex);

xyw  = (fftw_complex *)fftw_malloc(TrSize * SameSizeFFTLength);// initialize with relevant values with a for loop.

XY  = (fftw_complex *)fftw_malloc(TrSize * SameSizeFFTLength);

 

kup__benny
Beginner
190 Views

Any other references???

Gennady_F_Intel
Moderator
190 Views

Could you give us a standalone C or Fortran reproducer that we can compile against the latest versions of the library, so that we can investigate the cause of the issue without additional effort?

ROUSSEAU__Thomas
Beginner
190 Views

Hi everyone, 

Any improvement on this front?

I'm having the exact same error, in a different use case.

I have a Wave object (a glorified wrapper that loads a WAV file into a vector<float>) and this function, whose goal is to find the best matching position between my object and a given sample.

int Wave::find_best(vector<float> sample) {
    vector<complex<float>> output(data.size());
    sample.resize(data.size(), 0);

    DFTI_DESCRIPTOR_HANDLE fft = NULL;
    MKL_LONG status;
    status = DftiCreateDescriptor(&fft, DFTI_SINGLE, DFTI_COMPLEX, 1, data.size());
    cout << "1: " << status << endl;
    status = DftiSetValue(fft, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
    cout << "2: " << status << endl; 
    status = DftiCommitDescriptor(fft);
    cout << "3: " << status << endl; 
    status = DftiComputeForward(fft, sample.data(), output.data());
    cout << "4: " << status << endl; 
    status = DftiFreeDescriptor(&fft);
    cout << "5: " << status << endl;

I get a runtime error at the "DftiComputeForward" line.

Unhandled exception at 0x00007FFE47DF1756 (mkl_avx2.dll) in MyProgram.exe: 0xC0000005: Access violation reading location 0x0000027327E56000.

 

If any of you fine gentlemen have any idea, I'll gladly investigate.

Thank you in advance !
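
One possible cause, offered as an assumption and not confirmed anywhere in this thread: a descriptor created with DFTI_SINGLE and DFTI_COMPLEX of length N treats its input as N complex values, that is 2N floats, while sample holds only N floats after the resize. Reading past the end of that buffer would be consistent with a fault at a page-aligned heap address. A minimal sketch of packing the real samples into a complex buffer of the right size before the transform, where to_complex is a hypothetical helper and not an MKL function:

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies real samples into a complex buffer of exactly
// n elements. Imaginary parts are 0, and the tail is zero-padded when the
// input is shorter than n.
std::vector<std::complex<float>> to_complex(const std::vector<float>& real,
                                            std::size_t n) {
    std::vector<std::complex<float>> out(n);  // value-initialised to (0, 0)
    const std::size_t count = real.size() < n ? real.size() : n;
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = std::complex<float>(real[i], 0.0f);
    }
    return out;
}
```

The result of to_complex(sample, data.size()) could then be passed as the input of DftiComputeForward in place of sample.data(). The alternative is to keep the real input and create the descriptor with DFTI_REAL, which changes the output layout and is not shown here.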

Gennady_F_Intel
Moderator
190 Views

MKL has fixed more than 7 FFT-related issues in the last two versions (2019 and 2020). Please check whether the problem persists with the current version of MKL and/or give us a reproducer that I can compile and execute on my end.

kup__benny
Beginner
190 Views

ROUSSEAU, Thomas: I have no bulletproof solution, but in my case I discovered a bug in my use of Intel's MKL library in a completely different place from where the crashes happened (not even in the same translation unit, though in the same project/solution). I was calling MKL's cblas_dgemm(...) in place: the same buffer served as both input and output, since I was multiplying an N×1 vector by an N×N matrix. That is forbidden. After I switched to a separate buffer for the output, all the crashes in mkl_avx2.dll disappeared and my problem was solved.

So my advice to you: if you are sure that the code fragment above is correct, check the other uses of MKL functions in your code and make sure they are correct too.