Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Program crash in mkl_avx.dll on Windows with Intel MKL 11.0

Travis_O_
Beginner

One of our customers has encountered a problem with our library (NumPy) when it is linked against MKL version 11.0.3 on Windows. The program crashes during an eigenvalue decomposition. The problem does not show up in a build of NumPy against an older version of MKL.

Attached are some screenshots which provide information about:

1) Where the crash occurs (mkl_avx)

2) Which instruction it seems not to like (vandpd)

3) The call stack

4 & 5) Configuration information (Windows version and hardware)

The software is Anaconda 1.5, available here: www.continuum.io, but the same problem was replicated with other versions of NumPy downloaded elsewhere. Several machines with this kind of hardware seem to have the same problem.

13 Replies
mecej4
Honored Contributor III

I think versions of Windows older than W7SP1/Windows Server 2008 R2 did not support AVX instructions. Does this pertain to your customer's installation?

SergeyKostrov
Valued Contributor II
This is simply to let you know that the 2nd screenshot ( Screen Shot 2013-05-23 at 4.39.54 PM.png ) has Size = 0.

SergeyKostrov
Valued Contributor II
>>...I think versions of Windows older than W7SP1/Windows Server 2008 R2 did not support AVX instructions...

A couple of the screenshots show the Windows version: Windows 7 Professional SP1 ( 6.1.7601 ). I also see that Travis's computer has lots of memory, so I don't think this is a memory-related problem. It looks like a problem with MKL.

Bernard
Valued Contributor I

If you are already using windbg, please run the automated analysis with the command !analyze -v or !analyze -hang and post the result.

Bernard
Valued Contributor I

The error code points to an access violation exception, so it seems that a wrong pointer was dereferenced or a wrong memory address was calculated.

SergeyKostrov
Valued Contributor II
Unfortunately, it is not clear which MKL function was called... Since the user is not following up, it is possible that the problem is resolved.

Travis_O_
Beginner

The problem is not resolved. I don't have direct access to the machine, so I am waiting to see if I can get additional output from windbg. Also, I don't know exactly what this means: "run automated analysis with command !analyze -v or !analyze -hang and post the result". I am not an expert windbg user.

Our only option is to disable AVX instructions, unfortunately. 

Travis_O_
Beginner

Doesn't this screenshot show the stack of MKL functions that were called: http://software.intel.com/sites/default/files/forum/392958/screen-shot-2013-05-23-at-4.40.48-pm.png

Shane_S_Intel
Employee

Hi Travis - I'll see if I can find someone on the MKL side to do a deeper analysis.

-Shane

Alexander_K_Intel3

Hello,

With very high confidence, this is the same issue as described in http://software.intel.com/en-us/articles/svd-multithreading-bug-in-mkl. According to the stack trace provided, it goes back to the same root cause. A higher-level function (dlange) generates an out-of-bounds array reference and passes it to an AVX-optimized subfunction, which crashes on the out-of-bounds memory access. There is nothing specific to AVX. This is a 6-core machine, and 6 is the number of cores on which the issue is known to appear.

Please advise the customer to upgrade to MKL 11.0.4, where the issue is already fixed. A temporary workaround is to use 4 threads by setting the environment variable MKL_NUM_THREADS=4.
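For a NumPy application, the workaround above could be applied from Python as in the minimal sketch below. It assumes the variable must be set before NumPy (and therefore MKL) is first loaded in the process; the helper name `limit_mkl_threads` and the small test matrix are hypothetical and only for illustration.

```python
import os


def limit_mkl_threads(n=4):
    """Set MKL_NUM_THREADS for this process (hypothetical helper).

    Assumption: this runs before NumPy is imported, so MKL sees the
    value when it starts up.
    """
    os.environ["MKL_NUM_THREADS"] = str(n)
    return os.environ["MKL_NUM_THREADS"]


# Apply the workaround first ...
limit_mkl_threads(4)

# ... and only then import NumPy. The import is guarded so the sketch
# still runs on a machine without NumPy installed.
try:
    import numpy as np
except ImportError:
    np = None

if np is not None:
    # Small eigenvalue decomposition, the kind of call that crashed.
    a = np.array([[2.0, 0.0], [0.0, 3.0]])
    eigenvalues, eigenvectors = np.linalg.eig(a)
    print(eigenvalues)
```

Setting the variable in the shell or system environment before launching Python should have the same effect and avoids any dependence on import order.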

W.B.R.,

Alexander

SergeyKostrov
Valued Contributor II
>>...Doesn't this screen-shot show the stack of MKL functions that was called:
>>http://software.intel.com/sites/default/files/forum/392958/screen-shot-2.

You're right, and I missed it even though I've looked at that screenshot several times. Thanks to Alexander for the additional technical details!

Travis_O_
Beginner

Thank you, Alexander. I think you are exactly right and this is the same bug. In fact, I can verify that the problem goes away for the user when they install the software compiled with MKL 11.0 Update 4.

Thanks.

-Travis
