Wrong triangular part of matrix accessed in function LAPACKE_ssygvx

Seidel__Jens · ‎05-08-2019

Hi all,

with the help of valgrind I noticed that LAPACKE_ssygvx calls LAPACKE_sge_nancheck on the input matrix.

I told LAPACKE_ssygvx to use only the lower triangular part of the symmetric matrix (the other part is not even initialized), but LAPACKE_sge_nancheck also accesses the upper triangular part, and the routine returns an error if it finds a NaN there.

The attached program verifies this. I tested it with 2018.3.222 and also with 2019.3.199; both versions are affected.

I guessed MKLD-3999 (Fixed the issue LAPACKE_ssyevd fails when upper triangular part of the matrix is filled with random numbers) could be a fix but nope ...

Used compiler: g++ 8.1.0, Linux

valgrind output:

==28297== Conditional jump or move depends on uninitialised value(s)
==28297==    at 0x4023C7: LAPACKE_sge_nancheck (in mkl_bug)
==28297==    by 0x401FCA: LAPACKE_ssygvx (in mkl_bug)
==28297==    by 0x401AE1: main (in mkl_bug)

Gennady_F_Intel · ‎05-08-2019

This issue should have been fixed in the latest 2019 u3, and one of our customers confirmed the fix. Thanks for the reproducer; we will check.

Gennady_F_Intel · ‎05-08-2019

]$ ./a.out
A=1 -0.1 0.5
B=0.7 -0.3 0.6
MKL_VERBOSE Intel(R) MKL 2019.0 Update 3 Product build 20190125 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors, Lnx 2.80GHz lp64 intel_thread
MKL_VERBOSE SSYGVX(1,V,I,L,2,0x246f080,2,0x246f0a0,2,0x7ffd3b506068,0x7ffd3b506070,2,2,0x7ffd3b506078,0,0x7ffd3b506280,0x7ffd3b506288,2,0x7ffd3b506178,-1,0x2480280,0x7ffd3b506290,0) 58.57us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:20
MKL_VERBOSE SSYGVX(1,V,I,L,2,0x246f080,2,0x246f0a0,2,0x7ffd3b506068,0x7ffd3b506070,2,2,0x7ffd3b506078,1,0x7ffd3b506280,0x7ffd3b506288,2,0x2481300,16,0x2480280,0x7ffd3b506290,0) 278.46us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:20
A=1.42857 0.57197 1.2684
B=0.83666 -0.358569 0.686607
Found eigenvalues: 1
Return: 0
w(lambda)=1.92603
z(x)=1.31147 0.955791
ifail=0

Gennady_F_Intel · ‎05-08-2019

I have no valgrind available at the moment to check your exact case. Do you see the run-time problem in your application with the latest 2019 u3?

Seidel__Jens · ‎05-09-2019

Gennady F. (Blackbelt) wrote:
I have no valgrind available at the moment to check your exact case. Do you see the run-time problem in your application with the latest 2019 u3?

I have also seen the problem with version 2019.3.199.

You can verify it by enabling this line in the reproducer:

B[2] = std::numeric_limits<float>::signaling_NaN();

In this case nothing is done (A and B are unchanged, no eigenvalue computation is performed) and the return value is -9.

Gennady_F_Intel · ‎05-09-2019

$ ./a.out
A=1 -0.1 0.5
B=0.7 -0.3 0.6
A=1 -0.1 0.5
B=0.7 -0.3 0.6
Found eigenvalues: 0
Return: -9
a.out: mkl_bug.cpp:60: int main(): Assertion `m == 1' failed.

Gennady_F_Intel · ‎05-09-2019

Do you see a similar result?

Seidel__Jens · ‎05-09-2019

Gennady F. (Blackbelt) wrote:
Do you see a similar result?

I get the same result. As you can see, changing an entry in the upper-right triangle (B[2] or A[2]) changes the result, even though I told the function to use only the lower part ("L").
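To make the difference concrete, here is a minimal sketch of a full-matrix NaN scan next to a triangle-only scan. It is not MKL's code: the helper names has_nan_full and has_nan_triangle are made up for illustration, and it assumes column-major storage, so that B[2] of a 2x2 matrix is the element in row 0, column 1 (the upper triangle). The return value of -9 is consistent with the ninth argument (b) being flagged, assuming the usual LAPACKE convention that a return of -i points at argument i.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helpers for illustration only; this is not MKL's implementation.
// Matrices are n x n, column-major, with leading dimension lda >= n.

// Scans every entry, the way a general-matrix ("ge") check would.
bool has_nan_full(const std::vector<float>& a, std::size_t n, std::size_t lda) {
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            if (std::isnan(a[i + j * lda])) return true;
    return false;
}

// Scans only the triangle selected by uplo ('L' or 'U'), diagonal included.
bool has_nan_triangle(const std::vector<float>& a, std::size_t n,
                      std::size_t lda, char uplo) {
    const bool lower = (uplo == 'L' || uplo == 'l');
    for (std::size_t j = 0; j < n; ++j) {
        const std::size_t first = lower ? j : 0;     // first row visited in column j
        const std::size_t last = lower ? n : j + 1;  // one past the last row visited
        for (std::size_t i = first; i < last; ++i)
            if (std::isnan(a[i + j * lda])) return true;
    }
    return false;
}

// Mirrors the 2x2 B from this thread with a NaN in the unused upper entry B[2].
void nan_check_example() {
    const float nan_value = std::numeric_limits<float>::quiet_NaN();
    const std::vector<float> b = {0.7f, -0.3f, nan_value, 0.6f};
    assert(has_nan_full(b, 2, 2));            // full scan trips over B[2]
    assert(!has_nan_triangle(b, 2, 2, 'L'));  // lower-only scan never reads it
}
```

With uplo = 'L' the triangle-only scan never touches B[2], which is the behaviour I expected from LAPACKE_ssygvx.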

Sarah_K_Intel · ‎05-09-2019

Thank you very much for your reproducer and thorough testing. You're right - this is an issue and will be fixed in an upcoming release of Intel MKL. In the meantime, if this is hampering your (or others') development, NaN checking can be disabled by setting the LAPACKE_NANCHECK environment variable to 0 or by calling the LAPACKE_set_nancheck function.
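A minimal sketch of the environment-variable workaround, done from inside the program rather than from the shell. The helper names are hypothetical, setenv is POSIX (so this targets Linux, as in the original report), and it assumes the library reads LAPACKE_NANCHECK lazily when the first LAPACKE routine runs, as reference LAPACKE does; whether MKL behaves the same way is not confirmed here.

```cpp
#include <cstdlib>   // std::getenv
#include <cstring>   // std::strcmp
#include <stdlib.h>  // setenv (POSIX, not standard C++)

// Sets LAPACKE_NANCHECK=0 for the current process. Returns true on success.
// Assumption: the variable is read when the first LAPACKE routine runs,
// so this has to be called before any LAPACKE call.
bool disable_lapacke_nancheck_via_env() {
    return setenv("LAPACKE_NANCHECK", "0", /*overwrite=*/1) == 0;
}

// True if the variable is currently set to "0".
bool lapacke_nancheck_env_is_off() {
    const char* value = std::getenv("LAPACKE_NANCHECK");
    return value != nullptr && std::strcmp(value, "0") == 0;
}
```

Call disable_lapacke_nancheck_via_env() at the start of main, before LAPACKE_ssygvx. Calling LAPACKE_set_nancheck(0) instead avoids depending on when the variable is read. Either way the check is switched off for all arguments, so a genuine NaN in the lower triangle will no longer be reported either.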

Gennady_F_Intel · ‎12-24-2019

>mkl_bug.exe
A=1 -0.1 0.5
B=0.7 -0.3 0.6
MKL_VERBOSE Intel(R) MKL 2020.0 Product build 20191125 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Win 2.60GHz cdecl intel_thread
MKL_VERBOSE SSYGVX(1,V,I,L,2,000001ACACF52660,2,000001ACACF52980,2,0000008FD24FF670,0000008FD24FF678,2,2,0000008FD24FF690,0,0000008FD24FF840,0000008FD24FF848,2,0000008FD24FF700,-1,000001ACACF57680,0000008FD24FF8 13.03us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:2
MKL_VERBOSE SSYGVX(1,V,I,L,2,000001ACACF52660,2,000001ACACF52980,2,0000008FD24FF670,0000008FD24FF678,2,2,0000008FD24FF690,1,0000008FD24FF840,0000008FD24FF848,2,000001ACACF79580,16,000001ACACF57680,0000008FD24FF8 266.67us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:2
A=1.42857 0.57197 1.2684
B=0.83666 -0.358569 0.686607
Found eigenvalues: 1

Gennady_F_Intel · ‎12-24-2019

Hello!

The fix for this issue is available in the newest MKL v.2020. You can take this version and check the problem on your side.