SVD changes makes least squares implementation much more fragile

Tormod_Landet · ‎03-06-2013

I am using the linear least squares in MKL through Numpy and found that a program I have been using for quite some time is now misbehaving. I first reported the bug to Numpy ( https://github.com/numpy/numpy/issues/3037 ), but I have since been able to reproduce it in Fortran as well when calling MKL's implementation of DGELSD directly, so it seems something has changed in the implementation somewhere between MKL 10.2.1.017, which works great for this case, and the version that comes with ifort 2013.1.117, which has the above mentioned problems.

The DGELSY function still gives OK results, it is only DGELSD (and computation of condition number through SVD) that has changed. See the very end of the Numpy bug thread for a test case with results (also copied in below). Test code for Lapack77 and Lapack95 is attached. The same code will show significantly different results depending on the version of MKL that is dynamically linked in (and depending on using DGELSD or DGELSY which are easy to swap in the Fortran 95 version),

It may be that the problem I am solving should be conditioned before sending it to Lapack, but this used to work fine so I am wondering if this is a bug (you can increase the loop for bigger N in the program until you exhaust memory without getting problems on the previous version of MKL)

Thanks!

Tormod Landet

-------------------------------------------------------------------------

Example "good" and "bad" output:

Expected output:

N=   100   X=     -0.487E+03      0.534E+04      0.161E+01
N=   200   X=     -0.487E+03      0.534E+04      0.161E+01
N=   400   X=     -0.487E+03      0.534E+04      0.161E+01
N=   800   X=     -0.487E+03      0.534E+04      0.161E+01
N= 1600   X=     -0.487E+03      0.534E+04      0.161E+01
N= 3200   X=     -0.487E+03      0.534E+04      0.161E+01
N= 6400   X=     -0.487E+03      0.534E+04      0.161E+01
N= 12800   X=     -0.487E+03      0.534E+04      0.161E+01

New MKL version output:

N=   100   X=       -487.000       5341.100          1.612
N=   200   X=       -487.000       5341.100          1.612
N=   400   X=       -487.000       5341.100          1.612
N=   800   X=20543386197.418   -1287329.552    3867252.969
N= 1600   X=83981033310.235   -1305315.529    3921269.417
N= 3200   X=***************   -1314349.970    3948387.615
N= 6400   X=***************   -1318877.412    3961973.691
N= 12800   X=***************   -1321143.670    3968773.407

Zhang_Z_Intel · ‎03-06-2013

Thanks for providing detailed description of the problem and test cases. I'll try to reproduce it and do some investigation before getting back to you.

Zhang_Z_Intel · ‎03-11-2013

Our investigation indicates that this was a bug in MKL 11.0.1, which is the version that you're using. It has been fixed in MKL 11.0.2 (the latest release). If you update your Intel Fortran Composer, or just update the MKL component, then the problem should be gone. Please let me know if this helps.

Thanks!