SGESVD produces weird results for Intel MKL 2020 when using lwork from SGESVD

John_Young · ‎12-03-2020

Hi,

We are having issues with SGESVD (we did not check D/C/ZGESVD) in Intel MKL 2020. I attached a test case that exhibits the problem. The tests were run using the Linux version of Intel MKL. We have not checked the Windows version yet. The odd behavior occurs in Intel MKL 2020 (2020.0.4) but not in the 2019 (2019.0.4) and 2018 (2018.0.3) versions.

What we see is that when we perform the SVD of a matrix using the optimum lwork computed by SGESVD and then reconstruct the matrix using U*S*VH, the reconstructed matrix has the same Frobenius norm as the original matrix (to within about 5-6 digits), but the relative error between the original matrix and the reconstructed matrix is too high. With 2018 and 2019, the relative error is on the order of 1E-10, but with 2020 the relative error is on the order of 1E-3. However, if we do the same operation using the minimum lwork computed from the formula in the documentation, then the relative errors are on the order of 1E-10 for 2018, 2019, and 2020. Also, for all three MKL versions the sum of the singular values is the same whether you use the optimum or the minimum lwork. If you plot the absolute error of each matrix entry, you can definitely see that the Intel MKL 2020 results using the optimum lwork have quite high error.
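For readers unfamiliar with the "formula" value of lwork, here is a minimal sketch. It assumes the bound stated in the reference LAPACK documentation for the real GESVD routines, LWORK >= MAX(1, 3*MIN(M,N)+MAX(M,N), 5*MIN(M,N)). The function name and the matrix shapes are hypothetical and are not taken from the attached test case.

```python
def sgesvd_min_lwork(m: int, n: int) -> int:
    """Minimum LWORK for real GESVD, per the bound in the reference LAPACK docs.

    Assumption: this is the "formula in the documentation" the post refers to.
    """
    k = min(m, n)
    return max(1, 3 * k + max(m, n), 5 * k)


# Hypothetical shapes, only to show how the bound behaves.
for m, n in [(1000, 3), (200, 150), (3, 3)]:
    print(f"M={m:5d} N={n:5d} minimum LWORK={sgesvd_min_lwork(m, n)}")
```

The optimum value, by contrast, is whatever the routine itself reports from a workspace query (a call with lwork = -1), which is why it can change between library versions while the formula value does not.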

Also, we noticed that the optimum lwork for 2018 and 2019 was 228643, while for 2020 it was 107967. We don't know whether that relates to the problem; we just note that something changed in the computation of the optimum lwork.

In the zip file is the source code of the test case and the screen output from Intel MKL 2018, 2019, and 2020 under the 2018, 2019, and 2020 directories, respectively. Also included in the subdirectories are the reconstructed matrices. In the main directory is a MATLAB script that plots the per-element absolute errors on a 2D plot for the different cases. The figures are also saved as the *.png files. The routine20.png file exhibits the high error, which you can see by comparing it to the other png files.

Also, the test case is a modification of the test case found in this post where they were also having some issues with the optimum lwork in 2020.

https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Potential-ZGESVD-Problem-with-LWORK/m-p/1224142#M30259

Thanks,

John

John_Young · ‎12-07-2020

Hi,

Has anyone at Intel had a chance to look at this? This issue is causing us lots of trouble with our production codes. We are working around it by using the formula value for lwork, but we prefer to use the optimal value.

Thanks,

John

Gennady_F_Intel · ‎12-08-2020

Hi John, thanks for the report. We will take a look at this case and keep this thread updated.

Gennady_F_Intel · ‎12-08-2020

Please check this forum thread: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/SVD-Single-Precision-Problem/m-p/1203215#M29936 and check the correctness of the computed outputs when the example is linked against MKL 2020, with and without the formula.

John_Young · ‎12-08-2020

Gennady,

Thank you for looking at this. If you look closely at the test case I attached, you will see that I already used error check #1 suggested in the thread you linked:

| A - U S VT | / ( |A| max(M,N) ) <= 300*ulp

The test case fails this error check when using MKL 2020 and the queried lwork value. It passes the error check using MKL 2020 with the minimum lwork value. In MKL 2019 and MKL 2018, the test passes using both the queried lwork and the minimum lwork. Here 300*ulp is 3.58E-5. Here are the errors I see for

| A - U S VT | / ( |A| max(M,N) ) with MKL 2018, 2019, and 2020:

MKL     MinimumLwork    QueriedLwork
2018    4.14E-10        5.35E-10
2019    4.17E-10        6.93E-10
2020    4.17E-10        ** 1.43E-3 **
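As a quick arithmetic check of the threshold, the sketch below assumes that "ulp" here means the spacing of single-precision numbers at 1.0, i.e. 2^-23, which reproduces the 3.58E-5 figure quoted above. The residuals are copied from the table; the names `TOL` and `passes` are only illustrative.

```python
# Assumption: "ulp" is the float32 spacing at 1.0, i.e. 2**-23.
ULP_SINGLE = 2.0 ** -23
TOL = 300 * ULP_SINGLE


def passes(residual: float) -> bool:
    """Error check #1: residual must not exceed 300*ulp."""
    return residual <= TOL


# (minimum-lwork residual, queried-lwork residual) from the table above.
residuals = {
    "2018": (4.14e-10, 5.35e-10),
    "2019": (4.17e-10, 6.93e-10),
    "2020": (4.17e-10, 1.43e-3),
}

print(f"300*ulp = {TOL:.5E}")
for version, (minimum, queried) in residuals.items():
    print(version,
          "Passed" if passes(minimum) else "Failed",
          "Passed" if passes(queried) else "Failed")
```

Only the MKL 2020 queried-lwork entry exceeds the tolerance, and it does so by well over an order of magnitude, so the failure is not a borderline rounding effect.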

For #4, s(1) >= s(2) >= … >= s(min(M,N)) >= 0, the singular values are monotonically decreasing in all cases. I've attached the singular values for 2018, 2019, and 2020.

I have not checked #2 and #3 yet but will get back to you.

(2) | I - U' U | / ( M ) <= const*ulp

(3) | I - VT' VT | / ( N ) <= const*ulp

However, based on #1, something changed in MKL 2020 for the SVD. The results are not correct even though the singular values are correct. Even if further tests show that #2 and #3 pass, the failure of #1 indicates a major issue in MKL 2020's SVD.

mecej4 · ‎12-08-2020

For comparison, here are the main results when the Netlib Lapack routines are used instead of MKL. I used Gfortran 10 on Cygwin64/Windows 64 and the Lapack that is available from Cygwin.

Computed LWORK = 2300

Singular values : 0.489273015409708E-02 0.205770158208907E-02 0.238681166844779E-11

The singular values were identical in all the digits displayed whether LWORK was provided per formula or as calculated by the routine.

Nathan_Champagne · ‎12-08-2020

I ran the program on a MacBook Pro using Intel Fortran (ifort (IFORT) 19.1.3.301 20200925) and MKL (Intel(R) Math Kernel Library Version 2020.0.4 Product Build 20200917). The first run is unmodified, linked with the MKL, and yields the following results (output_mac_sp.txt):

LWORK:        1047             35239
Rel. error:   5.3592858E-10    1.4324166E-03
Result:       Passed           Failed

Next, the program is linked with a custom-built library that contains just the SVD LAPACK files before being linked with the MKL. The results are (output_mac_sp_custom.txt) :

LWORK:        1047             17002
Rel. error:   5.1187776E-10    5.1187776E-10
Result:       Passed           Passed

Finally, the value of LWORK=17002 from the previous case is used for the routine value, the source is linked with just the MKL, and the results are (output_mac_sp_17002.txt):

LWORK:        1047             17002
Rel. error:   5.3592858E-10    5.3592858E-10
Result:       Passed           Passed

John_Young · ‎12-08-2020

Here are some more results for

|A - U S VT | / ( |A| max(M,N) )

| I - U' U | / ( M )

| I - VT' VT | / ( N )

I ran both the thin SVD (job='S') and the full SVD (job='A'). Interestingly, the issue disappears in the full SVD and only occurs in the thin SVD. In all cases, the U'U and VT'VT errors are good. Since the singular values are always correct, this indicates to me that U and VH are (sub-matrices of) a unitary matrix as desired, but one (or both) is not the correct unitary matrix for the given original matrix.

Tolerance 300*ulp = 0.35763E-04
Lwork      Full/Thin   Residual      U-Error       VH-Error
Minimum    Full        0.43915E-09   0.17805E-07   0.97107E-07
Minimum    Thin        0.41721E-09   0.15590E-06   0.10929E-06
Query      Full        0.42072E-09   0.17847E-07   0.98757E-07
Query      Thin        0.14324E-02   0.12217E-06   0.10486E-06
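The pattern in the table (good orthogonality errors, good singular values, bad residual) can be reproduced on a toy problem. The sketch below is not the attached test case: it builds a hypothetical 2x2 matrix from known factors, then swaps in a slightly rotated U that is still orthogonal. It uses Frobenius norms, which is an assumption about the norm used in the checks.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fro(a):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in a for x in row))


def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
N = 2

# Hypothetical exact factors: A = U S VT with singular values 3 and 1.
u_true = rotation(0.3)
s = [[3.0, 0.0], [0.0, 1.0]]
vt = transpose(rotation(1.1))
a = matmul(matmul(u_true, s), vt)

# Still orthogonal, but not the U that belongs to A.
u_wrong = rotation(0.3 + 1e-3)


def report(u):
    recon = matmul(matmul(u, s), vt)
    return {
        "residual": fro(sub(a, recon)) / (fro(a) * N),  # check #1
        "u_error": fro(sub(IDENTITY, matmul(transpose(u), u))) / N,  # check #2
        "norm_ratio": fro(recon) / fro(a),  # reconstructed norm vs original
    }


for name, u in [("correct U", u_true), ("wrong U", u_wrong)]:
    print(name, {k: f"{v:.3E}" for k, v in report(u).items()})
```

With the wrong U, the orthogonality error and the norm of the reconstruction stay at rounding level while the residual jumps by many orders of magnitude, which is the same signature as the Query/Thin row.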

John_Young · ‎12-08-2020

Attached is the updated test case code that computes the additional U'U and VT'VT errors.

John_Young · ‎12-08-2020

Another interesting observation. The queried lwork which gives a poor result is 107967. In my test case, I first run the formula-computed lwork and then run the query-computed lwork. All results are below. For the formula-computed lwork branch, I tried setting specific values of lwork. It's interesting that when I set lwork explicitly, the queried lwork value of 107967 is exactly the smallest value of lwork that produces a poor result. However, setting lwork to a value between 106455 and 107967 produces a good reconstruction error for that call, but when you run the queried lwork case afterwards, its reconstruction error changes. For any lwork value less than 106455, the set-lwork results are good and the queried-lwork error is unchanged.

This makes me think (assuming my test case does not have an issue) that the MKL SVD may be overwriting memory somewhere for certain values of lwork. There is no reason otherwise that the second SVD call should depend on what happened in the first SVD call.

For SetLwork < 106455

SetLwork Reconstructed Error     : 4.17E-10
QueriedLwork Reconstructed Error : 1.43E-3

For 106455 <= SetLwork < 107967

SetLwork Reconstructed Error     : 4.17E-10
QueriedLwork Reconstructed Error : 3.02E-3

For SetLwork >= 107967

SetLwork Reconstructed Error     : 1.43E-3
QueriedLwork Reconstructed Error : 1.43E-3

mecej4 · ‎12-10-2020

Just now, I downloaded and installed (the Windows version of) the new Classic compiler (Ifort), the new LLVM based compiler (ifx) and the new MKL library (download and installation took less than 30 minutes -- I had VS 2019 previously installed and running with Ifort 19.1U2).

With these new compilers and the new MKL, which they share, I no longer saw the failure that was reported in this thread. The value of LWORK computed by the routine is now 13163.

John_Young · ‎12-10-2020

Hi mecej4,

By the 'new' tools, do you mean the version 2021.1?

I will ask our cluster administrator to install these and see if the problem is resolved.

Thanks,

John

mecej4 · ‎12-10-2020

Yes, the download links are on this page:

https://software.intel.com/content/www/us/en/develop/tools/oneapi/all-toolkits.html

What I have gathered is that, starting with this release, the licensing model is changing. No longer does anyone need to purchase and manage licenses; older products will carry FlexLM license management as before.

Those who desire priority support from Intel may purchase support contracts.

Nathan_Champagne · ‎12-10-2020

I installed the latest oneAPI Base and HPC Toolkits on a MacBook Pro. All tests pass now. The old and new results are attached.

John_Young · ‎12-14-2020

We installed the latest oneAPI libraries on our Linux cluster and the issue seems to be resolved.

Thanks,

John

Gennady_F_Intel · ‎12-14-2020

thanks for the update. This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread.