CSYTRF/CSYTRS give different results 1 thread vs 8 threads (MKL 2019u5)

AndrewC · ‎11-13-2019

THREADS=8
first 10 values of the Solution
           1 (-104034.0,20268.75)
           2 (-104099.4,20310.57)
           3 (-103879.3,20264.78)
           4 (-103980.5,20282.24)
           5 (-104128.0,20316.93)
           6 (-103976.7,20282.51)
           7 (-104120.5,20318.15)
           8 (-103958.9,20275.50)
           9 (-104034.3,20268.56)
          10 (-104085.2,20310.74)

THREADS=1

first 10 values of the Solution
           1 (-104393.1,20376.32)
           2 (-104459.2,20418.33)
           3 (-104238.2,20372.31)
           4 (-104339.8,20389.88)
           5 (-104488.0,20424.71)
           6 (-104336.1,20390.15)
           7 (-104480.5,20425.96)
           8 (-104318.0,20383.11)
           9 (-104393.3,20376.11)
          10 (-104445.1,20418.51)

This is being tested with MKL 2019.5, Visual Studio 2017 64-bit compiler.

The test matrix is in the .zip file and needs to be unzipped into the directory of the executable.

Gennady_F_Intel · ‎11-13-2019

Thanks for the report, Andrew. We will check it ASAP.

Gennady_F_Intel · ‎11-14-2019

Yes, I see the problem with the latest 2019 u5, and we will escalate the issue.

AndrewC · ‎11-14-2019

Thanks!

AndrewC · ‎11-26-2019

Hi Gennady,

Was this confirmed as an issue, and is there going to be a fix at some point?

Andrew

Gennady_F_Intel · ‎11-26-2019

Andrew,

We could not confirm that the reported case is a bug in Intel MKL. The MKL LAPACK routines cannot guarantee bitwise reproducible results, even in strict CNR mode (see the KB article at https://software.intel.com/en-us/articles/introduction-to-the-conditional-numerical-reproducibility-cnr). Given that, and because of unavoidable round-off errors and the different order of arithmetic operations in the sequential and parallel code branches, the deviation in the solutions observed here is to be expected.
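The point about operation order can be shown with a toy example. The sketch below is plain Python, not MKL code, and the helper names `sequential_sum` and `chunked_sum` are hypothetical. It sums the same four numbers once in a single pass and once as two partial sums that are combined afterwards, roughly what a two-thread reduction might do. Explicit loops are used instead of the built-in `sum()`, because newer Python versions apply compensated summation there.

```python
def sequential_sum(values):
    """Add the values left to right, as a single-threaded loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Split the values into contiguous chunks, sum each chunk, then combine.

    This only mimics the *order* of operations of a parallel reduction;
    nothing here actually runs in parallel.
    """
    size = (len(values) + n_chunks - 1) // n_chunks
    partials = [
        sequential_sum(values[i:i + size])
        for i in range(0, len(values), size)
    ]
    return sequential_sum(partials)


# Values chosen so that rounding depends strongly on the grouping.
data = [1e16, 1.0, -1e16, 1.0]

one_pass = sequential_sum(data)     # ((1e16 + 1) - 1e16) + 1
two_chunks = chunked_sum(data, 2)   # (1e16 + 1) + (-1e16 + 1)

print(one_pass, two_chunks)
```

Both results are legitimate floating-point sums of the same data, yet they differ. How large such differences become in a real factorization depends on the algorithm and on the conditioning of the matrix, which this toy example does not model.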

Gennady

AndrewC · ‎12-05-2019

Well... I realize that we should not expect bit-for-bit identical results with threads=1 vs. threads=8, but the differences are more significant than I would expect. I do not see differences of a similar magnitude (3rd or 4th significant figure) in other MKL routines when varying the number of threads.
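To put a number on "3rd or 4th significant figure", the relative difference can be computed directly from the ten values posted at the top of the thread. This is a minimal Python sketch with illustrative variable names. It uses only the printed (rounded) values, so it gives an estimate rather than the exact discrepancy in the full solution vector.

```python
# First 10 solution values as posted, (real, imag) pairs.
threads_8 = [
    (-104034.0, 20268.75), (-104099.4, 20310.57), (-103879.3, 20264.78),
    (-103980.5, 20282.24), (-104128.0, 20316.93), (-103976.7, 20282.51),
    (-104120.5, 20318.15), (-103958.9, 20275.50), (-104034.3, 20268.56),
    (-104085.2, 20310.74),
]
threads_1 = [
    (-104393.1, 20376.32), (-104459.2, 20418.33), (-104238.2, 20372.31),
    (-104339.8, 20389.88), (-104488.0, 20424.71), (-104336.1, 20390.15),
    (-104480.5, 20425.96), (-104318.0, 20383.11), (-104393.3, 20376.11),
    (-104445.1, 20418.51),
]

# Relative difference per entry, taking the 1-thread run as the reference.
rel_diffs = [
    abs(complex(*a) - complex(*b)) / abs(complex(*b))
    for a, b in zip(threads_8, threads_1)
]
max_rel = max(rel_diffs)

# Machine epsilon for single precision (CSYTRF/CSYTRS are the
# single-precision complex variants).
eps_single = 2.0 ** -23

print(f"max relative difference: {max_rel:.1e}")
print(f"ratio to single-precision epsilon: {max_rel / eps_single:.0f}")
```

For these values the relative difference comes out at about 3.5e-3, several orders of magnitude above single-precision epsilon (about 1.2e-7). The numbers alone cannot tell whether that gap comes from an ill-conditioned matrix amplifying round-off or from a genuine defect. Comparing the residual of each solution against the original system would be one way to narrow that down.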