MKL gives different results on the same machine

sercan · 10-02-2022

I am using dpotrf from Intel MKL 2022 for Cholesky factorization, and I get different results when I run the same program multiple times on the same machine. Is this expected? The program uses a single thread, dynamic adjustment of the number of threads is disabled, and the CNR branch is set to AUTO. The MKL_VERBOSE log is below:

MKL_VERBOSE oneMKL 2022.0 Update 1 Product build 20220311 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.10GHz lp64 intel_thread

MKL_VERBOSE DPOTRF(U,3,0x197fa10,3,0) 2.06us CNR:AUTO Dyn:0 FastMM:1 TID:0 NThr:1

I don't observe this reproducibility issue if I set the CNR branch to AVX2 or COMPATIBLE in the code. I also don't observe the issue with MKL 11.3. The log with MKL 11.3 on the same machine is below:

MKL_VERBOSE Intel(R) MKL 11.3 Update 3 Product build 20160413 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.10GHz lp64 intel_thread NMICDev:0
MKL_VERBOSE DPOTRF(U,3,0x1f1a640,3,0) 8.64us CNR:AUTO Dyn:0 FastMM:1 TID:0 NThr:1 WDiv:HOST:+0.000

What could be causing the lack of reproducibility?

VidyalathaB_Intel · 10-03-2022

Hi Sercan,

Thanks for reaching out to us.

Could you please provide us with the sample reproducer code and steps to reproduce the issue (commands to compile & run) so that we can do a quick check from our end as well?

Please also let us know your OS environment details and the results you are getting, along with the results you expect.

Meanwhile, you can try the examples that come with the MKL installation and see whether you observe the same issue.

Regards,

Vidya.

sercan · 10-07-2022

Apologies, we have discovered that this issue is caused by a very minor difference in the input matrix we provide to dpotrf on different runs. Surprisingly, the input difference arises only when using the AVX-512 code branch. We have identified a solution on our end.
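
For anyone who runs into the same symptom, here is a minimal sketch of one way to check whether the matrix passed to dpotrf is bit-for-bit identical across runs. It is only an illustration, not necessarily how we tracked the problem down. The helper names (matrix_fingerprint, first_bitwise_difference, dump_matrix) are hypothetical and not part of MKL, and the sketch assumes the matrix is stored as a contiguous array of doubles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a hash over the raw bytes of `count` doubles.
   Different fingerprints prove the inputs differ in at least one bit;
   equal fingerprints strongly suggest (but do not prove) identical inputs. */
uint64_t matrix_fingerprint(const double *a, size_t count)
{
    assert(a != NULL);
    const unsigned char *bytes = (const unsigned char *)a;
    uint64_t hash = 0xcbf29ce484222325ULL;   /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < count * sizeof(double); ++i) {
        hash ^= bytes[i];
        hash *= 0x100000001b3ULL;            /* FNV-1a 64-bit prime */
    }
    return hash;
}

/* Index of the first element whose bit pattern differs, or -1 if all
   `count` elements match. memcmp is used instead of == so that
   0.0 vs -0.0 and differing NaN payloads are reported as differences. */
long first_bitwise_difference(const double *a, const double *b, size_t count)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (memcmp(&a[i], &b[i], sizeof(double)) != 0) {
            return (long)i;
        }
    }
    return -1;
}

/* Print the fingerprint and every element in hexadecimal floating point
   (%a), which shows the exact value rather than a rounded decimal. */
void dump_matrix(const char *label, const double *a, size_t count)
{
    printf("%s fingerprint=%016llx\n", label,
           (unsigned long long)matrix_fingerprint(a, count));
    for (size_t i = 0; i < count; ++i) {
        printf("  [%zu] %a\n", i, a[i]);
    }
}
```

Calling something like dump_matrix("dpotrf input", a, 9) immediately before the factorization in each run, and then diffing the logs, shows whether the input already differs before MKL is involved. Note that dpotrf with uplo set to U reads only the upper triangle, so a difference confined to the other triangle would not affect the factor.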

Thank you for your help.

VidyalathaB_Intel · 10-07-2022

Hi Sercan,

>>this issue is caused by a very minor difference in the input matrix we provide to dpotrf on different runs...We have identified a solution on our end.

Glad to know that your issue is resolved and thanks for letting us know.

Please post a new question if you need any additional assistance from Intel as this thread will no longer be monitored.

Regards,

Vidya.