Difference in computed result with avx-512 compared to computed with avx2

Joe_the_dev · ‎12-03-2021

I am working on a computation intensive application which uses Intel mkl library for the heavy lifting.
We discovered that the unit tests compute slightly different results when run on a processor with AVX-512 extensions compared to when the test suite runs on a machine which only has AVX2, enough difference to trip over the accepted error bound.

On one machine the application loads:

Intel\mkl_avx512.dll (2019.0.5.1)
Intel\mkl_vml_avx512.dll (2019.0.5.1)

On another machine it loads:
Intel\mkl_avx2.dll (2019.0.5.1)
Intel\mkl_vml_avx2.dll (2019.0.5.1)

In both cases the OS is Windows 10 64 bit.
I know that by setting the environment variable MKL_ENABLE_INSTRUCTIONS:
MKL_ENABLE_INSTRUCTIONS=AVX2
you can restrict the usage of the instruction extension set to AVX2.
And this removes the difference in computed results.

But we would like to understand what is causing the difference in computation.
Are there instructions in AVX-512 which give different values compared to the
AVX2 counter part ?
Is fma implemented / used differently in the ... _512.dll's compared to the ... _avx2.dll's for version 2019.0.5.1 on Windows 10 64 bit ?

Gennady_F_Intel · ‎12-03-2021

we would recommend checking the mkl developer guide as well as the knowledge base articles follow the links:

https://www.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/obtaining-numerically-reproducible-results/get-started-with-conditional-num-reproducibility.html

https://www.intel.com/content/www/us/en/developer/articles/technical/introduction-to-the-conditional-numerical-reproducibility-cnr.html

Call mkl_cbwr_set(MKL_CBWR_AVX2) or Set the environment variable: export MKL_CBWR = AVX2

will allow you to see the same results from run to run on avx2 and avx-512 based systems when the #of threads would be the same.

-Gennady

Joe_the_dev · ‎12-06-2021

Thanks, yes I read those and set MKL_CBWR:
MKL_CBWR = AVX2,STRICT
But that loads the avx2 dll's as well,
so the same as with:
MKL_ENABLE_INSTRUCTIONS=AVX2.
But is it not possible to get a better matching result, but computed with the avx512 dll's ?

Gennady_F_Intel · ‎12-06-2021

>> But is it not possible to get a better matching result, but computed with the avx512 dll's ?

no, it is not possible.

Gennady_F_Intel · ‎12-12-2021

This issue is closed and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.