I haven't found this information in the documentation, so I'm asking here.
Does the gemm operation guarantee that the same call (same input data and sizes) returns exactly the same result when only OMP_NUM_THREADS differs between launches (OMP_NUM_THREADS=1,2,...)?
Is it possible to get different results between sequential and non-sequential versions (inputs are absolutely the same)?
Do these rules apply to older MKL versions? (2018/19/20)
Thanks for reaching out to us.
>>Is it possible to get different results between sequential and non-sequential versions (inputs are absolutely the same)?
We ran sample code that uses the gemm operation (dgemm) and observed that the results are the same in the sequential and non-sequential versions.
The results are also unaffected when OMP_NUM_THREADS is set to different values, as in your query (OMP_NUM_THREADS=1,2,3,...).
Below is the link to the sample code we used.
If you have observed any differences in the results, please let us know and provide a sample reproducer.
>>Do these rules apply to older MKL versions? (2018/19/20)
We tested the same code with older versions as well, and the results were the same.
Could you please let us know your environment details (OS and version, MKL version, compiler)?
The answer by @VidyalathaB_Intel is not correct. There can be differences between the sequential and multi-threaded versions. If you need bitwise-reproducible results, please refer to the so-called CNR mode in oneMKL:
and the strict CNR mode (which I believe is what you need; "strict" means the results do not depend on the number of threads)
What you are asking about is what we in MKL call "conditional numerical reproducibility (CNR)"; see https://software.intel.com/content/www/us/en/develop/articles/introduction-to-the-conditional-numeri.... It is not guaranteed in MKL for all optimized code paths. However, you can enable it with an environment variable or an API call in your application code (essentially selecting different code paths designed for this feature), which makes results consistent from run to run for a fixed number of threads. Documentation is here: https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/...
Additionally, there is a "Strict Conditional Numerical Reproducibility" mode for a subset of MKL APIs that allows the number of threads to change while still producing exactly the same results. See https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/... for more details.
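As a rough sketch of the API route, the snippet below requests strict CNR before calling dgemm. It assumes a oneMKL installation, a CPU that supports AVX2, and a oneMKL version in which strict CNR covers dgemm; please check the linked documentation for the supported functions and branches in your version. The 2x2 matrices are made up for illustration.

```c
#include <stdio.h>
#include <mkl.h>

int main(void)
{
    /* Request the AVX2 code branch in strict CNR mode. This should be the
       first oneMKL call in the program. Assumption: the CPU supports AVX2. */
    int status = mkl_cbwr_set(MKL_CBWR_AVX2 | MKL_CBWR_STRICT);
    if (status != MKL_CBWR_SUCCESS) {
        fprintf(stderr, "mkl_cbwr_set failed with status %d\n", status);
        return 1;
    }

    /* Tiny row-major 2x2 product C = A * B, for illustration only. */
    const double A[4] = {1.0, 2.0, 3.0, 4.0};
    const double B[4] = {5.0, 6.0, 7.0, 8.0};
    double C[4] = {0.0, 0.0, 0.0, 0.0};
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);

    printf("%.17g %.17g\n%.17g %.17g\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```

The environment-variable route should be equivalent and needs no code change: set MKL_CBWR=AVX2,STRICT before launching the application. To verify the behavior yourself, run the same binary with different OMP_NUM_THREADS values and compare the printed results digit for digit.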
Thanks for the confirmation.
As the issue is resolved, we are closing this thread. Please post a new question if you need any additional information from Intel, as this thread will no longer be monitored.