The probable reason for the observed differences is threaded execution.
The MKL User's Guide, in the section "Aligning Data for Numerical Stability", states the following:
With a given Intel MKL version, the outputs will be bit-for-bit identical provided all the following conditions are met:
Though the conditions are formulated in terms of LAPACK and BLAS, they are general.
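Why threading matters: floating-point addition is not associative, so splitting one reduction across threads changes the order in which roundings happen. The sketch below is a hypothetical Python model, not MKL's actual partitioning scheme. It assumes each "thread" sums one contiguous chunk and the partial sums are then combined. With a fixed chunk count the result is deterministic, but changing the chunk count (thread count) can change the bits.

```python
import struct


def bits(x: float) -> str:
    """Return the IEEE-754 binary64 bit pattern of x as a hex string."""
    return struct.pack(">d", x).hex()


def sequential_sum(values):
    """Left-to-right accumulation, as a single thread would do it."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Hypothetical threaded reduction: split the input into contiguous
    chunks (one per thread), sum each chunk, then combine the partials."""
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [sequential_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return sequential_sum(partials)


if __name__ == "__main__":
    data = [1e16, 1.0, -1e16, 1.0]
    for n in (1, 2):
        s = chunked_sum(data, n)
        print(n, s, bits(s))
```

Here 1e16 + 1.0 rounds back to 1e16 (ties-to-even), so whether each 1.0 is absorbed depends on which partner it is added to first: one chunk yields 1.0, two chunks yield 0.0.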
Thank you for your response, and thank you to Dmitry for his response as well.
Yes, I'm well aware of the pitfalls of directly comparing floating-point values. However, in this case, it is the correct thing to do, because the whole point of this program is to verify whether results are bit-for-bit identical when the alignment of the input array is changed.
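For a bit-for-bit check, comparing the raw encodings is slightly stricter than `==`: `0.0 == -0.0` is true although the bits differ, and a NaN never compares equal to itself even when the bits are identical. A minimal Python sketch of such a comparison (the helper names are hypothetical, for illustration only):

```python
import struct


def same_bits(a: float, b: float) -> bool:
    """True if a and b have identical IEEE-754 binary64 encodings."""
    return struct.pack("<d", a) == struct.pack("<d", b)


def first_difference(reference, candidate):
    """Index of the first element whose bit pattern differs, or None
    if the two result arrays are bit-for-bit identical."""
    if len(reference) != len(candidate):
        raise ValueError("result arrays have different lengths")
    for i, (x, y) in enumerate(zip(reference, candidate)):
        if not same_bits(x, y):
            return i
    return None
```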
When you say the "code works fine", do you mean that the program did not print any strings like "Results at offset N differ from offset 0"? What compiler did you use? Which specific MKL libraries did you link with?
In my case, I'm using MS Visual Studio 2008. I see the problem when I link with:
mkl_intel_c_dll.lib, mkl_intel_thread_dll.lib, mkl_core_dll.lib, libiomp5md.lib
I still see the problem when I replace mkl_intel_thread_dll.lib with mkl_sequential_dll.lib.
I'm using MKL 10.2.2.025 on WinXP SP3 (32-bit). CPU: Intel Core2 Duo CPU T9400 @ 2.53GHz.
At runtime, if mkl_vml_p4m2.dll is NOT available (so presumably mkl_vml_def.dll is used), the problem goes away. From this I conclude that the alignment-dependent differences probably come from the use of SSE instructions or a similar SIMD code path.
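One plausible mechanism, sketched as a hypothetical model (this is not MKL's actual kernel code): a vectorized loop typically handles leading elements with scalar code until the data reaches an aligned address, then accumulates the rest in several SIMD lanes. Which elements land in the scalar "peel" and which lane each element joins depend on the starting offset, so the order of additions, and therefore the rounding, changes with alignment. The model assumes 2 lanes of doubles (as with SSE2 and 16-byte alignment) and measures the offset in whole doubles past an aligned boundary.

```python
def simd_like_sum(values, offset_in_doubles, lanes=2):
    """Hypothetical model of an alignment-dependent vectorized sum."""
    # Scalar elements needed before the data reaches an aligned address.
    peel = min((-offset_in_doubles) % lanes, len(values))
    scalar = 0.0
    for v in values[:peel]:
        scalar += v
    # Aligned body: one independent accumulator per lane.
    acc = [0.0] * lanes
    body_end = peel + (len(values) - peel) // lanes * lanes
    for i in range(peel, body_end, lanes):
        for lane in range(lanes):
            acc[lane] += values[i + lane]
    # Leftover elements after the last full vector.
    tail = 0.0
    for v in values[body_end:]:
        tail += v
    # Combine: peel result, then each lane, then the tail.
    total = scalar
    for a in acc:
        total += a
    return total + tail


if __name__ == "__main__":
    data = [1e16, 1.0, -1e16, 1.0]
    reference = simd_like_sum(data, 0)
    for offset in range(1, 4):
        if simd_like_sum(data, offset) != reference:
            print(f"Results at offset {offset} differ from offset 0")
```

In this model only the offset modulo the lane count matters, so offsets 1 and 3 are reported while offset 2 matches offset 0. A scalar-only code path (comparable to falling back to a generic library) has no peel loop and no lanes, so its result does not depend on alignment.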
Thanks for any insight you can provide!