The following code, which uses the exponentiation operator, returns slightly different results depending on which ifort version is used:
program powtest
  real z(100)
  real c(100)
  real d(100)
  real zi
  real ci
  integer i
  zi = 0.1
  ci = 0.009
  z = zi
  c = ci
  do i = 1, 100
    d(i) = c(i) ** z(i)
  enddo
  print *, d(50)
end
ifort version 14.0.1 result = 0.6243444
ifort version 18.0.5 result = 0.6243445
I tracked it down to the vectorized power function (__svml_powf4) giving different answers in different packages.
I understand that this comes from different optimizations inside SVML. Is there a way to force exactly the same results across different SVML versions, perhaps at the expense of performance? Also, can anything be said about the accuracy of the two results?
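For reference, one way to trade performance for consistency is to keep the loop out of SVML entirely by suppressing its vectorization; a sketch using the Intel !DIR$ NOVECTOR directive (whether the resulting scalar power call is itself stable across compiler versions is an assumption, not something verified here):

```fortran
program powtest_novec
  real :: z(100), c(100), d(100)
  integer :: i
  z = 0.1
  c = 0.009
!DIR$ NOVECTOR
  ! Intel directive: keep this loop scalar, so a scalar power
  ! routine is called instead of __svml_powf4
  do i = 1, 100
    d(i) = c(i) ** z(i)
  enddo
  print *, d(50)
end program powtest_novec
```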
It's easy to run the experiment of using double precision to see what you get. If I do this I get 0.624344436399716. At first blush you might say this is closer to the older result, but you're doing this in single precision and worrying about the seventh decimal position, which already strains the limits of single.
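For reference, the double-precision variant of the experiment described above is a one-type change to the original program (a sketch; the printed value is the one reported in this post):

```fortran
program powtest_dp
  double precision :: z(100), c(100), d(100)
  integer :: i
  z = 0.1d0
  c = 0.009d0
  do i = 1, 100
    d(i) = c(i) ** z(i)
  enddo
  print *, d(50)   ! reported above as 0.624344436399716
end program powtest_dp
```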
When I build and run this program with /O3 /QxHost I get a last digit of 4, not 5 (in 19.0.2), and SVML is being used. What options are you using to compile?
Thank you for your pointers. I was compiling with -O1; switching to -O3 does not make any difference, but with -xHost (or -xAVX), or whatever gets __svml_powf8 to be used, I get the same result as you.
So I think that this is coming specifically from __svml_powf4.
Setting -fimf-precision=high, I see that __svml_powf4_ha is used instead. This gives me the same result across different SVML versions.
Are you aware of similar switches to control the behavior of MKL (i.e., to get consistent results across different MKL versions)?
A few pointers that I have found mention ways to restrict MKL code branches; those seem to give reproducible results for the same version across different runs or across different architectures, but it is not clear whether reproducibility across versions can be achieved.
I suggest you compare the hexadecimal output instead of the generic output. The difference you see may be due to a change in the float-to-text conversion, as opposed to (or in addition to) different results from __svml_powf4.
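To take the text conversion out of the picture, the raw bits can be printed with a Z edit descriptor; a minimal sketch (TRANSFER to an integer keeps it standard-conforming):

```fortran
program hexcmp
  real :: d
  d = 0.009 ** 0.1
  print '(1x,Z8.8)', transfer(d, 0)   ! raw IEEE-754 bit pattern, in hex
  print *, d                          ! generic (list-directed) conversion
end program hexcmp
```

If the hexadecimal patterns match across versions, only the formatting changed; if they differ, the computed values really do differ.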
I have tried to compare the binary values, but the outputs are still different:
answer = 1.603834
answer = 1.603835
Greetings Tim P,
Thank you for the MKL reproducibility pointers. According to the document:
The CNR mode of Intel MKL ensures bitwise reproducible results from run to run of Intel MKL functions on a fixed number of threads for a specific Intel instruction set architecture (ISA) under the following conditions:
- Calls to Intel MKL occur in a single executable
- The number of computational threads used by the library does not change in the run
It is not very clear, but my approximate sketch of how it works is as follows:
When enabled, CNR mode switches on two different types of reproducibility:
1) Reproducibility from run to run, mainly by:
A) handling data alignment with respect to vectorization differently (perhaps by always choosing the unaligned versions of instructions?)
B) setting thread-related parameters such as deterministic reductions or static scheduling.
2) Reproducibility when it runs on processors that expose different extended instruction sets.
This is done by restricting the code path according to a user setting; it looks like it uses automatic CPU dispatching internally.
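As an illustration, the code-path restriction can be set either through the MKL_CBWR environment variable or programmatically; a minimal sketch, assuming the MKL_CBWR_SET interface and constants declared in mkl.fi:

```fortran
program cnr_demo
  implicit none
  include 'mkl.fi'
  integer :: ierr
  ! Pin MKL to a single code branch (here the most conservative,
  ! SSE2-compatible one) so every CPU takes the same path
  ierr = mkl_cbwr_set(MKL_CBWR_COMPATIBLE)
  if (ierr .ne. MKL_CBWR_SUCCESS) print *, 'CNR mode could not be set'
end program cnr_demo
```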
Apart from those, can anything be said about reproducibility across different MKL versions? Also, does -fp-model or any other ifort flag affect the choice of MKL functions (as in the SVML case, where setting -fimf-precision=high causes a different function version to be used)?