I am having an issue obtaining bit-for-bit identical results from the Fortran intrinsic gamma function.
In a source file that is identical but used in two different versions of the model, calls are made to the Fortran intrinsic gamma function. In both cases the source file is compiled without optimization (-O0). The input to the gamma function is bit-for-bit identical (checked by calculating an Adler checksum of the value to verify that the in-memory representation really is the same). The output is slightly different:
X = 0.80000000000000000000000000000000000E+01
CHKSUM(X) = 2539951972
OUTPUT ONE: 0.50400000000000000000000000000000000E+04
OUTPUT TWO: 0.50399999999999927240423858165740967E+04
The compiler flags, apart from -O0, specify -fp-model source. Should I use -fp-model precise or consistent? I would like to obtain the value from OUTPUT ONE, since this is the value in the reference codebase.
Is there a possibility to find out why the results are different? Vectorization reports? Assembler output?
Also, in case two different implementations of the gamma function are used, is there a possibility to control which version?
Any insight is highly welcome.
Update: using -fp-model precise does not solve the issue, -fp-model consistent does.
I am overriding a few (default) optimization flags with -O0; is there any way to check which optimizations are actually applied in the end? Adding -qopt-report=5 produces no report (because of -O0), but I am not sure about other flags such as -xCORE-AVX2. Do they get disabled when -O0 is used?
The output shown is formatted output. Because I know that formatting truncates the accuracy, I calculate the checksum of the argument X with an Adler 32-bit hash algorithm. I can see that the values really are different, not just their formatted representations, because the results of my code change afterwards.
What I have found so far is that the differences go away when I add -fimf-arch-consistency=true (or -fp-model consistent, which implies -fimf-arch-consistency=true plus a few more settings). But this also changes the results of the other calculations in the file. Ideally I would have wanted only the gamma function to produce the correct result in both codes.
Here are, I hope, a couple of stronger reasons to prefer OUTPUT ONE:
Since Γ(8) = 7! = 5040, the result is an exact integer, and that integer has an exact representation in both the IEEE-754 single (binary32) and double (binary64) formats. The two results do agree to about 15 decimal digits, but in this case we may reasonably expect all 16.
It may be useful for you to output the result in hexadecimal format; e.g., 1.0d0 is represented as 3FF0 0000 0000 0000. Doing so will tell you whether the internal representation is itself inaccurate or whether the conversion to decimal introduced additional deviations.
I doubt that any change in the result should be seen as a consequence of your selecting different compiler options that control FPU operations. The calculation of the gamma function is performed in the Fortran runtime library that implements that intrinsic. Unless your -fp-model options cause a different library to be used, or change the x87 control word (or MXCSR), there should be no change in the value of Γ(8) returned by the intrinsic function.
I can confirm that (a) the input to the gamma function is bit-for-bit identical and (b) adding the flag -fimf-arch-consistency=true eliminates the bit-for-bit differences. There are articles out there discussing how different implementations of the math functions can be selected depending on which math library is linked and on compiler flags like the one above. In my case, the bit-for-bit differences also disappear if I use static linking instead of dynamic linking to get my new/alternative code into the model.
Thanks Steve, I knew about some other Intel references on bit-for-bit reproducibility, but not about your slides. I don't know whether there is a way to flag this thread as "solved", but from my side it is, because I can use -fimf-arch-consistency=true to get identical results.