Output results slightly different if I'm using a intermediary value and optimisation O0,O1,O2, O3

Guillaume_De_Nayer · ‎02-28-2011

Hi,

We have a fluid solver written in Fortran. I'm trying to optimise it.

In a "do loop" I have:
ue=(u(ine)*fxe+u(inp)*fxw)+
& dtime*denr*vole*arx*
& ((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)
ve=(v(ine)*fxe+v(inp)*fxw)+
& dtime*denr*vole*ary*
& ((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)
we=(w(ine)*fxe+w(inp)*fxw)+
& dtime*denr*vole*arz*
& ((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)

we can see that
"dtime*denr*vole*((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)" is computed 3 times. So I decided to replace these lignes with:
real*8 val_tmp
val_tmp=dtime*denr*vole*((dpcv(ine)*fxe+dpcv(inp)*fxw)-dpe)
ue=(u(ine)*fxe+u(inp)*fxw)+
& arx*val_tmp
ve=(v(ine)*fxe+v(inp)*fxw)+
& ary*val_tmp
we=(w(ine)*fxe+w(inp)*fxw)+
& arz*val_tmp

Then I compare the output results in the first case with the ones in the second case. With O0, O1, O2 or O3 the results are slightly different. I don't understand why ?!?

Best regards,
Guillaume

mecej4 · ‎02-28-2011

Floating point arithmetic is not exact. There exists a small positive number \epsilon such for positive x < \epsilon that the floating point unit of the CPU is unable to distinguish 1 + x from 1. Read, for example,

http://introcs.cs.princeton.edu/91float/

and the references cited there.

Intel architecture IA32 and Intel64/AMD64 CPUs contain 80-bit x87 registers as well as 64-bit SSE2 registers. When the result of a calculation is stored is stored from an 80-bit register to 64-bit memory and reloaded later, the newly loaded value is slightly different from the original value in the register.

There are compiler options to control floating point behavior. There is no magical set of options to give you what you expect.

Sometimes, one needs to ask what "slightly different" means, since it is possible to print out numbers that only appear to be different as a result of conversion from the internal binary representation to decimal scientific notation.

Guillaume_De_Nayer · ‎02-28-2011

Thx for your answer. So there is no simple solution. :(

TimP · ‎02-28-2011

Compilers don't usually look for variations in evaluation order which will expose common sub-expressions. To do so could be time-consuming. If you would enclose the common expressions in parentheses (and, with ifort, set -assume protect_parens), you should get the same result which you found with your temporary specified.