precision error/optimization

ukhoray · ‎04-10-2010

I am developing a code for computing chaotic particle trajectories where roundoff errors could lead to serious troubles. Currently I am running Intel Fortran 11.066 for Mac OS X. I discovered that the roundoff error changes depending on optimization level, which to me means that I cannot trust any of the results. I was able to reproduce this strange behavior using the following simplified code:

program test
implicit none
integer,parameter :: dp =kind(1.0D0)

real(dp) :: z,x,y,a

z=105.86983582559437877534946892410516740_dp
x=0.99999599929369431539072365922038443_dp
y=0.99975552864635852667873905375017785_dp

a=z*x
a=a*y
write(*,'(f40.35)')z*x*y
write(*,'(f40.35)')a
end program test

After compiling it with:
ifort -O0 test.f90
and running a.out, the print out is:
105.84353023294737283777067204937338830
105.84353023294738704862538725137710570

On the other hand if I compile it with:
ifort -O1 test.f90
I obtain:
105.84353023294737283777067204937338830
105.84353023294738704862538725137710570

If in the description of the numerical constants x,y,z I drop _dp, or use syntax x=real(....,kind=dp) I get:
105.84352722123841772372543346136808400
105.84352722123841772372543346136808400

The real code, of course, is much more complex and I was not able to remove inconsistency between different level of optimization by simply changing how variables/constants are described. It seems to me that higher optimization levels (-O2, -O3) are working incorrectly (integrals of motion are not conserved).

I would very much appreciate any advice of how to interpret the above inconsistencies and fix the problem.

Thanks a lot!
-Sasha

Tim_Gallagher · ‎04-10-2010

Higher optimization levels use faster math routines to save time, but they cut accuracy.

Check

http://www.intel.com/software/products/compilers/docs/flin/main_for/fpops/common/fpops_fp_arch.htm

for options to give during compilation. You can force it to use precise math routines while still allowing other the other optimizations.

Tim

eliosh · ‎04-11-2010

Double precision gives you about 15-16 decimal digits. As far as I can see the results are consistent withing this limit.

The question is what is really printed when you ask, say, 40 digits ?

jimdempseyatthecove · ‎04-11-2010

eliosh,

The processor has two methods for performing floating point instructions. Old FPU (8087) and newer SSE.
The instruction sequence

a=z*x

a=a*y

Provides for the program optimization to remember the register values for a in the first statement and re-use the value in the calculation of the second statement (as well as to only store a once in the second statement).

(for douubles) The older FPU has an 80-bit internal data, and temporary registers. The SSE has 64-bits internal date and registers. Therefore the above sequence produces different precision dependent on which floating point instructions are used. The programmer (for now) has a choic,e by compiler options, as to which floating point instruction set to use.

Jim Dempsey

TimP · ‎04-11-2010

What happens when you ask for the same order of evaluation in each case?
ifort -assume protect_parens ...

a=z*x
a=a*y
write(*,'(f40.35)')(z*x)*y

If this doesn't succeed, you could argue there's a bug.
When you don't specify the order, you can expect differences in the lowest order bit.

You could argue that protect_parens should be a default, but I guess there haven't been many customer requests for it. So I set that among the options in ifort.cfg.

-fp-model source may have an effect of left-to-right association without parentheses, but I would prefer the parentheses if this is critical for your application.

eliosh · ‎04-11-2010

Thank you Jim,

This is what I also thought. However, one can always ask to print 40 digits in the answer. Waht will be printed in case of SSE?

Thank you.

P.S.
By the way, I compiled this code and here on my machine I always get the same result
105.84353023294738704862538725137710570

disregarding the -Ox (x=0,1,2,3) flag.

It is 64 bil linux with Ifort v 11.1.69

Quick glance at the assembly code shows that SSE instructions are used and not X87.

TimP · ‎04-11-2010

If the WRITE formatting complies with IEEE754, standard 64-bit double precision will have 17 significant digits shown in decimal. A change up to the 17th digit should reflect a change in the binary data. Beyond that, the library could have made any choice. It could be an exact conversion from a binary value extended with zero bits (same as for x87 up to 21 signficant digits), it could shift from that method to zeros at any point, or it could be something else.

jimdempseyatthecove · ‎04-11-2010

The print function(s) code selection (FPU/SEE) is not affected by choice of compile time options. The variables printed will have had to pass through memory (IOW - get fixed at 64-bits). The extra bits will be of significance through intermeadiary calculations.

That is to say the 64-bit result contained in variable a, after both satements might differ. But depending on values chosen, the results may be the same.

Jim

eliosh · ‎04-12-2010

Thanks Tim,

I think your explanation pretty much answers the original question as well. Namely, the differences observed after the 17 digit are irrelevant since they are in fact garbage.

It is also worth to mention that saving numbers like

z=105.86983582559437877534946892410516740_dp

is pretty much meaningless since double precision cannot keep all those digits.

jimdempseyatthecove · ‎04-12-2010

If those extra digits are valid (i.e. came from system capable of REAL(16) or REAL(bigger than 8)) then keep those digits in thesource file. The reason being, should you ever port this back to system supporting larger formats the extra precision is present.

When dealing with some of the scientific database files from JPL the data can be obtained from systems supporting larger than REAL(8). In some cases the software uses two REAL(8) to gain additional precison, in other cases the extended precision is obtained by systems supporting longerFP "word" size.

Now, should105.86983582559437877534946892410516740_dp have been originally generated on your system then clip/round it to 17 or 18 places.

Jim Dempsey