Solved: Truncation error in Fortran compiler

仁义_许_ · 09-25-2022

      program main
      integer , parameter :: QP = Selected_real_kind( 18 )      
      real(kind=QP)::zb            
      zb=-(4.76_QP-4.9_QP)*0.14_QP            
      write(*,*)zb,0.14_QP*0.14_QP
      end

The output is:

1.960000000000000000000000000000007E-0002
1.960000000000000000000000000000000E-0002

We should get the same result for both, but in reality the first one has a trailing 7.

I tried the code with both Parallel Studio XE Cluster Edition for Windows* 2020 and oneAPI and got the same results.

In an iterative computation, this error grows larger with every iteration.

Arjen_Markus · 09-25-2022

You are making a very common mistake: the fact that 4.9 - 4.76 equals 0.14 in decimal arithmetic does not mean that, in the binary system used by most computers, 4.9 - 4.76 is exactly 0.14, even in quadruple precision. That only happens if the numbers are exactly representable in that system. Compare it to 1/3 * 3: if you represent 1/3 as 0.33333...3, then the result of that calculation will not be exactly 1.0. Of course, rounding may lead to 1.000....0 being printed, but that is a different matter.
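Writing the three literals as fractions shows why none of them is exact in binary. A fraction a/b in lowest terms has a terminating binary expansion only when b is a power of two, just as 1/3 has no terminating decimal expansion. Every denominator below contains a factor of 5:

```latex
$$
4.76 = \frac{119}{25} = \frac{119}{5^{2}}, \qquad
4.9 = \frac{49}{10} = \frac{49}{2 \cdot 5}, \qquad
0.14 = \frac{7}{50} = \frac{7}{2 \cdot 5^{2}}
$$
```

So each literal is rounded on its own to the nearest representable value, and there is no reason for the rounding errors in 4.9 and 4.76 to cancel out to exactly the rounding error in 0.14.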

I extended your program by calculating the difference between -(4.76_qp-4.9_qp) and 0.14_qp. Intel Fortran oneAPI gave me:

  1.960000000000000000000000000000007E-0002
  1.960000000000000000000000000000000E-0002
  4.814824860968089632639944856462318E-0034

and gfortran gave me:

   1.95999999999999999838E-0002   1.96000000000000000008E-0002
  -1.21972744404619248826E-0019

(the last number is this difference)
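The code for the extended program was not posted; the following is a sketch of what it presumably looked like, and the program name `roundoff_check` and the variable `diff` are made up for illustration. The last `write` uses the standard inquiry functions `digits` and `epsilon` to report which kind `selected_real_kind(18)` actually selected. Intel Fortran has no 80-bit real type, so it should select its 128-bit kind 16, whereas gfortran on x86-64 typically selects kind 10 (the x87 80-bit extended format, 64 significand bits). That would be consistent with the shorter gfortran output above.

```fortran
program roundoff_check
  implicit none
  ! Ask for at least 18 significant decimal digits; the compiler chooses the kind
  integer, parameter :: qp = selected_real_kind(18)
  real(kind=qp) :: zb, diff

  ! The original expression and the value it "should" equal
  zb = -(4.76_qp - 4.9_qp) * 0.14_qp
  write(*,*) zb, 0.14_qp * 0.14_qp

  ! Difference between the computed 4.9 - 4.76 and the literal 0.14
  diff = -(4.76_qp - 4.9_qp) - 0.14_qp
  write(*,*) diff

  ! Report the selected kind, its binary significand digits and machine epsilon
  write(*,*) 'kind =', qp, ' digits =', digits(zb), ' epsilon =', epsilon(zb)
end program roundoff_check
```

Compile it as free-form source (for example with a `.f90` suffix).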

I do not know exactly how quadruple precision is implemented in these two compilers, but it is possible that one or both use a software library.
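Whatever the implementation, the size of both reported differences matches what ordinary rounding predicts. The rough bound below assumes binary floating point with round-to-nearest and a p-bit significand (p = 113 for IEEE binary128; p = 64 for the x87 80-bit extended format, which gfortran presumably selected here). Both 4.76 and 4.9 lie in [4, 8), so k = 2 for them, and 0.14 lies in [1/8, 1/4), so k = -3. Both subtractions are exact by the Sterbenz lemma (if y/2 <= x <= 2y, then x - y is representable), so the computed difference is exactly the combination of the three rounding errors:

```latex
$$
\mathrm{fl}(x) = x + e_x, \qquad |e_x| \le 2^{k-p} \quad \text{for } 2^{k} \le |x| < 2^{k+1}
$$
$$
-\bigl(\mathrm{fl}(4.76) - \mathrm{fl}(4.9)\bigr) - \mathrm{fl}(0.14) = e_{4.9} - e_{4.76} - e_{0.14}
$$
$$
\left| e_{4.9} - e_{4.76} - e_{0.14} \right| \le 2 \cdot 2^{2-p} + 2^{-3-p} = 2^{3-p} + 2^{-3-p}
$$
$$
p = 113:\ 2^{-110} + 2^{-116} \approx 7.8 \times 10^{-34}, \qquad
p = 64:\ 2^{-61} + 2^{-67} \approx 4.4 \times 10^{-19}
$$
```

The observed magnitudes, about 4.8E-34 and 1.2E-19, both lie within these bounds. The two compilers simply carry different numbers of significand bits.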

If your algorithm is indeed sensitive to this sort of round-off error, then I suggest you use an arbitrary-precision library instead.


Arjen_Markus · 09-25-2022