Fortran single precision problem

Duan__Chenglong · ‎01-15-2018

Here is a small piece of code.

program main
implicit none

real :: a=0.01,b

b=1.0/(12.0*a**2)

write(*,*) c

end program

The analytical value for b should be 833.333333... But when I use gfortran to compile this code, it gives me 833.333374. I know I can use double precision, but I won't be allowed because we deal with large dataset in an order of GB magnitude each time. To save the computation cost, single precision is the only choice. But why in this case the result is wrong from the fifth after the float point. In C there's no such problem. I'm confused and seeking a way to solve this problem... Thank you.

mecej4 · ‎01-16-2018

Single precision REALs occupy 32 bits per element, of which 23+1 are part of the "significand" or "mantissa". It follows that such numbers have log10(24) ≈ 7 decimal digits. Therefore, you have no basis for quibbling about differences in the eighth significant digit: 833.333374. Significant digits are always numbered starting from the leftmost nonzero digit.

C has a convention of promoting 32-bit real variables to 64-bit doubles when they are used in expressions, and C compilers have options to prevent this from happening, if that is desired. Fortran follows slightly different conventions, and you can exploit the rules of mixed-mode Fortran expressions.

Please note that "But why in this case the result is wrong from the fifth after the float point" displays a misunderstanding of "floating point". When the numbers are written, the decimal point appears to float. Inside the machine, the floating point does not float; it does not even exist explicitly, since its position is always the same and so it need not be represented.

jimdempseyatthecove · ‎01-16-2018

Single precision floating point provides for 6 to 9 significant digits (depending on number), you show 7 significant digits (one could say for 833.333374: 6 and 8/10ths significant digits). The precision is for digits of the complete number, and not just those after the decimal point.

The default floating point in C is double. You must have forgotten to specify float precision:

int _tmain(int argc, _TCHAR* argv[])
{
 float a = 0.01f;
 float b;
 b = 1.0f / (12.0f*a*a);
 printf("%12.6f\n", b);
 return 0;
}

  833.333374

Jim Dempsey