Solved: Re: Double to Single

schulzey · ‎04-28-2021

I have read the many posts that cover the topic of getting rid of the extra digits when converting a single to a double, and I think I understand the issues, but at the risk of releasing a firestorm of "this topic has already been done to death" replies can I just ask this simple question?

If I have Single = 1.4726, when I set Double = Single, I get Double = 1.472599983215332. I understand from previous posts that the extra digits are because the double precision bit pattern can't exactly represent 1.472600000000000 and 1.472599983215332 is the closest approximation. What I don't understand is if I set the Double directly to 1.4726d0 then I get exactly 1.472600000000000, which sort of contradicts the previous sentence. Can someone please explain this?

DavidWhite · ‎04-28-2021

What it means is that 1.4726 cannot be represented exactly in binary to the precision of the single variable. When you set the double directly, the extra bits in the double variable are set appropriately to give the closest possible value. Copying the single variable to the double means that all the remaining bits are zero, which is not good enough to get the accurate value.
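A minimal sketch of the point above, assuming the default real is IEEE binary32 and double precision is IEEE binary64 (true for the compilers discussed here, but not guaranteed by the standard). The module and function names are made up for illustration.

```fortran
module widen_demo
   implicit none
   integer, parameter :: sp = kind(1.0)     ! default real ("single"), assumed binary32
   integer, parameter :: dp = kind(1.0d0)   ! double precision, assumed binary64
contains
   ! Widen a single to a double. The numerical value is preserved exactly;
   ! only the number of digits that get printed changes.
   pure function widen(x) result(y)
      real(sp), intent(in) :: x
      real(dp) :: y
      y = real(x, dp)
   end function widen

   ! True if widening and then narrowing again gives back the original single,
   ! i.e. the conversion lost nothing.
   pure function round_trips(x) result(ok)
      real(sp), intent(in) :: x
      logical :: ok
      ok = real(widen(x), sp) == x
   end function round_trips
end module widen_demo
```

With this module, `widen(1.4726_sp) == 1.4726_dp` should be false while `round_trips(1.4726_sp)` should be true: the widened value is the single's value, not the nearest double to the decimal 1.4726.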

schulzey · ‎04-28-2021

And there's no simple way to get from Single = 1.4726 to Double = 1.472600000000000 without going via a string conversion?

FortranFan · ‎04-28-2021

@schulzey ,

First, please note yours is a general Fortran inquiry for which you may want to also consider the Fortran Discourse for wider Fortran community feedback: https://fortran-lang.discourse.group/

Secondly, can you please share where you get a value such as '1.4726'? Is it a calculation, simulation, or experimental result stored in a file or database that is read in? If so, please see the simple-minded code below that "mimics" such an action:

   integer, parameter :: SP = selected_real_kind( p=6 )
   integer, parameter :: DP = selected_real_kind( p=12 )
   character(len=*), parameter :: fmtg = "(*(g0))"
   character(len=*), parameter :: fmth = "(g0,z0)"
   character(len=:), allocatable :: val
   real(SP) :: val_sp
   real(DP) :: val_dp
   val = "1.4726" !<-- Assume this represents a file read or database fetch
   read( val, fmt=* ) val_sp
   print fmtg, "value: ", val_sp
   print fmth, "value (single, hex): ", val_sp
   read( val, fmt=* ) val_dp
   print fmtg, "value: ", val_dp
   print fmth, "value (double, hex): ", val_dp
end

Compiling and running this program with the Intel Fortran compiler gives:

C:\Temp>ifort /standard-semantics p.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.28.29337.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\Temp>p.exe
value: 1.472600
value (single, hex): 3FBC7E28
value: 1.472600000000000
value (double, hex): 3FF78FC504816F00

C:\Temp>

where you can see the difference in bit representation even though the "apparent" value seems to be the same. You can then see that the type conversion you seek will result in misplaced bits.

Thus you are better off addressing the definition of your variables of different precision - "single" vs "double" - at the level of your "data" and thereby avoid type conversions.

schulzey · ‎04-28-2021

Thanks. The 1.4726 was just a made-up value to illustrate the problem, but in the real application the singles are stored in a file that I now want to convert to doubles that have 0s in the extra significant digits.

FortranFan · ‎04-28-2021

@schulzey ,

Re: your comment, "the real application the singles are stored in a file that I now want to convert to doubles," please view what is stored in the file as your "data". The point upthread is you can "read in" such data in the precision of interest. You do not need to read them as default real (what you call "single") and then convert to "double". Just read the values in your input IO statements into objects that are declared to be of higher precision (say "double") when that is the precision you plan to work with.

DavidWhite · ‎04-28-2021

You could try reading straight from the file into a double precision variable using an appropriate edit descriptor. That should be more precise than reading as single precision first.

Ulrich_M_ · ‎04-29-2021

The default conversion does set the extra significant digits to zero. It's just the base 2 (or equivalently base 16 = hexadecimal) digits that are set to zero, and not the base 10 ones. Setting the base 2 digits to zero in the conversion, rather than the base 10 ones, is the best thing to do, since numbers are internally represented in a base 2 system, so one obtains a (slightly) better approximation this way.

The only reason to favor base 10 arises if you know that the true number had all extra digits zero in base 10. I cannot think of many applications where this would be the case.
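One way to see the zeroed base 2 digits directly is to look at the low 29 bits of the double (52 fraction bits minus the 23 that a single carries). This is only a sketch: it assumes IEEE binary32/binary64 kinds behind `real32`/`real64`, and the module and function names are invented for the example.

```fortran
module tail_bits_demo
   use, intrinsic :: iso_fortran_env, only: int64, real32, real64
   implicit none
contains
   ! Return the low 29 bits of the bit pattern of x. These are the fraction
   ! bits that a double has and a single does not (52 - 23 = 29).
   pure function low29(x) result(bits)
      real(real64), intent(in) :: x
      integer(int64) :: bits
      ! z'1FFFFFFF' is 2**29 - 1, a mask with the low 29 bits set
      bits = iand(transfer(x, 0_int64), int(z'1FFFFFFF', int64))
   end function low29
end module tail_bits_demo
```

`low29(real(1.4726_real32, real64))` should be zero, whereas `low29(1.4726_real64)` should not be, which agrees with the hex value 3FF78FC504816F00 printed earlier in the thread.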

schulzey · ‎04-28-2021

That sounds like a good solution, but doesn't it mean that it would read 8 bytes out of the file for each double rather than 4 bytes? My binary file contains 4-byte singles sequentially with no bytes in-between and if I read the file directly into 8-byte doubles then wouldn't it read two singles into one double?
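For a raw binary file like the one described, the usual pattern is to read into a 4-byte buffer and widen afterwards, so each value still consumes 4 bytes of the file. The sketch below assumes the file has no record markers (hence stream access), that it was written with the same byte order as the reading machine, and that `real32` is the 4-byte IEEE type; the routine name and argument list are hypothetical.

```fortran
module read_singles_demo
   use, intrinsic :: iso_fortran_env, only: real32, real64
   implicit none
contains
   ! Read n consecutive 4-byte reals from a raw binary file and return them
   ! widened to double. The values themselves are unchanged by the widening.
   subroutine read_singles_as_doubles(filename, n, values)
      character(len=*), intent(in) :: filename
      integer, intent(in) :: n
      real(real64), intent(out) :: values(n)
      real(real32) :: buffer(n)   ! 4 bytes per element, matching the file layout
      integer :: lun

      open(newunit=lun, file=filename, access="stream", form="unformatted", &
           status="old", action="read")
      read(lun) buffer            ! consumes exactly 4*n bytes
      close(lun)
      values = real(buffer, real64)
   end subroutine read_singles_as_doubles
end module read_singles_demo
```

Note that this does not produce "decimal zeros" in the extra digits; it only avoids the two-singles-into-one-double problem.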

DavidWhite · ‎04-28-2021

That's changing the story now. Not sure you said it was already a binary file.

schulzey · ‎04-28-2021

I think I said "... in the real application the singles are stored in a file". Anyway it seems like there is no simple solution and so I will probably just convert them via a string, something like the following.

real :: Single
double precision :: Double
character(len=32) :: STR
Single=1.4726
write(STR,*) Single
read(STR,*) Double

It seems to work. Can you see any issues with this approach?
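A slightly more controlled variant of the string round trip is sketched below. List-directed output leaves the number of digits up to the compiler, so this version fixes it at 7 significant digits with an explicit format. That choice is an assumption about the data: it only gives the "expected" decimal if the original values really had no more than about 6 to 7 significant decimal digits, and it deliberately changes the stored value rather than converting it faithfully. The names are invented for the example.

```fortran
module decimal_widen_demo
   implicit none
   integer, parameter :: sp = kind(1.0)     ! default real, assumed IEEE binary32
   integer, parameter :: dp = kind(1.0d0)   ! double precision, assumed IEEE binary64
contains
   ! Widen x by way of its 7-significant-digit decimal representation.
   function widen_via_decimal(x) result(y)
      real(sp), intent(in) :: x
      real(dp) :: y
      character(len=32) :: str

      ! ES15.6E3: one digit before the point plus six after = 7 significant digits
      write(str, "(es15.6e3)") x
      read(str, *) y
   end function widen_via_decimal
end module decimal_widen_demo
```

With a compiler that rounds decimal input correctly, `widen_via_decimal(1.4726_sp)` should compare equal to `1.4726_dp`.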

Arjen_Markus · ‎04-28-2021

Yes, the point is that you may think the number is 1.4726, but the bit pattern will say otherwise. So, when converted to a decimal representation with only 4 decimals, it may look exactly like what you hope it is, but written out with 5 decimals, it could easily turn into 1.47259 instead of 1.47260. The point is, whatever the bit pattern in the file, the number it represents is at best the single-precision floating-point number closest to 1.47260000...

The only way out is to use a decimal representation of the number instead of a binary one, because a decimal representation (such as BCD) works the way we humans generally do arithmetic: in a decimal system. The situation is no different in binary representation than in decimal representation: in neither case can you exactly represent 1/3 with a finite number of "decimals". It is unfortunate, however, that the set of rational numbers that can be represented exactly in finite precision is much smaller in binary representation than it is in decimal representation.

mecej4 · ‎04-29-2021

The OP has a conceptual problem, not recognizing that reals and doubles are represented and manipulated in a base 2 system (IEEE standard).

The following program may help overcome the mental block.

program hello
   real*8 a,b
   a = 0.1
   b = 10*a - 1.0
   print *,b
end program Hello

The printed answer need not be zero, depending on the computer and compiler used. I tried Gfortran on a cloud service, and it gave

$gfortran -std=gnu *.f95 -o main
$main
   1.4901161193847656E-008
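The printed value is not random noise: it is exactly 2**(-26), the error that the single-precision literal 0.1 carries once it is scaled by 10. The sketch below makes that checkable. It assumes IEEE binary32 for the default real and binary64 for double precision, and no compiler option that promotes default reals (such as /real-size:64); the function names are made up.

```fortran
module literal_demo
   implicit none
   integer, parameter :: dp = kind(1.0d0)   ! double precision, assumed IEEE binary64
contains
   ! 10*a - 1 where a was assigned the default-real literal 0.1.
   pure function residual_single_literal() result(r)
      real(dp) :: r, a
      a = 0.1          ! rounded to single first, then widened to double
      r = 10*a - 1.0
   end function residual_single_literal

   ! The same expression with a double-precision literal.
   pure function residual_double_literal() result(r)
      real(dp) :: r, a
      a = 0.1_dp       ! rounded once, directly to double
      r = 10*a - 1.0
   end function residual_double_literal
end module literal_demo
```

Under those assumptions `residual_single_literal()` should equal `2.0_dp**(-26)`, while `residual_double_literal()` should come out as zero only because 10*0.1d0 happens to round to exactly 1.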

Some hand-held calculators implemented decimal arithmetic. There have been decimal arithmetic packages proposed for Fortran. The following can be used to test whether a processor (calculator or computer) is decimal or not.

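program dectest
   implicit none
   double precision :: a, b
   a = 4d0
   b = 3d0
   ! The size of the nonzero residual reveals the base: a power of 2 in binary, a power of 10 in decimal
   print *,3*(a/b-1)-1
end program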

On a calculator, "(4/3-1)*3-1" or "4 Enter 3 / 1 - 3 * 1 -"

cryptogram · ‎04-29-2021

In fact, the old Microsoft 3.31 Fortran compiler from many years ago did all of its math using subroutine libraries. You could choose from

1) Normal math library, assumed that math coprocessor installed.

2) Normal math library, would work with coprocessor if available, but would still work if not.

3) Decimal math library.

avinashs · ‎04-29-2021

I share a similar concern as @schulzey regarding Fortran code. To guarantee an exact representation of a number in double precision requires the programmer to append d0 or _dp if dp has been defined to be real(kind = 8). In other languages such as C/C++ or VB this is not required, where the default is double precision to begin with.

One example from chemical engineering applications is the case of atomic weights. The accepted atomic weight of C in our calculations is 12.01115. However, I have to assign it in Fortran as

AWC = 12.01115d0

to guarantee that it is represented to the same significant figures as above whereas in C++ I can simply code AWC = 12.01115

Similarly, if AWC is read from an ASCII file automatically generated by Excel where it is stored as 12.01115, then extra digits may appear after 8 significant digits as numerical noise, e.g. 12.0111500003254. This noise later propagates through thousands of calculations in large codes.

I have been recently using the IVF option /real-size:64 and that seems to not require the d0 or _dp. Further, when writing to files, I use write(*,'(g0)') AWC to ensure exact representation.

Steve_Lionel · ‎04-29-2021

@avinashs wrote:

I share a similar concern as @schulzey regarding Fortran code. To guarantee an exact representation of a number in double precision requires the programmer to append d0 or _dp if dp has been defined to be real(kind = 8). In other languages such as C/C++ or VB this is not required, where the default is double precision to begin with.

This is not true - even specifying the kind doesn't "guarantee an exact representation". Most decimal fractions are not exactly representable in binary floating point. You're only kidding yourself if you believe that double precision solves everything.
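A small sketch of the point that double precision does not make decimal fractions exact. It assumes IEEE binary64 arithmetic evaluated without value-changing optimizations (no fast-math style reassociation); the module and function names are illustrative only.

```fortran
module accumulate_demo
   implicit none
   integer, parameter :: dp = kind(1.0d0)   ! double precision, assumed IEEE binary64
contains
   ! Add step to itself n times, starting from zero.
   pure function accumulate(step, n) result(total)
      real(dp), intent(in) :: step
      integer, intent(in) :: n
      real(dp) :: total
      integer :: i

      total = 0.0_dp
      do i = 1, n
         total = total + step
      end do
   end function accumulate
end module accumulate_demo
```

Here `accumulate(0.125_dp, 8)` should equal `1.0_dp` exactly, because 0.125 is a power of 2, whereas `accumulate(0.1_dp, 10)` should not, even though every literal carries the `_dp` suffix.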

FortranFan · ‎04-29-2021

@avinashs wrote:
.. if AWC is read from an ASCII file automatically generated by Excel where it is stored as 12.01115, then extra digits may appear after 8 significant digits as numerical noise, e.g. 12.0111500003254. This noise later propagates through thousands of calculations in large codes. ..

@avinashs ,

What you posted does not appear to be accurate. Can you please show what you mean while keeping the following in mind?

   integer, parameter :: SP = selected_real_kind( p=6 )
   integer, parameter :: DP = selected_real_kind( p=12 )
   character(len=*), parameter :: fmtg = "(*(g0))"
   character(len=*), parameter :: fmth = "(g0,z0)"
   integer :: lun
   real(SP) :: val_sp
   real(DP) :: val_dp
   open( newunit=lun, file="atomic_mass.txt" )
   read( lun, fmt=* ) val_sp
   print fmtg, "value: ", val_sp
   print fmth, "value (single, hex): ", val_sp
   rewind( lun )
   read( lun, fmt=* ) val_dp
   print fmtg, "value: ", val_dp
   print fmth, "value (double, hex): ", val_dp
end

C:\Temp>type atomic_mass.txt
12.01115

C:\Temp>ifort /standard-semantics p.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.28.29337.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\Temp>p.exe
value: 12.01115
value (single, hex): 41402DAC
value: 12.01115000000000
value (double, hex): 402805B573EAB368

C:\Temp>

mecej4 · ‎04-30-2021

Avinash, your statement "To guarantee an exact representation of a number in double precision requires the programmer to append d0" hints at the existence of a misunderstanding of binary floating point arithmetic. It may "require", but it is by no means sufficient. Many simple decimal numbers such as (1/10) = 0.1 do not have an exact representation in binary, just as 1/3 does not have an exact and short representation in decimal. Please try the following programs.

program tenth
real a,b
a = 0.1
b = 0.01
print *,a*a-b
end

and the double precision version

program tenth
real*8 a,b
a = 0.1d0
b = 0.01d0
print *,a*a-b
end

jimdempseyatthecove · ‎04-30-2021

>>If I have Single = 1.4726, when I set Double = Single, I get Double = 1.472599983215332.

FWIW The printout (or debugger view) of the SP variable is a decimal approximation of the (binary) internally stored variable. IOW the value in SP is approximately 1.4726

When copied from SP to DP, the 8-bit SP exponent is converted (re-biased) to the 11-bit DP exponent and the 23-bit SP mantissa (holding the approximate mantissa of 1.4726) is copied to the top of the 52-bit DP mantissa (the remainder is 0-filled).

The binary values are exactly the same approximation of 1.4726 as was held in the SP variable,... however when printed, you now see the difference in the approximation as was held in the SP variable.
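The field-by-field copy can be spelled out by hand and compared with the intrinsic conversion. This is a sketch only: it assumes IEEE binary32 and binary64 layouts, handles normal numbers only (not zero, subnormals, Inf or NaN), and uses invented names. Note that the exponent field is re-biased (bias 127 in single, 1023 in double) rather than copied bit for bit.

```fortran
module bitcopy_demo
   use, intrinsic :: iso_fortran_env, only: int32, int64, real32, real64
   implicit none
contains
   ! Assemble the binary64 bit pattern of a normal binary32 value by hand.
   pure function widen_by_bits(x) result(y)
      real(real32), intent(in) :: x
      real(real64) :: y
      integer(int32) :: b32
      integer(int64) :: sgn, expo, frac, b64

      b32  = transfer(x, b32)
      sgn  = int(ibits(b32, 31, 1), int64)   ! 1 sign bit
      expo = int(ibits(b32, 23, 8), int64)   ! 8 exponent bits, bias 127
      frac = int(ibits(b32, 0, 23), int64)   ! 23 fraction bits

      expo = expo - 127_int64 + 1023_int64   ! re-bias for the 11-bit field
      ! The fraction moves to the top of the 52-bit field; the low 29 bits stay zero
      b64 = ior(ior(shiftl(sgn, 63), shiftl(expo, 52)), shiftl(frac, 29))
      y = transfer(b64, y)
   end function widen_by_bits
end module bitcopy_demo
```

For the single 3FBC7E28 shown earlier this yields 3FF78FC500000000, which should compare equal to `real(1.4726_real32, real64)` and differs from 1.4726d0 (3FF78FC504816F00) only in the low bits.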

Fixing the DP value to 1.4726d0 = (~1.472599983215332 + ~0.000000016784668) ...
Then should you copy this (fixed) DP value back to an SP variable, the "fixed" value (~0.000000016784668) would get truncated (rounded off) and lost. IOW the ~0.000000016784668 is the approximation of the error in the SP variable and not representative of an error in the DP copy of the SP variable.

This is a common mental block that (new) programmers experience in that they assume the value printed is an exact representation of the value of the stored variable (IOW an assumption that all variables have infinite decimal precision whereas the variables have finite binary precision). mecej4's 4/29 post was an alternate way of illustrating that the fractional precision between SP and DP can be visually significant when the fraction cannot be exactly represented in the SP binary mantissa.

If you want exact representation to 6 decimal places (e.g., units of microns), then program in units of microns, not meters. But keep in mind that any generated fractional units could result in approximations that you would then have to determine how to handle.
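As a rough sketch of the "program in microns" idea, lengths can be held as whole numbers of microns in a 64-bit integer, so that sums and differences of such lengths are exact. The function name is hypothetical, and rounding to the nearest micron is an assumption about what the application wants.

```fortran
module micron_demo
   use, intrinsic :: iso_fortran_env, only: int64, real32, real64
   implicit none
contains
   ! Convert a length in meters to a whole number of microns, rounding to nearest.
   pure function to_microns(meters) result(microns)
      real(real64), intent(in) :: meters
      integer(int64) :: microns
      microns = nint(meters * 1.0e6_real64, kind=int64)
   end function to_microns
end module micron_demo
```

Both `to_microns(1.4726_real64)` and `to_microns(real(1.4726_real32, real64))` should give 1472600, since the single-precision error is far below one micron at this magnitude.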

Jim Dempsey

schulzey · ‎05-02-2021

Thanks for all the replies, much appreciated!