New Contributor I
391 Views

## Double to Single

I have read the many posts that cover the topic of getting rid of the extra digits when converting a single to a double, and I think I understand the issues, but at the risk of releasing a firestorm of "this topic has already been done to death" replies can I just ask this simple question?

If I have Single = 1.4726, when I set Double = Single, I get Double = 1.472599983215332. I understand from previous posts that the extra digits are because the double precision bit pattern can't exactly represent 1.472600000000000 and 1.472599983215332 is the closest approximation. What I don't understand is if I set the Double directly to 1.4726d0 then I get exactly 1.472600000000000, which sort of contradicts the previous sentence. Can someone please explain this?

20 Replies
Black Belt
382 Views

What it means is that 1.4726 cannot be represented exactly in binary at the precision of a single variable. When you assign 1.4726d0 to a double, the extra bits in the double are set to give the closest possible value. Copying the single variable to the double leaves all the remaining bits zero, which is not enough to get the more accurate value.
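To make the zero-filling concrete, here is a minimal sketch that prints the bit patterns side by side. The kind aliases `sp` and `dp` and the variable names are my own choices for illustration, and it assumes IEEE 32-bit and 64-bit reals.

```fortran
program widen_demo
   use, intrinsic :: iso_fortran_env, only: sp => real32, dp => real64
   implicit none
   real(sp) :: s
   real(dp) :: widened, direct

   s       = 1.4726_sp      ! nearest single to the decimal 1.4726
   widened = real(s, dp)    ! same value as s, low-order bits zero-filled
   direct  = 1.4726_dp      ! nearest double to the decimal 1.4726

   print '(a,z8.8)',   "single  bits: ", s
   print '(a,z16.16)', "widened bits: ", widened
   print '(a,z16.16)', "direct  bits: ", direct
   print '(a,f18.15)', "widened     : ", widened
   print '(a,f18.15)', "direct      : ", direct

   ! Widening is exact, so it cannot recover digits the single never held.
   if (real(widened, sp) /= s) error stop "widening changed the value"
   if (widened == direct)      error stop "unexpected: the two doubles agree"
end program widen_demo
```

The widened double should end in a run of zero hex digits, while the directly assigned double carries extra nonzero bits.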

New Contributor I
375 Views

And there's no simple way to get from Single = 1.4726 to Double = 1.472600000000000 without going via a string conversion?

Honored Contributor I
363 Views

First, please note yours is a general Fortran inquiry for which you may want to also consider the Fortran Discourse for wider Fortran community feedback: https://fortran-lang.discourse.group/

Secondly, can you please share where the value '1.4726' comes from? Is it a calculation/simulation/experimental result stored in a file or database that is read in? If so, please see the simple-minded code below that "mimics" such an action:

```
   integer, parameter :: SP = selected_real_kind( p=6 )
   integer, parameter :: DP = selected_real_kind( p=12 )
   character(len=*), parameter :: fmtg = "(*(g0))"
   character(len=*), parameter :: fmth = "(g0,z0)"
   character(len=:), allocatable :: val
   real(SP) :: val_sp
   real(DP) :: val_dp
   val = "1.4726" !<-- Assume this represents a file read or database fetch
   read( val, fmt=* ) val_sp   ! internal read of the text into single precision
   print fmtg, "value: ", val_sp
   print fmth, "value (single, hex): ", val_sp
   read( val, fmt=* ) val_dp   ! internal read of the same text into double precision
   print fmtg, "value: ", val_dp
   print fmth, "value (double, hex): ", val_dp
end
```

Compiling and running this program with the Intel Fortran compiler gives:

```
C:\Temp>ifort /standard-semantics p.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000

Microsoft (R) Incremental Linker Version 14.28.29337.0

-out:p.exe
-subsystem:console
p.obj

C:\Temp>p.exe
value: 1.472600
value (single, hex): 3FBC7E28
value: 1.472600000000000
value (double, hex): 3FF78FC504816F00

C:\Temp>
```

where you can see the difference in bit representation even though the "apparent" value seems to be the same. You can then see that the type conversion you seek will result in misplaced bits.

Thus you are better off addressing the definition of your variables of different precision - "single" vs "double" - at the level of your "data" and thereby avoid type conversions.

New Contributor I
357 Views

Thanks. The 1.4726 was just a made-up value to illustrate the problem, but in the real application the singles are stored in a file that I now want to convert to doubles that have 0s in the extra significant digits.

Honored Contributor I
352 Views

Re: your comment, "the real application the singles are stored in a file that I now want to convert to doubles," please view what is stored in the file as your "data". The point upthread is that you can "read in" such data in the precision of interest. You do not need to read them as default real (what you call "single") and then convert to "double". Just have your input I/O statements read the values into objects that are declared with the higher precision (say "double") when that is the precision you plan to work with.

Black Belt
352 Views

You could try reading straight from the file into a double precision variable using an appropriate edit descriptor.  That should be more precise than reading as single precision first.

New Contributor I
277 Views

The default conversion does set the extra significant digits to zero. It's just that the base 2 (or equivalently base 16 = hexadecimal) digits are set to zero, and not the base 10 ones. Setting the base 2 digits to zero in the conversion, rather than the base 10 ones, is the best thing to do, since numbers are internally represented in a base 2 system, so one obtains a (slightly) better approximation this way.

The only reason to favor base 10 arises if you know that the true number had all extra digits zero in base 10. I cannot think of many applications where this would be the case.

New Contributor I
346 Views

That sounds like a good solution, but doesn't it mean that it would read 8 bytes out of the file for each double rather than 4 bytes? My binary file contains 4-byte singles sequentially with no bytes in-between and if I read the file directly into 8-byte doubles then wouldn't it read two singles into one double?

Black Belt
343 Views
That's changing the story now. Not sure you said it was already a binary file.
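For a binary file of packed 4-byte singles, one way to avoid reading two singles into one double is to read into a single-precision buffer and widen afterwards. The sketch below is only an illustration: the file name `singles_demo.bin` and the three sample values are made up, and it assumes the file is a plain stream of native-endian IEEE singles with no record markers. The widened values are exactly the stored singles, so this does not produce decimal zeros in the extra digits.

```fortran
program read_packed_singles
   use, intrinsic :: iso_fortran_env, only: sp => real32, dp => real64
   implicit none
   real(sp) :: raw(3)
   real(dp) :: wide(3)
   integer  :: lun

   ! Create a small stand-in for the real data file (hypothetical name).
   open(newunit=lun, file="singles_demo.bin", access="stream", &
        form="unformatted", status="replace")
   write(lun) [1.4726_sp, 12.01115_sp, 0.5_sp]
   close(lun)

   ! Read 4 bytes per value into singles, then widen each one to double.
   open(newunit=lun, file="singles_demo.bin", access="stream", &
        form="unformatted", status="old")
   read(lun) raw
   close(lun, status="delete")
   wide = real(raw, dp)

   print '(*(g0,1x))', wide

   ! Narrowing back must reproduce the stored singles exactly.
   if (any(real(wide, sp) /= raw)) error stop "widening was not exact"
end program read_packed_singles
```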
New Contributor I
338 Views

I think I said "... in the real application the singles are stored in a file". Anyway it seems like there is no simple solution and so I will probably just convert them via a string, something like the following.

Single=1.4726
write(STR,*) Single

It seems to work. Can you see any issues with this approach?
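A fuller sketch of the string round trip, for comparison. It uses an explicit `es14.6` format (7 significant digits) rather than list-directed output, because the number of digits list-directed output produces is compiler-dependent. That 7 digits is the right choice for the real data is an assumption: any value that genuinely needs more digits would be altered by this conversion.

```fortran
program via_string
   use, intrinsic :: iso_fortran_env, only: sp => real32, dp => real64
   implicit none
   real(sp) :: s
   real(dp) :: d
   character(len=32) :: str

   s = 1.4726_sp
   write(str, '(es14.6)') s     ! 7 significant decimal digits: 1.472600E+00
   read(str, *) d               ! parse that decimal text as a double

   print '(a,f18.15)', "via string: ", d
   print '(a,f18.15)', "dble(s)   : ", real(s, dp)

   ! The round trip lands on the double nearest to the decimal 1.4726.
   if (d /= 1.4726_dp) error stop "round trip did not give 1.4726d0"
end program via_string
```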

Valued Contributor III
314 Views

Yes, the point is that you may think the number is 1.4726, but the bit pattern will say otherwise. So, when converted to a decimal representation with only 4 decimals, it may look exactly like what you hope it is, but written out with 5 decimals, it could easily turn into 1.47259 instead of 1.47260. The point is, whatever the bit pattern in the file, the number it represents is at most the single-precision floating-point number closest to 1.47260000...

The only way out is to use a decimal representation of the number instead of a binary one, because a decimal representation (such as BCD) works the way we humans generally do arithmetic: in a decimal system. The situation is no different in binary representation than in decimal representation: in neither case can you exactly represent 1/3 with a finite number of "decimals". It is unfortunate, however, that the set of rational numbers that can be represented exactly in finite precision is much smaller in binary representation than it is in decimal representation.

Black Belt
299 Views

The OP has a conceptual problem, not recognizing that reals and doubles are represented and manipulated in a base 2 system (IEEE standard).

The following program may help overcome the mental block.

```
program hello
real*8 a,b
a = 0.1          ! single-precision literal, widened to double on assignment
b = 10*a - 1.0
print *,b
end program hello
```

The printed answer need not be zero, depending on the computer and compiler used. I tried Gfortran on a cloud service, and it gave

```
$gfortran -std=gnu *.f95 -o main
$main
1.4901161193847656E-008
```

Some hand-held calculators implemented decimal arithmetic. There have been decimal arithmetic packages proposed for Fortran. The following can be used to test whether a processor (calculator or computer) is decimal or not.

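```
program decimal_test
   real*8 a,b
   a = 4d0
   b = 3d0
   ! The leftover rounding error reveals the base: with IEEE binary doubles
   ! it is about -2.22E-16, i.e. -2**(-52), rather than a power of ten.
   print *,3*(a/b-1)-1
end program
```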

On a calculator, "(4/3-1)*3-1" or "4 Enter 3 / 1 - 3 * 1 -"

Beginner
286 Views

In fact, the old Microsoft 3.31 Fortran compiler from many years ago did all of its math using subroutine libraries. You could choose from

1)  Normal math library, assumed that math coprocessor installed.

2) Normal math library,  would work with coprocessor if available, but would still work if not.

3) Decimal math library.

New Contributor I
256 Views

I share a similar concern as @schulzey regarding Fortran code. To guarantee an exact representation of a number in double precision requires the programmer to append d0 or _dp if dp has been defined to be real(kind = 8). In other languages such as C/C++ or VB this is not required, where the default is double precision to begin with.

One example from chemical engineering applications is the case of atomic weights. The accepted atomic weight of C in our calculations is 12.01115. However, I have to assign it in Fortran as

AWC = 12.01115d0

to guarantee that it is represented to the same significant figures as above whereas in C++ I can simply code AWC = 12.01115

Similarly, if AWC is read from an ASCII file automatically generated by Excel where it is stored as 12.01115, then extra digits may appear after 8 significant digits as numerical noise, e.g. 12.0111500003254. This noise later propagates through thousands of calculations in large codes.

I have been recently using the IVF option /real-size:64 and that seems to not require the d0 or _dp. Further, when writing to files, I use write(*,'(g0)') AWC to ensure exact representation.

Black Belt Retired Employee
244 Views

@avinashs wrote:

I share a similar concern as @schulzey regarding Fortran code. To guarantee an exact representation of a number in double precision requires the programmer to append d0 or _dp if dp has been defined to be real(kind = 8). In other languages such as C/C++ or VB this is not required, where the default is double precision to begin with.

This is not true - even specifying the kind doesn't "guarantee an exact representation". Most decimal fractions are not exactly representable in binary floating point. You're only kidding yourself if you believe that double precision solves everything.

Honored Contributor I
238 Views

@avinashs wrote:

.. if AWC is read from an ASCII file automatically generated by Excel where it is stored as 12.01115, then extra digits may appear after 8 significant digits as numerical noise, e.g. 12.0111500003254. This noise later propagates through thousands of calculations in large codes. ..

What you posted does not appear to be accurate.  Can you please show what you mean while keeping the following in mind?

```
   integer, parameter :: SP = selected_real_kind( p=6 )
   integer, parameter :: DP = selected_real_kind( p=12 )
   character(len=*), parameter :: fmtg = "(*(g0))"
   character(len=*), parameter :: fmth = "(g0,z0)"
   integer :: lun
   real(SP) :: val_sp
   real(DP) :: val_dp
   open( newunit=lun, file="atomic_mass.txt" )
   read( lun, fmt=* ) val_sp   ! read the text value into single precision
   print fmtg, "value: ", val_sp
   print fmth, "value (single, hex): ", val_sp
   rewind( lun )
   read( lun, fmt=* ) val_dp   ! read the same text again into double precision
   print fmtg, "value: ", val_dp
   print fmth, "value (double, hex): ", val_dp
end
```
```
C:\Temp>type atomic_mass.txt
12.01115

C:\Temp>ifort /standard-semantics p.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000

Microsoft (R) Incremental Linker Version 14.28.29337.0

-out:p.exe
-subsystem:console
p.obj

C:\Temp>p.exe
value: 12.01115
value (single, hex): 41402DAC
value: 12.01115000000000
value (double, hex): 402805B573EAB368

C:\Temp>
```

Black Belt
212 Views

Avinash, your statement "To guarantee an exact representation of a number in double precision requires the programmer to append d0" hints at the existence of a misunderstanding of binary floating point arithmetic. It may "require", but it is by no means sufficient. Many simple decimal numbers such as (1/10) = 0.1 do not have an exact representation in binary, just as 1/3 does not have an exact and short representation in decimal. Please try the following programs.

```
program tenth
real a,b
a = 0.1
b = 0.01
print *,a*a-b
end
```

and the double precision version

```
program tenth
real*8 a,b
a = 0.1d0
b = 0.01d0
print *,a*a-b
end
```
Black Belt
201 Views

>>If I have Single = 1.4726, when I set Double = Single, I get Double = 1.472599983215332.

FWIW The printout (or debugger view) of the SP variable is a decimal approximation of the (binary) internally stored variable. IOW the value in SP is approximately 1.4726

When copied from SP to DP, the 8-bit SP exponent is converted to the 11-bit DP exponent (same power of two, re-biased from 127 to 1023) and the 23-bit SP mantissa (holding the approximate value mantissa of 1.4726) is copied to the high-order bits of the 52-bit DP mantissa (0-filled in the remainder).

The binary values are exactly the same approximation of 1.4726 as was held in the SP variable,... however when printed, you now see the difference in the approximation as was held in the SP variable.
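The field-by-field copy can be checked with a short sketch like the one below. The variable names are my own, and it assumes IEEE 32-bit and 64-bit formats and a normal (not subnormal, infinite or NaN) value.

```fortran
program field_demo
   use, intrinsic :: iso_fortran_env, only: sp => real32, dp => real64, int32, int64
   implicit none
   real(sp) :: s
   real(dp) :: d
   integer(int32) :: sbits
   integer(int64) :: dbits, sfrac, dfrac
   integer :: sexp, dexp

   s = 1.4726_sp
   d = real(s, dp)
   sbits = transfer(s, sbits)      ! reinterpret the raw bits as integers
   dbits = transfer(d, dbits)

   sexp  = int(ibits(sbits, 23, 8))         ! 8-bit biased exponent (bias 127)
   dexp  = int(ibits(dbits, 52, 11))        ! 11-bit biased exponent (bias 1023)
   sfrac = int(ibits(sbits, 0, 23), int64)  ! 23 stored mantissa bits
   dfrac = ibits(dbits, 0, 52)              ! 52 stored mantissa bits

   print '(a,i0,a,i0)', "unbiased exponent, single / double: ", &
        sexp - 127, " / ", dexp - 1023
   print '(a,z0)', "single mantissa bits: ", sfrac
   print '(a,z0)', "double mantissa bits: ", dfrac

   ! Same power of two, and the mantissa is shifted up 29 places with zero fill.
   if (sexp - 127 /= dexp - 1023)  error stop "exponents differ"
   if (shiftl(sfrac, 29) /= dfrac) error stop "mantissa was not zero-filled"
end program field_demo
```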

Fixing the DP value to 1.4726d0 = (~1.472599983215332 + ~0.000000016784668) ...
Then should you copy this (fixed) DP value back to an SP variable, the "fixed" value (~0.000000016784668) would get truncated (rounded off) and lost. IOW the ~0.000000016784668 is the approximation of the error in the SP variable and not representative of an error in the DP copy of the SP variable.

This is a common mental block that (new) programmers experience: they assume the value printed is an exact representation of the value of the stored variable (IOW an assumption that all variables have infinite decimal precision, whereas the variables have finite binary precision). mecej4's 4/29 post was an alternate way of illustrating that the difference in fractional precision between SP and DP can be visually significant when the fraction cannot be exactly represented in the SP binary mantissa.

If you want exact representation to 6 decimal places (e.g. units of microns), then program in units of microns, not meters. But keep in mind that any generated fractional units could result in approximations that you would then have to determine how to handle.
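As a rough illustration of working in the smaller unit, the sketch below keeps lengths as whole numbers of microns. Storing them in 64-bit integers is my assumption here (whole numbers held in a real would also be exact up to a limit), and the values are made up.

```fortran
program micron_units
   use, intrinsic :: iso_fortran_env, only: dp => real64, int64
   implicit none
   integer(int64) :: len_um, step_um, total_um
   real(dp) :: total_m

   len_um   = 1472600_int64           ! 1.472600 m held exactly as microns
   step_um  = 100000_int64            ! 0.1 m, also exact in this unit
   total_um = len_um + 10 * step_um   ! integer arithmetic: no rounding

   total_m = real(total_um, dp) * 1.0e-6_dp   ! back to meters only for display
   print '(a,i0)',   "total (microns): ", total_um
   print '(a,f0.6)', "total (meters) : ", total_m

   if (total_um /= 2472600_int64) error stop "integer sum is not exact"
end program micron_units
```

Division is where the fractional units mentioned above appear: an integer quotient truncates, so the remainder has to be handled explicitly.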

Jim Dempsey

New Contributor I
137 Views

Thanks for all the replies, much appreciated!