Converting single to double - clearing the garbage

ferrad01 · ‎11-15-2011

Is there a function to clean out the garbage after the significant part of the single when converting to double?

eg. if I have

real*4 :: sing
real*8 :: doub
sing = 0.279759
doub = dble(sing)
write(6,*) doub

then 0.279758989810944 is printed out

I could multiply by 1e7, take int() and divide again, but the power of ten (7) is not always consistent.

I also tried writing to a character string first:
read(5,*) sing
write(rch,'(g20.10)') sing
write(6,'(a)') 'rch: ' // rch

But that seems to convert to a double before writing.

TimP · ‎11-15-2011

By "converting to double" do you mean something like
doub = 0.279759d0
or
integer, parameter :: dk = selected_real_kind(12)
...
doub =0.279759_dk
?

ferrad01 · ‎11-15-2011

I mean I have a real*4 number eg. 0.279759

I need it as a double.

However if I use the dble function I get 0.279758989810944

I want to get 0.2797590000000 into the double.

See my code example above.

In my real application I don't have the single number explicitly in the code as above; I read it from a file.

mecej4 · ‎11-15-2011

Are you overlooking the fact that your computer is going to convert the input decimal number into internal floating point (IEEE) format? The number that you have, 0.279759, is not exactly representable as a 32-bit or 64-bit real. Therefore, when you promote from single to double precision, you are not going to have padding by 0's on the right.

jimdempseyatthecove · ‎11-15-2011

real*4 cannot contain exactly 0.279759, as this decimal number has a binary fraction that exceeds the precision of the fraction of a real*4 number. That is to say, the number you place into the real*4 variable will be a rounded approximation of the decimal number 0.279759, and in this case the roundoff is observed in the 8th decimal place. When the value is converted to real*8, that error in precision becomes visible.

If you want, you can write the REAL*4 number to a character*nn variable, including any rounding/truncation that you feel appropriate, then read from the character*nn variable into the real*8 variable.

ferrad01 · ‎11-15-2011

Yes I know the reason.

I know that the code doub = dble(sing) isn't going to do the trick.

I was just wondering if there is any function or if there are any coding suggestions to blank out the garbage.

ferrad01 · ‎11-15-2011

I did try the latter suggestion.

However as I mentioned in my original post:

I also tried writing to a character string first:
read(5,*) sing
write(rch,'(e20.10)') sing
write(6,'(a)') 'rch: ' // rch

But that seems to convert sing to a double before writing.

jimdempseyatthecove · ‎11-15-2011

Use:

write(rch,'(e20.7)') sing

24-bits of mantissa yields 7 digits (plus a fraction) of precision.
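A minimal sketch of that round trip through a character variable, assuming a 20-character buffer named rch (as in the earlier posts) and assuming that seven significant digits is the right cut-off for the value at hand, which will not hold for every input:

```fortran
program via_string
    implicit none
    integer, parameter :: sp = kind(1.0), dp = kind(1.0d0)
    real(sp)          :: sing
    real(dp)          :: doub
    character(len=20) :: rch

    sing = 0.279759_sp
    ! Format the single with 7 significant digits ...
    write(rch, '(e20.7)') sing
    ! ... then let the formatted read convert that decimal text to double.
    read(rch, '(e20.7)') doub
    write(*, '(a)') 'rch : ' // rch
    write(*, '(a, f20.15)') 'doub: ', doub
end program via_string
```

For this particular value the double should print as 0.279759 followed by zeros at this width, although the stored double is still only the nearest representable value to 0.279759.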

Jim Dempsey

Jeffrey_A_Intel · ‎11-16-2011

THERE IS NO GARBAGE!

doub=dble(sing)

"expands" the format of the single-precision value in sing by appending the correct number of zeros (29 in this case) to the significand, adjusts the exponent field and stores it as a double-precision value. The conversion treats the value in sing as exact; it has no idea that it's supposed to be an approximation to the decimal value 0.279759.

The so-called garbage you're seeing is the result of the fact that the single-precision value closest to 0.279759 is (close to) 0.279758989810944.

There is no single-precision or double-precision value which is exactly equal to 0.279759.

If you need the double-precision value closest to 0.279759, you must convert that decimal string directly to double precision (e.g., by doing a formatted read into a double-precision variable). You can't get that value by going through a single-precision value first.

ferrad01 · ‎11-16-2011

This works some of the time.

Jeffrey_A_Intel · ‎11-16-2011

If it doesn't work as I have described all of the time, then there's a bug to be found and fixed, either in the compiler, the processor or your application.

ferrad01 · ‎11-16-2011

If I read a number (0.279759) from a datafile into a single precision variable, I see 0.279759. That is the number that the user wants. Now to use it in a double precision calculation they want to use 0.2797590000000, not 0.279758989810944.

I understand why this happens, but it still introduces errors in our calculations!

ferrad01 · ‎11-16-2011

This was a reply to Jim Dempsey's post. I replied to yours after this.

jimdempseyatthecove · ‎11-16-2011

Jeff,

sing = 0.279759

Unless 0.279759 has an exact binary fractional value within the precision of the mantissa (24 bits) of a single precision fp variable, sing does not contain exactly "0.279759"; therefore dble(sing) is required .NOT. to return any value other than the one stored in sing (0.279758989810944).

The program may also contain

bing = 0.279758989810944

and where IF(sing .eq. bing) would report .TRUE.
Well I should say may report .TRUE. since the compiler input conversion might round the literal on the bing= statement and affect the lsb.

sdoub=dble(sing)
bdoub=dble(bing)

Would yield the same result (0.279758989810944)
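A small sketch of that comparison, using the variable names from the post (sing, bing). Both literals are expected to round to the same single-precision value, so both tests should report T, subject to the caveat above about how the compiler converts the literals:

```fortran
program same_single
    implicit none
    integer, parameter :: sp = kind(1.0)
    real(sp) :: sing, bing

    sing = 0.279759_sp
    bing = 0.279758989810944_sp
    ! Both decimal literals should round to the same nearest single
    write(*, *) 'sing .eq. bing             : ', sing .eq. bing
    ! Promotion is exact, so the doubles should compare equal as well
    write(*, *) 'dble(sing) .eq. dble(bing) : ', dble(sing) .eq. dble(bing)
end program same_single
```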

Jim Dempsey

Jeffrey_A_Intel · ‎11-16-2011

I don't know how to say it any other way: there is no IEEE binary floating-point single-precision or double-precision value which is exactly equal to 0.2797590000000.

The single-precision floating-point bit pattern 0x3e8f3c92 is approximately equal to 0.2797589898... No other single-precision floating-point bit pattern has a value which is closer to 0.279759 than this one. But notice, it is not exactly equal to 0.279759.

When you convert 0x3e8f3c92 to double-precision, the double-precision bit pattern is 0x3fd1e79240000000. That bit pattern is still approximately equal to 0.2797589898... Its value is identical to the single-precision value. The only difference between the two bit patterns is that you've added 29 zeros to the end of the significand. Adding more zeros at the right doesn't change the value.

The reason you see the value 0.279758989810944 is because that is a very close approximation to the value stored in the variable. It is a better approximation of the variable's value than is 0.279759.

Now, if you read 0.279759 into a double-precision variable, you'll get the bit pattern 0x3fd1e7924af0bf1a which has the value 0.2797589999... This value is obviously much closer to 0.279759 than is the single-precision value (0.2797589898...) but neither of them is exactly equal to 0.279759.
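The bit patterns quoted above can be inspected with a short sketch along these lines. It assumes a Fortran 2008 compiler that provides the iso_fortran_env kind constants; the variable names promoted and direct are illustrative only:

```fortran
program show_bits
    use, intrinsic :: iso_fortran_env, only: int32, int64, real32, real64
    implicit none
    real(real32) :: sing
    real(real64) :: promoted, direct

    sing     = 0.279759_real32
    promoted = real(sing, real64)   ! single value widened to double
    direct   = 0.279759_real64      ! decimal converted straight to double

    ! transfer() reinterprets the bits as an integer so Z editing can print them
    write(*, '(a, z8.8)')   'single  : ', transfer(sing, 0_int32)
    write(*, '(a, z16.16)') 'promoted: ', transfer(promoted, 0_int64)
    write(*, '(a, z16.16)') 'direct  : ', transfer(direct, 0_int64)
end program show_bits
```

The promoted pattern should be the single-precision significand followed by zeros, while the direct pattern carries additional non-zero low-order bits.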

ferrad01 · ‎11-16-2011

I agree with all you say. I understand how the binary conversions and representations work.

But it doesn't address the point that when a user enters 0.279759 they want to see 0.279759000000 used in the calculations not 0.279758989810944.

I have written simple code which multiplies by 10^7, takes the int(x+0.5) then divides by 10^7 again and this works most of the time. Sometimes the factor is 10^6 and I have code set up to establish when this should be used. So it is possible to write code to do what the user wants, that's all I am looking for.

I have tested with 1000 random floats in all of the decades from 1e-10 to 1e+10 (21000 values) and 99% get translated correctly with my formula. The last 1% are proving a little tricky.
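The scaling approach can be sketched roughly as follows. The function name round_to_digits and the choice of the digit count as an argument are assumptions made for illustration, not the code actually used; the fixed digit count is exactly the weak point described above:

```fortran
program round_sig
    implicit none
    integer, parameter :: sp = kind(1.0), dp = kind(1.0d0)
    real(sp) :: sing

    sing = 0.279759_sp
    write(*, '(f20.15)') round_to_digits(sing, 7)

contains

    ! Promote s to double, then round it to ndig significant decimal digits.
    function round_to_digits(s, ndig) result(d)
        real(sp), intent(in) :: s
        integer,  intent(in) :: ndig
        real(dp) :: d
        integer  :: p

        d = real(s, dp)
        if (d == 0.0_dp) return
        ! p is the power of ten that moves ndig digits left of the decimal point
        p = ndig - 1 - floor(log10(abs(d)))
        if (p >= 0) then
            d = anint(d * 10.0_dp**p) / 10.0_dp**p
        else
            d = anint(d / 10.0_dp**(-p)) * 10.0_dp**(-p)
        end if
    end function round_to_digits
end program round_sig
```

Dividing by the power of ten, rather than multiplying by its reciprocal, keeps the final step to a single rounding whenever the power of ten is itself exactly representable.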

Can you give a code example of what you mean in your last statement?

Jeffrey_A_Intel · ‎11-16-2011

Compile and run this. Does it make clearer what's actually happening?

[fortran]PROGRAM foo
    IMPLICIT NONE
    REAL*4 :: a
    REAL*8 :: b
    CHARACTER*132 :: x
    x = '0.279759'
!   Read 0.279759 into a REAL*4 variable
    READ( x, '(F9.6)' ) a
!   Display the result in various formats
    WRITE( *, '(F15.7,F23.15,Z16)' ) a, a, a
!   Convert the value to REAL*8 and display
    b = REAL( a, KIND( 1.0D0 ) )
    WRITE( *, '(F15.7,F23.15,Z24)' ) b, b, b
!   Read 0.279759 into a REAL*8 variable and display
    READ( x, '(F9.6)' ) b
    WRITE( *, '(F15.7,F23.15,Z24)' ) b, b, b
END PROGRAM foo
[/fortran]

ferrad01 · ‎11-16-2011

This example is easy to follow as the number 0.279759 is hardcoded and therefore allows you to use a fixed format of f9.6.

However in general the number can be any value between the limits of a single precision real, so exponential format is needed. But should that be e16.6 or e16.7? I found that of the 21,000 random numbers I generated between 1e-10 and 1e+10, 50% needed e16.6 to work; the rest needed e16.7.
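One way to avoid choosing between six and seven digits up front is to search for the shortest decimal string that reads back to the same single-precision value, and then read that string into the double. This is only a sketch under the assumption that the original input had no more digits than the single can distinguish; the function name single_to_double_decimal is hypothetical:

```fortran
program shortest_decimal
    implicit none
    integer, parameter :: sp = kind(1.0), dp = kind(1.0d0)
    real(sp) :: sing
    real(dp) :: doub

    sing = 0.279759_sp
    doub = single_to_double_decimal(sing)
    write(*, '(a, es23.15)') 'dble(sing)       : ', real(sing, dp)
    write(*, '(a, es23.15)') 'via shortest text: ', doub

contains

    function single_to_double_decimal(s) result(d)
        real(sp), intent(in) :: s
        real(dp) :: d
        real(sp) :: back
        character(len=32) :: buf, fmt
        integer :: ndig

        ! Nine significant digits are always enough to identify a single,
        ! so the loop tries 1 to 9 digits and stops at the first that round-trips.
        do ndig = 1, 9
            ! ESw.d prints ndig significant digits when d = ndig - 1
            write(fmt, '(a, i0, a)') '(es32.', ndig - 1, ')'
            write(buf, fmt) s
            read(buf, *) back
            if (back == s) exit
        end do
        ! Convert the chosen decimal text directly to double precision
        read(buf, *) d
    end function single_to_double_decimal
end program shortest_decimal
```

The design choice here is to let the run-time library do both conversions, so the digit count adapts per value instead of being fixed at e16.6 or e16.7.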

anthonyrichards · ‎11-17-2011

I think the message is that, even if you put

x= 0.2797590000000

in the Fortran code, x will be represented internally as your so-called 'garbage' number which is the closest that a binary pattern can get to it.

If you want to use 0.2797590000000 EXACTLY in a calculation, you will have to store the EXACT character string "0.2797590000000" (and any other string of numbers you want to use) and write your own functions to 'read' the strings as numbers and then to multiply them, divide them, or whatever, using the algorithms taught to you in school. Eventually you will have to return the results to the computer's internally stored binary format if you want to use them in 'normal' calculations, and at that point you will have to accept the rounding limitations that result and which have been mentioned exhaustively above.

jimdempseyatthecove · ‎11-17-2011

>>This example is easy to follow as the number 0.279759 is hardcoded and therefore allows you to use a fixed format of f9.6.

There are no format edit descriptors (F9.6) for stored real numbers. Stored real numbers are stored in a binary representation. Format edit descriptors are used for converting internal binary representation into text format with implied rounding.

>>However in general the number can be any value between the limits of a single precision real

Not so. Rather:

In general the number can be any discrete value* between the limits of a single precision real.

*discrete value:

+ or -, 0.0, Not-A-Number or
+ or -, (1 + (0 or 1)/2 + (0 or 1)/4 + (0 or 1)/8 ... + (0 or 1)/(2^^23)) * 2^^n

where n is the positive or negative exponent of 2 (IOW a left shift or right shift amount, stored with a bias of 127)

You have a 23-bit binary fraction (with an implied 1 bit as the 24th most significant bit)
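The discreteness described above can be seen with the standard NEAREST and SPACING intrinsics. This is a small illustrative sketch, assuming default REAL is IEEE single precision:

```fortran
program neighbours
    implicit none
    real :: s

    s = 0.279759
    ! nearest(x, dir) returns the adjacent representable value in the direction of dir
    write(*, '(a, f20.15)') 'below  : ', nearest(s, -1.0)
    write(*, '(a, f20.15)') 'stored : ', s
    write(*, '(a, f20.15)') 'above  : ', nearest(s,  1.0)
    ! spacing(x) is the gap between representable values near x
    write(*, '(a, es12.4)') 'spacing: ', spacing(s)
end program neighbours
```

For values between 0.25 and 0.5 the gap is 2^^-25, roughly 3e-8, so the decimal 0.279759 falls between two stored values rather than on one.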

Example: 0.1 (decimal fraction) converted to binary

An exact representation would require an infinitely long series of bits, not 24 bits (1 followed by 23 bits). Normalized, with an exponent of 2^^-4, the bits are

1.10011001100110011001100110011001...
1.bbbbbbbbbbbbbbbbbbbbbbbxxxxxxxxx...

where the msb above (the first 1) is the assumed most significant bit, which is not stored and is represented by the '1' in the second line; the stored bits of the binary fraction are represented by bbbbbbbbbbbbbbbbbbbbbbb, and the lost binary fraction bits by xxxxxxxxx...

However, rounding may force the number to be stored as the 3rd line below

1.10011001100110011001100110011001...
1.bbbbbbbbbbbbbbbbbbbbbbbxxxxxxxxx...
1.10011001100110011001101

Therefore the stored number for 0.1 decimal is slightly larger than the actual number (round-to-nearest rounds up here, because the lost bits amount to more than half of the last stored bit).
Other decimal fractions, that require more bits than available, will be rounded up or down as the case may be.

Converting from single precision to double precision will be an exact conversion (of the rounded number).

Jim Dempsey