DBLE() behavior -- Is this expected?

stevenchan · ‎02-07-2012

Hello:

I was tracking down a map projection error the other day and came across a very undesirable behavior of DBLE(). Let me explain.

When given different 'seed' values, DBLE() throws in different 'junk' digits. What are the rules behind the generation of these digits? Why does DBLE() overshoot with one seed value (2.7183) but undershoot with another seed value (3.1416)?

Is there any way to force DBLE(x) to produce the same value as x entered as a REAL*8 constant? For example,

DBLE(3.1416) = 3.14160000000000

Steven

---

PROGRAM TEST

IMPLICIT NONE

REAL(4) :: xS

REAL(8) :: xD

xS = 3.1416

PRINT *, 'REAL*4 (xS) = ', xS

xD = 3.1416D0

PRINT *, 'REAL*8 entered as a constant (xD) = ', xD

PRINT *, 'REAL*8 converted from REAL*4 using DBLE(xS) = ', DBLE(xS)

PRINT *, 'Difference between converted REAL*8 and REAL*4 = ', DBLE(xS)-xD

xS = 2.7183

PRINT *, 'REAL*4 (xS) = ', xS

xD = 2.7183D0

PRINT *, 'REAL*8 entered as a constant (xD) = ', xD

PRINT *, 'REAL*8 converted from REAL*4 using DBLE(xS) = ', DBLE(xS)

PRINT *, 'Difference between converted REAL*8 and REAL*4 = ', DBLE(xS)-xD

END

---

REAL*4 (xS) = 3.141600

REAL*8 entered as a constant (xD) = 3.14160000000000

REAL*8 converted from REAL*4 using DBLE(xS) = 3.14159989356995

Difference between converted REAL*8 and REAL*4 = -1.064300536590679E-007

REAL*4 (xS) = 2.718300

REAL*8 entered as a constant (xD) = 2.71830000000000

REAL*8 converted from REAL*4 using DBLE(xS) = 2.71830010414124

Difference between converted REAL*8 and REAL*4 = 1.041412351909798E-007

Steven_L_Intel1 · ‎02-07-2012

There are no "junk digits". DBLE converts the single-precision argument to double precision by adding binary zeroes to the fraction field. It doesn't know what decimal number you used originally. If you want a constant interpreted as REAL(8), maintaining the additional precision, add _8 (or D0) at the end.
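A minimal sketch of this point, assuming an IEEE 754 system where REAL(4) and REAL(8) are the 32-bit and 64-bit binary formats (the program and variable names are illustrative only). It checks that DBLE loses nothing from the single-precision value, and that the difference comes from rounding the decimal text at two different precisions.

```fortran
program dble_is_exact
  implicit none
  real(4) :: xs
  real(8) :: from_single, from_literal

  xs = 3.1416               ! decimal text rounded to the nearest REAL(4)
  from_single  = dble(xs)   ! widens that REAL(4) value; adds no new information
  from_literal = 3.1416d0   ! decimal text rounded directly to the nearest REAL(8)

  ! Narrowing back to REAL(4) recovers xs, so DBLE lost nothing.
  print *, 'REAL(DBLE(xs), 4) == xs : ', real(from_single, 4) == xs

  ! The two doubles differ because each was rounded from decimal
  ! at a different precision.
  print *, 'DBLE(xs) == 3.1416D0    : ', from_single == from_literal
  print *, 'difference              : ', from_single - from_literal
end program dble_is_exact
```

The first comparison should come out true and the second false, with a difference of about -1.06E-07, matching the output posted above.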

stevenchan · ‎02-07-2012

Steve:

Thank you for your reply.

What I want is not a constant interpreted as REAL(8). Rather, I want the output of DBLE() to produce a clean stream of zeros, something like:

DBLE(3.1416) = 3.141600000 ...

Is this possible?

Steven

mecej4 · ‎02-08-2012

Your expectations indicate an inadequate understanding of binary floating point representations.

> I want the output of DBLE() to produce a clean stream of zeros...

That would be in violation of the Fortran language rules. Even if you could find a compiler that showed that kind of behavior, that compiler would be kaput.

Your "clean stream of zeros" in decimal floating point becomes a "filthy" (?) stream of 0 and 1 bits in binary. The 32-bit and 64-bit IEEE representations of 3.1416 are Z'40490FF9' and Z'400921FF2E48E8A7', neither of which contains a long stream of trailing zeros.

What is more, neither is an exact representation of 3.1416, because such a representation is impossible for this particular number.
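The bit patterns quoted above can be inspected with a short program. This is only a sketch, assuming IEEE 754 binary32/binary64 storage and that INTEGER(4) and INTEGER(8) have the same sizes as REAL(4) and REAL(8); the program and variable names are made up for illustration.

```fortran
program show_bits
  implicit none
  real(4)    :: xs
  real(8)    :: xd
  integer(4) :: bits4
  integer(8) :: bits8

  xs = 3.1416

  ! Reinterpret the raw storage as integers so it can be printed in hex.
  bits4 = transfer(xs, bits4)
  print '(A, Z8.8)',   'REAL(4) 3.1416     : ', bits4

  ! DBLE keeps the 23 fraction bits and pads the rest with zeros.
  xd = dble(xs)
  bits8 = transfer(xd, bits8)
  print '(A, Z16.16)', 'DBLE(REAL(4) value): ', bits8

  ! A REAL(8) constant is rounded from the decimal text at full precision.
  xd = 3.1416d0
  bits8 = transfer(xd, bits8)
  print '(A, Z16.16)', 'REAL(8) 3.1416D0   : ', bits8
end program show_bits
```

On such a system the middle line should read 400921FF20000000: the same leading digits as the REAL(8) constant, followed by the zero bits that DBLE appended.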

If you wait a few years in the case of Fortran, or switch to another language, you may find Decimal Floating Point to be more widely implemented and closer to what you ask for. See the IEEE Standard for Floating-Point Arithmetic.

Jeffrey_A_Intel · ‎02-08-2012

This was discussed to death late last year. See http://software.intel.com/en-us/forums/showthread.php?t=101169. As mecej4 says, there is no single- or double-precision binary floating-point number which is equal to 3.141600000...
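DBLE itself cannot do this, but if the goal is only to recover the double nearest to the short decimal that was originally typed, one possible workaround is a round trip through decimal text. The sketch below is not from the thread. It assumes the original value had at most 6 significant decimal digits, which is the number a REAL(4) is guaranteed to preserve, and the names `decimal_round_trip` and `buf` are hypothetical.

```fortran
program decimal_round_trip
  implicit none
  real(4) :: xs
  real(8) :: xd
  character(len=32) :: buf

  xs = 3.1416

  ! Write the single-precision value with 6 significant decimal digits
  ! (ES13.5 gives one digit before the point and five after).
  write (buf, '(ES13.5)') xs

  ! Re-read the decimal text, this time rounding to double precision.
  read (buf, *) xd

  print *, 'DBLE(xs)           = ', dble(xs)
  print *, 'decimal round trip = ', xd
  print *, 'same as 3.1416D0?    ', xd == 3.1416d0
end program decimal_round_trip
```

The result is still not exactly 3.1416, since no binary value is. It is merely the REAL(8) closest to it, and the approach gives wrong answers for values that genuinely carry more than 6 significant digits.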