Solved: Why does Version 19 change how cube root is calculated?

Michael_D_11 · ‎12-20-2018

I just upgrade to to compiler version 19.0.0.117 from 18.0.3.210. In reviewing some of my codes I noticed small discrepancies in the results. Tracing these discrepancies led to the evaluation of the cube root function. In version 18, cube root (exponentiation to 1/3) is handled by cbrtf. In version 19 this changes to powf. This appears to be a reversal of what happened between versions 11 and 12 (see: https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/279705). This simple code snippet can be used to observe the change:

real b

real ans_b
b= 1500.

ans_b= b**(1./3.)

In version 18 the result is 11.4471426

In version 19 the result is 11.4471435

Any reason for this change? Especially since the version 18 implementation seems to give the more accurate answer.

JVanB · ‎12-21-2018

Ronald W Green (Intel) wrote:
As a side note, with the 19 compiler, there is no way to force the compiler to use cbrtf() IF you specify -fp-model source. You asked for it, you got it (power function).

But what if you asked for cbrtf() instead of powf()? Does this still work in 19?

module M
   use ISO_C_BINDING
   implicit none
   interface
      function cbrtf(x) bind(C,name='cbrtf')
         import
         implicit none
         real(C_FLOAT) cbrtf
         real(C_FLOAT), value :: x
      end function cbrtf
   end interface
end module M

program P
   use M
   implicit none
   real x, y
   x = 1500
   y = cbrtf(x)
   write(*,'(f10.7)') y
end program P

View solution in original post

mecej4 · ‎12-20-2018

The two differ only in the least significant (i.e., 23rd) bit. They are Z'4137277F' and Z'41372780' in internal IEEE 32-bit format.

Michael_D_11 · ‎12-20-2018

mecej4 wrote:
The two differ only in the least significant (i.e., 23rd) bit. They are Z'4137277F' and Z'41372780' in internal IEEE 32-bit format.

In my application, consistency of the result is important, so I care when the answer changes, even by one bit, and especially when that change results in a less accurate result.

JAlexiou · ‎12-20-2018

I think a `real` has only precision to (7 significant figures) 5 decimals in this case `11.44714` only. Anything beyond that is rubbish. If print out several more digits you can see that, compared to the true numeric answer of `11.447142425533318678080422119396770...`

Below are the more than enough digits of the above calculation using 32-bit, 63-bit and 128-bit real values

    ! precission    result
    !   32 bits     11.44714260101318359375000
    !   64 bits     11.44714242553331828844421
    !  128 bits     11.447142425533318678080422119396768
    !  ***          11.447142425533318678080422119396770..

Andreas_Z_ · ‎12-20-2018

What is the default 'real' that you are using and the desired precision level? If you use a real(4), the answer to your example is 11.44714 and you should not expect higher precision regardless of the compiler version used. If higher precision is needed, you better adjust the precision level of your floating point variables, for example:

PROGRAM TestCubeRoot

IMPLICIT NONE
REAL(4) :: a, b
REAL(4),PARAMETER :: onethird = 1.0/3.0
REAL(8) :: c, d
REAL(8),PARAMETER :: onethird_d = 1.0D0/3.0D0

b = 1500.0
a = b**onethird

d = 1500.0D0
c = d**onethird_d

WRITE(*,*) "a = ", a
WRITE(*,*) "c = ", c
READ(*,*)
    
END PROGRAM TestCubeRoot

gives the following results (in decimal form) with either compiler version you mention:

a = 11.44714

c = 11.4471424255333

Michael_D_11 · ‎12-20-2018

The responders to this thread are missing the point. Intel changed the function used to calculate cube roots from cbrtf to powf in going from version 18 to 19. This change results in a less precise result, and that loss of precision is carried over in the code, but not seen in default print the results. In the specific case of 1500^(1/3) that difference is in the last bit of the binary representation of the solution. I have generated cases where the difference is more than 1 bit.

Saying that the answer is 11.44714 is incorrect. The answer is 11.4471426010... since the answer can't be exactly represented in binary. Unfortunately, Version 19 thinks the answer is 11.4471435547....which isn't the same number in binary and this difference propagates into future calculations. I don't think it is unreasonable to ask Intel why a repeatable and accurate method of calculating cube roots was replaced in Version 19. I use Fortran, and specifically Intel Fortran, for its efficient and accurate math functions

Steve_Lionel · ‎12-20-2018

I suggest you open an issue with the Intel Online Service Center. That way you'll get an appropriate response that originates with the code generator developers. Otherwise you would be dependent on an Intel support engineer reading this and deciding on their own to open an issue. Admittedly, I used to do this all the time, but that was just me being me.

JAlexiou · ‎12-20-2018

Michael D. wrote:
The answer is 11.4471426010... since the answer can't be exactly represented in binary.

Mathematically the correct answer is `11.447142425533318678..` and not what you expect. The closest 32-bit floating point representation is `11.4471426` or `0x4137277F`.

As for as the switch on the library function for cube root, it might be a product of the compiler options. Are you sure the two compilations have the same floating point compiler settings? Are you using fast-math or strict-math?

Andreas_Z_ · ‎12-20-2018

Michael D., regarding the Version 19 (initial and 19.0.1.144) vs. 18 (version 18.0.1.156) compiler output, on my machine, I get consistently the same value with both versions (x64 as solution platform), namely 11.44714260101318... with 32 bit floating point (beyond intended/correct precision) for the example above.

So, are you sure that your different result is not due to different compilation / run-time settings you use with version 19 vs. 18 or a different CPU? My tests suggest that the 'powf' generated result is not worse than that from older compilers.

Edit: I can reproduce the difference when using the /fp:source switch for the floating point model (but not with 'Fast').

Ron_Green · ‎12-21-2018

Intel has been working on getting more consistent results in our FP model. 18 was not being as strict with fp-model source. 19 is more strict.

And 19 is correct - if you specify SOURCE and your source is

a = b**x

where x is any expression, including (1./3.) you are asking for b to be raised to the x and stored in a. if x is (1./3.) the order of evaluation is (1./3.) done in SINGLE PRECISION, raise b to that power in single precision pow() call, store as single-precision to a. With SOURCE that means 'do it exactly as I have it coded'. This is a power function call.

I argue that what 18 and earlier compilers were doing was wrong. It was looking at the expression of x being 1./3. and saying "oh look, that's really a cubed root. we have a fast cubed root function. I'll do him a favor and simply call this cubed root function.". Same as if x were 1.0/2.0 the compiler can say "oh look, he's doing a square root" and replace the USER INTENDED POWER FUNCTION incorrectly with sqrtf or cbrtf. That is WRONG if you have asked for FP MODEL SOURCE.

As a side note, with the 19 compiler, there is no way to force the compiler to use cbrtf() IF you specify -fp-model source. You asked for it, you got it (power function).

and I'm sure you noticed that if you take off FP MODEL SOURCE you will get the older results. It's not calling cbrtf in the assembly I looked at, it's inlining the calculation w/o a function call of any kind.

The compiler MAY choose not to use the power function in the future, inline, or maybe someday it'll use cbrtf again if that is found to be fast for newer architectures and algorithms for cubed root. Depends on options. The compiler typically, at O2 or O3, will pick a fast method where "fast" is relative the current architectures and their instruction sets.

And, since 1./3. is imprecise, you can get the old results with (1._8/3._8) which will do 1/3 in double precision, raise b to double, call pow() with double arguments, down-convert to single to store to a. And 1/3 could be done with division or a fast lookup table. Again, it can make a difference. On a processor with fast division in hardware (modern) it'll probably do the division. Older days (div instruction slow), a table lookup might've been faster. If you don't control the compiler, it'll pick any fast method where "fast" can change over time.

The Forum users pointed out that IEEE single precision is accurate only to ~7 decimal digits. So my testcase probably looks like your code:

program sqr
implicit none

real :: b=1500.0
real :: ans_b

ans_b= b**(1./3.)

write(*,'(A10,F10.7)') "Floating: ", ans_b
write(*,'(A10,Z8)'   ) "Hex:      ", ans_b

end program sqr

IF you really need 9 digits of precision, the Fortran Standard way of getting this is similar to this form:

program sqr
implicit none
integer, parameter :: d9=selected_real_kind(9)
real(d9) :: b=1500.0_d9
real(d9) :: ans_b

ans_b= b**(1.0_d9/3.0_d9)

write(*,'(A10,F10.7)') "Floating: ", ans_b
write(*,'(A10,Z)'   ) "Hex:      ", ans_b

end program sqr

Which will give you your answer to 9 decimal digits of accuracy. Yes, it selects KIND=8 because that is what you need to guarantee 9 decimal digits of accuracy. This makes it very clear to the compiler and future users your intended accuracy requirements. I would add that using the Fortran Standard kind method above protects your code from compiler version changes, compiler vendor changes, and architecture changes.

JVanB · ‎12-21-2018

Ronald W Green (Intel) wrote:
As a side note, with the 19 compiler, there is no way to force the compiler to use cbrtf() IF you specify -fp-model source. You asked for it, you got it (power function).

But what if you asked for cbrtf() instead of powf()? Does this still work in 19?

module M
   use ISO_C_BINDING
   implicit none
   interface
      function cbrtf(x) bind(C,name='cbrtf')
         import
         implicit none
         real(C_FLOAT) cbrtf
         real(C_FLOAT), value :: x
      end function cbrtf
   end interface
end module M

program P
   use M
   implicit none
   real x, y
   x = 1500
   y = cbrtf(x)
   write(*,'(f10.7)') y
end program P

Michael_D_11 · ‎12-26-2018

Using the /fp: source switch, here is more clear example of the differences between compiler version 18 and 19, where 7 decimal digit repeatability is not maintained:

x = 1.5374178E-14
y = x**(1./3.)

in 18: y = 0.24865507E-04 (HEX: 37D09645)
in 19: y = 0.24865498E-04 (HEX: 37D09640)

Note, it is also a 5 binary bit change in the results.

Ron_Green · ‎12-27-2018

explicitly calling cbrtf from Fortran: yes, this will call will be left as-is and honored by any version of the compiler. The Fortran compiler cannot (should not) replace your call, and it will NOT inline the C function inside the Fortran caller. This is an excellent way to future-proof your application.

Ron_Green · ‎12-27-2018

the literal you have "1.5374178E-14" is using 8 decimals, as you have to count the digits to the left of the decimal point as well. And keep in mind, 32-bit IEEE numbers are accurate to "approximately 7 digits of precision". If you print*, selected_real_kind(7) you will find that compilers will tell you that if you need exactly 7 digits OR MORE of accuracy they will use real kind=8 (or the kind number for double precision which can vary by compiler vendor: 8 for most, but kind=2 for NAG )

Your example, for example, gfortran 7.3 gives the same result as Intel 19, and without any fp-model:

gfortran -O0 -o ex2_g73 ex2.f90
rwgreen@orcsle147:~/quad/04018265$ ./ex2_g73
x 288A7A71
y 37D09640

This should give you concern. If you really need the accuracy of 7 digits, remember my selected real kind example - if you use selected_real_kind(7), this will force everything to double precision so you get at least 7 digits of accuracy ( instead of "approximately 7 digits" in 32-bit FP representation). Is there a reason you're not using double precision? You seem very concerned about getting 7-8 digits of accuracy.

Ron_Green · ‎12-27-2018

In thinking about this some more, your solution to call cbrtf() is best. The problem is 1.0/3.0 is not cannot be represented in a finite number of digits, base 10 or base 2. Thus, raising something to this imprecision value is inherently imprecise.

Your example of using ISO_C_BINDING to cbrtf() is best. You want a cubed root. You should ask for a cubed root, not raising something to the 1/3 power.

AND you example is portable. Here is gfortran 7.3 again with your example

$ cat best.f90
module M
   use ISO_C_BINDING
   implicit none
   interface
      function cbrtf(x) bind(C,name='cbrtf')
         import
         implicit none
         real(C_FLOAT) cbrtf
         real(C_FLOAT), value :: x
      end function cbrtf
   end interface
end module M

program P
   use M
   implicit none
   real x, y
   x = 1500.0
   y = cbrtf(x)
   write(*,'(f10.7)') y
end program P
rwgreen@orcsle147:~/quad/04018265$ !gfortran
gfortran -o best best.f90
rwgreen@orcsle147:~/quad/04018265$ ./best
11.4471426

Right? this is what you want.

Michael_D_11 · ‎12-27-2018

Repeat Offender wrote:

But what if you asked for cbrtf() instead of powf()? Does this still work in 19?

module M
   use ISO_C_BINDING
   implicit none
   interface
      function cbrtf(x) bind(C,name='cbrtf')
         import
         implicit none
         real(C_FLOAT) cbrtf
         real(C_FLOAT), value :: x
      end function cbrtf
   end interface
end module M

program P
   use M
   implicit none
   real x, y
   x = 1500
   y = cbrtf(x)
   write(*,'(f10.7)') y
end program P

Thank you for this solution. While it involves a small amount of recoding, it efficiently gives the correct result and a result that appears to be consistent with that obtained in compiler version 18. Still can't understand how /fp:source concludes that x**(1./.3) means anything but take the cube root, but I guess that is why I am not a compiler developer.

mecej4 · ‎12-27-2018

Michael D. wrote:
Still can't understand how /fp:source concludes that x**(1./.3) means anything but take the cube root, but I guess that is why I am not a compiler developer.

You probably meant to write x**(1./3.), not x**(1./.3).

1) At compilation time, it is not known whether x is positive or negative. The expression (1./3.) is treated just the same as any other real valued expression. No special recognition is given for its being the reciprocal of an odd integer.

2) Therefore, the compiler has to emit code that will work for x >=0 as well as for x < 0.

3) If the exponent b is not the exact reciprocal of an odd integer, the expression x**b is undefined when x is negative, unless you move from the field of real numbers into the complex plane. For example, the expression (-27.0)**b does not have a real value (in the mathematical sense) when b = 0.33333333 or b = 0.33333334. The only value of b for which the expression has a real value, namely, the reciprocal of 3, does not have an exact representation in any 2-base or 10-base floating point representation; as stated in 1), no special case rules are known or applied.

The question has less to do with compiler development than with basic mathematics. Perhaps the following example program can help. Please review the code and form a conclusion as to what it should print. THEN, compile and run the program and compare the output to your expectations.

program xcubrt
   implicit none
   real :: third = 1.0/3.0
   complex :: z
!
   z=(-1.0,0.0)
   z=z**third
   print *,z
end program

mecej4 · ‎12-28-2018

Here is what I hope is a constructive suggestion towards attaining the goals exposed in this thread. Instead of using special functions such as CUBRT(x) or CBRTF(x),

1. Use the '**' operator with the absolute value of the argument x.

2. Affix the proper sign to the result.

3. Polish the result with a single step of Newton's method.

Here is example code to obtain the seventh root of -2187.

program sevrt
implicit none
real :: seventh = 1.0/7.0, y = -2187.0, z, z2, z6, z7
!
z = sign(abs(y)**seventh, y)
z2 = z*z
z6 = z2*z2*z2
z7 = z6*z
print '(ES16.8)',z
z = (6*z7 + y)/(7*z6)   ! single step of Newton's method
print '(ES16.8)',z
end program

The output of this program:

 -3.00000024E+00
 -3.00000000E+00

The last bit of the result (in its IEEE 32-bit representation) from the ** operation is 1. The one-step Newton correction changes that bit to 0.

JohnNichols · ‎12-28-2018

mecej4 wrote:
Here is what I hope is a constructive suggestion towards attaining the goals exposed in this thread. Instead of using special functions such as CUBRT(x) or CBRTF(x),
1. Use the '**' operator with the absolute value of the argument x.
2. Affix the proper sign to the result.
3. Polish the result with a single step of Newton's method.
Here is example code to obtain the seventh root of -2187.
program sevrt
implicit none
real :: seventh = 1.0/7.0, y = -2187.0, z, z2, z6, z7
!
z = sign(abs(y)**seventh, y)
z2 = z*z
z6 = z2*z2*z2
z7 = z6*z
print '(ES16.8)',z
z = (6*z7 + y)/(7*z6)   ! single step of Newton's method
print '(ES16.8)',z
end program
The output of this program:
 -3.00000024E+00
 -3.00000000E+00
The last bit of the result (in its IEEE 32-bit representation) from the ** operation is 1. The one-step Newton correction changes that bit to 0.

Brilliant demonstration

JohnNichols · ‎12-28-2018

program sevrt
     USE IFPORT
    implicit none
    integer i,j
    INTEGER, PARAMETER :: dp = selected_real_kind(15, 307)
    real(KIND=dp) :: seventh = 1.0d0/7.0d0, y = -2187.0, z, z2, z6, z7
    REAL (KIND=dp) :: six = 6.0d0, START_TIME, STOP_TIME
    REAL (KIND=dp) :: seven = 7.0d0, Thousand = 1000.0D0
    START_TIME = DCLOCK()
    !
    
    z = sign(abs(y)**seventh, y)
    z2 = z*z
    z6 = z2*z2*z2
    z7 = z6*z
    print '(ES16.8)',z
    z = (six*z7 + y)/(seven*z6)   ! single step of Newton's method
   print '(ES16.8)',z
    
   
    
    STOP_TIME = DCLOCK()
    write(*,100) (STOP_TIME - START_TIME)
100 Format('Program took:',E14.6, ' seconds.')
    end program

 -3.00000000E+00
 -3.00000000E+00
Program took:  0.000000E+00 seconds.
Press any key to continue . . .

FortranFan · ‎12-28-2018

Nichols, John wrote:
.. Brilliant demonstration

@Nichols, John:

Forgot to include some emoticons to go with your comment?! Because otherwise, it would indicate you really meant it which would be strange because @mecej4 in the suggestion to use Newton correction seems to have intently meant to convey the exact opposite of your comment as (perhaps) "homage" to some of the absurdities in this thread, considering the Newton suggestion.requires 9 additional operations (7 multiplication, 1 division, 1 addition) operations, 3 additional variables, 2 magic numbers, and a sequence of steps, all of which would require encapsulation in a procedure if it is to be reused for the seventh root case but which requires further algebra if it's to be extended generally, and this just so that the coder is advised by @mecej4 not to use an available, robust library function implementation!! and not to overlook the accompanying example was different from the current use case which seems to have settled to the one in Quote #12 (again perhaps as misdirection).

Starting with the original post where no clearly reproducible case was presented (see Quote #9 that was unanswered) and where the numeric analysis offered by Ronald Green at Intel and other effort to describe computations taking into account the accuracy requirements vis-a-vis the floating-point representations and so forth have been ignored, it's a real messy thread.