Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Why does Version 19 change how cube root is calculated?

Michael_D_11
Beginner
3,232 Views

I just upgrade to to compiler version 19.0.0.117 from 18.0.3.210. In reviewing some of my codes I noticed small discrepancies in the results. Tracing these discrepancies led to the evaluation of the cube root function. In version 18, cube root (exponentiation to 1/3) is handled by cbrtf. In version 19 this changes to powf. This appears to be a reversal of what happened between versions 11 and 12 (see: https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/279705). This simple code snippet can be used to observe the change:

real b

real ans_b
b= 1500.

ans_b= b**(1./3.)

In version 18 the result is 11.4471426

In version 19 the result is 11.4471435

Any reason for this change? Especially since the version 18 implementation seems to give the more accurate answer.

0 Kudos
1 Solution
JVanB
Valued Contributor II
3,170 Views

Ronald W Green (Intel) wrote:

As a side note, with the 19 compiler, there is no way to force the compiler to use cbrtf() IF you specify -fp-model source.  You asked for it, you got it (power function).

But what if you asked for cbrtf() instead of powf()? Does this still work in 19?

module M
   use ISO_C_BINDING
   implicit none
   interface
      function cbrtf(x) bind(C,name='cbrtf')
         import
         implicit none
         real(C_FLOAT) cbrtf
         real(C_FLOAT), value :: x
      end function cbrtf
   end interface
end module M

program P
   use M
   implicit none
   real x, y
   x = 1500
   y = cbrtf(x)
   write(*,'(f10.7)') y
end program P

 

View solution in original post

0 Kudos
32 Replies
gib
New Contributor II
764 Views

I'm wondering why the OP is not using 8-byte reals.

0 Kudos
andrew_4619
Honored Contributor II
764 Views

gib wrote:
I'm wondering why the OP is not using 8-byte reals.

You are not alone! Whilst there is some interesting discussion in this thread the fact is the compiler is giving good SP results IMO. If this precision is not enough then use DP, simples. 

0 Kudos
FortranFan
Honored Contributor II
762 Views

Nichols, John wrote:

..
Program took:  0.000000E+00 seconds.
Press any key to continue . . .

 

On modern computers, such single calculation checks can hardly reveal anything useful.  Now if one is looking at different solvers for particular calculations/simulations, a systematic approach toward computational analyses as outlined in almost any good book on numerical methods and/or algorithms will be necessary.  But when one is considering as basic as options for a cube root calculation, there are few performant alternatives to CBRT the C library function, if any at all.  However any modifications/tweaks with the POWF (as with Fortran's '**' operator) are going to result in a few additional floating point operations but they are hardly going to be noticeable unless they are consumed in highly compute intensive circumstances in which case the single calculation check in Quote #20 will not help, one could consider more thorough profiling using tools geared for such analysis or at the very least "brute force" the analysis by "instrumenting" a bunch of code to perform various "runs" of differing problem sizes where each run is combined with a number of trials and some form of a statistical "median" in the measured computational times is captured for a comparative study.  See an example below which shows all the trivial but time-consuming wiring one has to do even for a simple-minded CPU timing study:

program p

   use, intrinsic :: iso_c_binding, only : wp => c_float
   use, intrinsic :: iso_fortran_env, only : dp => real64

   implicit none

   interface
      function cbrtf( x ) result(r) bind(C, name="cbrtf")
         use, intrinsic :: iso_c_binding, only : c_float
         implicit none
         real(c_float), intent(in), value :: x
         ! Function result
         real(c_float) :: r
      end function cbrtf
      function cbrtd( x ) result(r) bind(C, name="cbrt")
         use, intrinsic :: iso_c_binding, only : c_double
         implicit none
         real(c_double), intent(in), value :: x
         ! Function result
         real(c_double) :: r
      end function cbrtd
   end interface
   interface cbrt
      procedure cbrtf
      procedure cbrtd
   end interface

   integer i, j, k
   integer, parameter :: NVALS(*) = [( 100000*i, i=1,20 )]
   integer, parameter :: NTRIALS = 20
   integer :: NumSmallX
   real(wp), allocatable :: x(:), y1(:), y2(:)
   real(dp) :: tim1(NTRIALS), tim2(NTRIALS)
   logical :: ltim(NTRIALS)
   real(dp) :: AveTim1(size(NVALS)), AveTim2(size(NVALS))
   real(dp) :: START_TIME, STOP_TIME

   print *, "Cube Root Calculations: Computational Time"
   print "(g0,t10,g0,t26,g0,t51,g0)", "Run", "Num Calcs", "Exp with Newton Corr.", "CBRT"
   print "(t26,g0,t51,g0)", "CPU Time (seconds)", "CPU Time (seconds)"
   print "(t26,g0,t51,g0)", "per 1E6 Calcs", "per 1E6 Calcs"

   Loop_Nvals: do k = 1, size(NVALS)

      allocate( x(NVALS(k)), y1(NVALS(k)), y2(NVALS(k)) )

      Loop_Ntrials: do j = 1, NTRIALS

         call loadvals( x, expnt=3 )

         ! Cube root with '**' operator plus Newton correction
         blk_root3: block
            real(wp), parameter :: ONE = 1.0_wp
            real(wp), parameter :: TWO = 2.0_wp
            real(wp), parameter :: THREE = 3.0_wp
            real(wp), parameter :: THIRD = ONE/THREE
            real(wp), parameter :: TWOTHIRD = TWO/THREE
            real(wp), parameter :: EPS_X = epsilon(ONE)**THIRD
            call cpu_time(START_TIME)
            do i = 1, size(x)
               y1(i) = sign(abs(x(i))**THIRD, x(i)) !<-- Allow cube root of negative values
               ! Apply correction using a single step of Newton's method
               if ( abs(x(i)) > EPS_X ) then                  !
                  y1(i) = TWOTHIRD*y1(i) + x(i)/(THREE*y1(i)*y1(i))
               end if
            end do
            call cpu_time(STOP_TIME)
            tim1(j) = STOP_TIME - START_TIME
         end block blk_root3

         ! Cube root using C math library function CBRT
         call cpu_time(START_TIME)
         do i = 1, size(x)
            y2(i) = cbrt(x(i))
         end do
         call cpu_time(STOP_TIME)
         tim2(j) = STOP_TIME - START_TIME

      end do Loop_Ntrials

      ltim = .true.
      ltim(minloc(tim1)) = .false. ; ltim(maxloc(tim1)) = .false. !<-- Throw out best and worst times
      AveTim1(k) = sum(tim1, mask=ltim)/real(NTRIALS-2, kind=kind(AveTim1))*1E6_dp/ &
         real(NVALS(k), kind=kind(AveTim1))
      ltim = .true.
      ltim(minloc(tim2)) = .false. ; ltim(maxloc(tim2)) = .false. !<-- Throw out best and worst times
      AveTim2(k) = sum(tim2, mask=ltim)/real(NTRIALS-2, kind=kind(AveTim2))*1E6_dp/ &
         real(NVALS(k), kind=kind(AveTim2))
      print "(g0,t10,g0,t25,1pf0.3,t50,1pf0.3)", k, NVALS(k), AveTim1(k), AveTim2(k)

      ! Print some results
      if ( k == size(NVALS) ) then
         print *, "Some Results from last run"
         print "(g0,t22,g0,t37,g0,t57,g0)", "i", "x", "y (Newton Corr.)", "y (CBRT)"
         NumSmallX = 0
         do i = 1, size(x)
            if ( mod(i,size(x)/10) == 0 ) then
               print "(i0,t15,f15.7,t30,f15.7,t50,f15.7)", i, x(i), y1(i), y2(i)
            else if ( x(i) < 0.1_wp ) then
               if ( NumSmallX < 10 ) then
                  print "(i0,t15,f15.7,t30,f15.7,t50,f15.7)", i, x(i), y1(i), y2(i)
                  NumSmallX = NumSmallX + 1
               end if
            end if
         end do
      end if

      deallocate( x, y1, y2 )

   end do Loop_Nvals

   stop

contains

   subroutine loadvals( x, expnt )
   ! Load values into data array with an arbitrary approach
   ! Pseudo random number generation plus arbitrary scaling

      real(wp), intent(inout) :: x(:)
      integer, intent(in)     :: expnt

      call random_number( x )
      where ( x > 0.5_wp )
         x = real( int(x*100.0_wp)**expnt, kind=kind(x))
      else where ( x > 0.1_wp )
         x = -real( int(x*100.0_wp)**expnt, kind=kind(x))
      end where

      return

   end subroutine

end program

Now when this code is run with the 2 versions of Intel Fortran compilers mentioned by OP, the output is:

C:\Temp>ifort /fp:source /O2 /heap-arrays0 p.f90
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R
) 64, Version 18.0.3.210 Build 20180410
Copyright (C) 1985-2018 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.14.26433.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\Temp>p.exe
 Cube Root Calculations: Computational Time
Run      Num Calcs       Exp with Newton Corr.    CBRT
                         CPU Time (seconds)       CPU Time (seconds)
                         per 1E6 Calcs            per 1E6 Calcs
1        100000         .087                     .000
2        200000         .087                     .000
3        300000         .058                     .029
4        400000         .065                     .022
5        500000         .087                     .035
6        600000         .043                     .058
7        700000         .111                     .000
8        800000         .076                     .054
9        900000         .096                     .000
10       1000000        .061                     .061
11       1100000        .079                     .063
12       1200000        .079                     .058
13       1300000        .060                     .080
14       1400000        .068                     .062
15       1500000        .075                     .064
16       1600000        .087                     .060
17       1700000        .087                     .061
18       1800000        .072                     .043
19       1900000        .078                     .068
20       2000000        .078                     .065
 Some Results from last run
i                    x              y (Newton Corr.)    y (CBRT)
1               -1000.0000000    -10.0000000         -10.0000000
2               -8000.0000000    -20.0000000         -20.0000000
6                   0.0280244      0.3037472           0.3037472
10             -59319.0000000    -39.0000000         -39.0000000
11             -79507.0000000    -43.0000000         -43.0000000
15                  0.0987109      0.4621557           0.4621557
18                  0.0132428      0.2365883           0.2365883
21             -39304.0000000    -34.0000000         -34.0000000
23              -9261.0000000    -21.0000000         -21.0000000
24              -4913.0000000    -17.0000000         -17.0000000
200000         571787.0000000     83.0000000          83.0000000
400000          -1000.0000000    -10.0000000         -10.0000000
600000         216000.0000000     60.0000000          60.0000000
800000         -42875.0000000    -35.0000000         -35.0000000
1000000        -17576.0000000    -26.0000000         -26.0000000
1200000        804357.0000000     93.0000000          93.0000000
1400000        -10648.0000000    -22.0000000         -22.0000000
1600000        571787.0000000     83.0000000          83.0000000
1800000        -54872.0000000    -38.0000000         -38.0000000
2000000        -85184.0000000    -44.0000000         -44.0000000

C:\Temp>
C:\Temp>ifort /fp:source /O2 /heap-arrays0 p.f90
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R
) 64, Version 19.0.1.144 Build 20181018
Copyright (C) 1985-2018 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.14.26433.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\Temp>p.exe
 Cube Root Calculations: Computational Time
Run      Num Calcs       Exp with Newton Corr.    CBRT
                         CPU Time (seconds)       CPU Time (seconds)
                         per 1E6 Calcs            per 1E6 Calcs
1        100000         .347                     .000
2        200000         .217                     .043
3        300000         .173                     .058
4        400000         .152                     .065
5        500000         .173                     .087
6        600000         .173                     .000
7        700000         .136                     .099
8        800000         .173                     .065
9        900000         .173                     .077
10       1000000        .156                     .061
11       1100000        .150                     .055
12       1200000        .173                     .079
13       1300000        .153                     .093
14       1400000        .173                     .093
15       1500000        .156                     .087
16       1600000        .184                     .098
17       1700000        .189                     .092
18       1800000        .193                     .072
19       1900000        .178                     .073
20       2000000        .173                     .078
 Some Results from last run
i                    x              y (Newton Corr.)    y (CBRT)
1               -1000.0000000    -10.0000000         -10.0000000
2               -8000.0000000    -20.0000000         -20.0000000
6                   0.0280244      0.3037472           0.3037472
10             -59319.0000000    -39.0000000         -39.0000000
11             -79507.0000000    -43.0000000         -43.0000000
15                  0.0987109      0.4621557           0.4621557
18                  0.0132428      0.2365883           0.2365883
21             -39304.0000000    -34.0000000         -34.0000000
23              -9261.0000000    -21.0000000         -21.0000000
24              -4913.0000000    -17.0000000         -17.0000000
200000         571787.0000000     83.0000000          83.0000000
400000          -1000.0000000    -10.0000000         -10.0000000
600000         216000.0000000     60.0000000          60.0000000
800000         -42875.0000000    -35.0000000         -35.0000000
1000000        -17576.0000000    -26.0000000         -26.0000000
1200000        804357.0000000     93.0000000          93.0000000
1400000        -10648.0000000    -22.0000000         -22.0000000
1600000        571787.0000000     83.0000000          83.0000000
1800000        -54872.0000000    -38.0000000         -38.0000000
2000000        -85184.0000000    -44.0000000         -44.0000000

C:\Temp>

In addition to the obvious point as stated upthread a library function for cube root should be a no-brainer for such a specific computational need compared to the more generic exponentiation operation and so any additional "correction" operations will incur further cost, however its impact is relative to one's needs.

But now if Ronald Green or other Intel staff are looking at this, they can try to explain the considerable differences in CPU times with "**" at larger problem sizes, as shown above, with Intel Fortran compiler 19.0 update 1 relative to 18.0 update 3 when /fp:source compiler option is in effect.  Both the cases should be using the same power function call with single precision (C_FLOAT), so why is there such a difference?

0 Kudos
gib
New Contributor II
762 Views

Hi FortranFan,

Am I wrong to be surprised by the occurrence of CBRT times of .000?  What do you see when you write more decimal places?

Cheers, Gib

0 Kudos
FortranFan
Honored Contributor II
762 Views

gib wrote:

Hi FortranFan,

Am I wrong to be surprised by the occurrence of CBRT times of .000?  What do you see when you write more decimal places?

Cheers, Gib

Hi Gib,

The zero values are due to limitations with CPU_TIME intrinsic, a better way always is to work with tick rates for CPU timing studies using the SYSTEM_CLOCK intrinsic.  See a modified example below, there is also one other change - I'd meant to print more values for when abs(x) is small, say <0.1:

program p

   use, intrinsic :: iso_c_binding, only : wp => c_float
   use, intrinsic :: iso_fortran_env, only : dp => real64, I8 => int64

   implicit none

   interface
      function cbrtf( x ) result(r) bind(C, name="cbrtf")
         use, intrinsic :: iso_c_binding, only : c_float
         implicit none
         real(c_float), intent(in), value :: x
         ! Function result
         real(c_float) :: r
      end function cbrtf
      function cbrtd( x ) result(r) bind(C, name="cbrt")
         use, intrinsic :: iso_c_binding, only : c_double
         implicit none
         real(c_double), intent(in), value :: x
         ! Function result
         real(c_double) :: r
      end function cbrtd
   end interface
   interface cbrt
      procedure cbrtf
      procedure cbrtd
   end interface

   integer i, j, k
   integer, parameter :: NVALS(*) = [( 100000*i, i=1,20 )]
   integer, parameter :: NTRIALS = 20
   integer :: NumSmallX
   real(wp), allocatable :: x(:), y1(:), y2(:)
   real(dp) :: tim1(NTRIALS), tim2(NTRIALS)
   logical :: ltim(NTRIALS)
   real(dp) :: AveTim1(size(NVALS)), AveTim2(size(NVALS))
   real(dp) :: START_TIME, STOP_TIME

   print *, "Cube Root Calculations: Computational Time"
   print "(g0,t10,g0,t26,g0,t51,g0)", "Run", "Num Calcs", "Exp with Newton Corr.", "CBRT"
   print "(t26,g0,t51,g0)", "CPU Time (seconds)", "CPU Time (seconds)"
   print "(t26,g0,t51,g0)", "per 1E6 Calcs", "per 1E6 Calcs"

   Loop_Nvals: do k = 1, size(NVALS)

      allocate( x(NVALS(k)), y1(NVALS(k)), y2(NVALS(k)) )

      Loop_Ntrials: do j = 1, NTRIALS

         call loadvals( x, expnt=3 )

         ! Cube root with '**' operator plus Newton correction
         blk_root3: block
            real(wp), parameter :: ONE = 1.0_wp
            real(wp), parameter :: TWO = 2.0_wp
            real(wp), parameter :: THREE = 3.0_wp
            real(wp), parameter :: THIRD = ONE/THREE
            real(wp), parameter :: TWOTHIRD = TWO/THREE
            real(wp), parameter :: EPS_X = epsilon(ONE)**THIRD
            call cpu_t(START_TIME)
            do i = 1, size(x)
               y1(i) = sign(abs(x(i))**THIRD, x(i)) !<-- Allow cube root of negative values
               ! Apply correction using a single step of Newton's method
               if ( abs(x(i)) > EPS_X ) then                  !
                  y1(i) = TWOTHIRD*y1(i) + x(i)/(THREE*y1(i)*y1(i))
               end if
            end do
            call cpu_t(STOP_TIME)
            tim1(j) = STOP_TIME - START_TIME
         end block blk_root3

         ! Cube root using C math library function CBRT
         call cpu_t(START_TIME)
         do i = 1, size(x)
            y2(i) = cbrt(x(i))
         end do
         call cpu_t(STOP_TIME)
         tim2(j) = STOP_TIME - START_TIME

      end do Loop_Ntrials

      ltim = .true.
      ltim(minloc(tim1)) = .false. ; ltim(maxloc(tim1)) = .false. !<-- Throw out best and worst times
      AveTim1(k) = sum(tim1, mask=ltim)/real(NTRIALS-2, kind=kind(AveTim1))*1E6_dp/ &
         real(NVALS(k), kind=kind(AveTim1))
      ltim = .true.
      ltim(minloc(tim2)) = .false. ; ltim(maxloc(tim2)) = .false. !<-- Throw out best and worst times
      AveTim2(k) = sum(tim2, mask=ltim)/real(NTRIALS-2, kind=kind(AveTim2))*1E6_dp/ &
         real(NVALS(k), kind=kind(AveTim2))
      print "(g0,t10,g0,t25,1pf0.3,t50,1pf0.3)", k, NVALS(k), AveTim1(k), AveTim2(k)

      ! Print some results
      if ( k == size(NVALS) ) then
         print *, "Some Results from last run"
         print "(g0,t22,g0,t37,g0,t57,g0)", "i", "x", "y (Newton Corr.)", "y (CBRT)"
         NumSmallX = 0
         do i = 1, size(x)
            if ( mod(i,size(x)/10) == 0 ) then
               print "(i0,t15,f15.7,t30,f15.7,t50,f15.7)", i, x(i), y1(i), y2(i)
            else if ( abs(x(i)) < 0.1_wp ) then
               if ( NumSmallX < 10 ) then
                  print "(i0,t15,f15.7,t30,f15.7,t50,f15.7)", i, x(i), y1(i), y2(i)
                  NumSmallX = NumSmallX + 1
               end if
            end if
         end do
      end if

      deallocate( x, y1, y2 )

   end do Loop_Nvals

   stop

contains

   subroutine loadvals( x, expnt )
   ! Load values into data array with an arbitrary approach
   ! Pseudo random number generation plus arbitrary scaling

      real(wp), intent(inout) :: x(:)
      integer, intent(in)     :: expnt

      call random_number( x )
      where ( x > 0.5_wp )
         x = real( int(x*100.0_wp)**expnt, kind=kind(x))
      else where ( x > 0.1_wp )
         x = -real( int(x*100.0_wp)**expnt, kind=kind(x))
      end where

      return

   end subroutine

   subroutine cpu_t( time )
      !.. Argument list
      real(dp), intent(inout) :: time
      !.. Local variables
      integer(I8) :: tick
      integer(I8) :: rate
      call system_clock (tick, rate)
      time = real(tick, kind=kind(time) ) / real(rate, kind=kind(time) )
      return
   end subroutine cpu_t

end program

Here's an output of an execution with 18.0.3:

C:\Temp>p.exe
 Cube Root Calculations: Computational Time
Run      Num Calcs       Exp with Newton Corr.    CBRT
                         CPU Time (seconds)       CPU Time (seconds)
                         per 1E6 Calcs            per 1E6 Calcs
1        100000         .061                     .072
2        200000         .081                     .072
3        300000         .083                     .061
4        400000         .081                     .061
5        500000         .078                     .064
6        600000         .077                     .065
7        700000         .081                     .065
8        800000         .079                     .064
9        900000         .078                     .065
10       1000000        .079                     .062
11       1100000        .079                     .063
12       1200000        .078                     .064
13       1300000        .079                     .065
14       1400000        .080                     .063
15       1500000        .077                     .066
16       1600000        .079                     .064
17       1700000        .076                     .064
18       1800000        .078                     .065
19       1900000        .078                     .064
20       2000000        .077                     .064
 Some Results from last run
i                    x              y (Newton Corr.)    y (CBRT)
6                   0.0280244      0.3037472           0.3037472
15                  0.0987109      0.4621557           0.4621557
18                  0.0132428      0.2365883           0.2365883
26                  0.0402600      0.3427345           0.3427345
40                  0.0630255      0.3979595           0.3979595
64                  0.0813574      0.4333103           0.4333102
68                  0.0097841      0.2138815           0.2138815
69                  0.0658390      0.4037952           0.4037952
72                  0.0442022      0.3535748           0.3535747
75                  0.0902116      0.4484914           0.4484914
200000         571787.0000000     83.0000000          83.0000000
400000          -1000.0000000    -10.0000000         -10.0000000
600000         216000.0000000     60.0000000          60.0000000
800000         -42875.0000000    -35.0000000         -35.0000000
1000000        -17576.0000000    -26.0000000         -26.0000000
1200000        804357.0000000     93.0000000          93.0000000
1400000        -10648.0000000    -22.0000000         -22.0000000
1600000        571787.0000000     83.0000000          83.0000000
1800000        -54872.0000000    -38.0000000         -38.0000000
2000000        -85184.0000000    -44.0000000         -44.0000000

C:\Temp>

 

0 Kudos
Michael_D_11
Beginner
762 Views

andrew_4619 wrote:

Quote:

gib wrote:

I'm wondering why the OP is not using 8-byte reals.

 

You are not alone! Whilst there is some interesting discussion in this thread the fact is the compiler is giving good SP results IMO. If this precision is not enough then use DP, simples. 

Using DP doesn't resolve the problem, it just pushes to less significant digits. I am not expecting results to 7 or even 6 significant digits. What I would like is that the internal values (which is why I reported hex values for numbers) don't change between compilers and are accurate. These values are propagated within the code at maximum precision and a 1 to 5 bit change can become noticeable after many many sequential calculations. I shouldn't have to use DP to mitigate this type of behavior, it is an inelegant coding technique reminiscent of two to three decades ago and increases the memory footprint and processor time significantly when used universally.

 

summary - compiler changes implemented less accurate resolution of floating point math which can propagate through long sequential calculations that can be observed in changes in the final resulted presented in decimal form.

0 Kudos
Steve_Lionel
Honored Contributor III
762 Views

Michael D. wrote:

What I would like is that the internal values (which is why I reported hex values for numbers) don't change between compilers and are accurate.

In the famous words of Dread Pirate Roberts, "Learn to live with disappointment.". There are many things that can change computed values, including vectorization and even stack alignment. Also order of operations, which is not completely in your control, can yield small differences. See sc13.supercomputing.org/sites/default/files/WorkshopsArchive/pdfs/wp129s1.pdf for some exploration of this.

0 Kudos
andrew_4619
Honored Contributor II
762 Views

Michael D. wrote:
Using DP doesn't resolve the problem, it just pushes to less significant digits. I am not expecting results to 7 or even 6 significant digits..........

Using DP clearly DOES resolve the problem because it would appear to give results consistent to the level of accuracy you state you require.

0 Kudos
FortranFan
Honored Contributor II
762 Views

Steve Lionel (Ret.) wrote:

.. In the famous words of Dread Pirate Roberts, "Learn to live with disappointment.". There are many things that can change computed values, including vectorization and even stack alignment. Also order of operations, which is not completely in your control, can yield small differences. See sc13.supercomputing.org/sites/default/files/WorkshopsArchive/pdfs/wp129s1.pdf for some exploration of this.

@Michael D.,

You may be familiar with this article from Intel staff: https://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/

You are making a case for yourself as stated in the article, "it is .. useful to have a degree of reproducibility that goes beyond the inherent accuracy of a computation. Some software .. require .. even bit-for-bit, agreement between results before and after software changes, even though the mathematical uncertainty in the result of the computation may be considerably larger. The right compiler options can deliver consistent, closely reproducible results ..."

Are you following all the recommendations in this article?  If yes, are you still noticing differences?  If so, see Quote #7 where Steve Lionel advises you to submit a support request with Intel (https://supporttickets.intel.com/?lang=en-US).  Have you submitted such a request?

 

0 Kudos
John_Campbell
New Contributor II
762 Views

Just out of interest, I tried two other compilers; Silverfrost FTN95 (Win32 and x64) and gFortran x64. To report the value, I used:

write (*,fmt='(1x,a,f0.7)') 'the result is ',ans_b

This was because "write (*,*) ans_b" only reported 4 sig figures (11.4471) for FTN95, but 7 sig figures for gFortran (11.4471436)

In all 3 cases with f0.7, I got " the result is 11.4471436"

This is a different approach to rounding up for 32-bit real !

I suspect the only difference between Ver 18 and Ver 19 is in rounding in "write (*,*)"

Given the difference reported in the OP post is due only to different rounding in write (*,*), how does this affect how the program runs ?

To attribute a change only to the 23rd bit, as causing significant problems for the calculation would imply that 32-bit reals are a poor choice.

0 Kudos
FortranFan
Honored Contributor II
762 Views

John Campbell wrote:

.. how does this affect how the program runs ?

To attribute a change only to the 23rd bit, as causing significant problems for the calculation would imply that 32-bit reals are a poor choice.

That's not the issue, the problem is one of consistency and reproducibility and what Intel Fortran compiler team intends to offer its users vis-a-vis the programmer expectations and needs.

OP has stated certain needs in this thread that some Intel staff has separately acknowledged, as in:

Programmers of floating-point applications typically have the following objectives:
..
• Reproducibility
o Produce consistent results:
..
- From one compiler to another
..
Some software quality assurance tests may require close, or even bit-for-bit, agreement between
results before and after software changes, even though the mathematical uncertainty in the result
of the computation may be considerably larger. The right compiler options can deliver consistent,
closely reproducible results while preserving good (though not optimal) performance.
..
Bottom Line

..
In the version 17 compiler or later, best reproducibility may be obtained with the single switch
 /fp:consistent (Windows) .. which sets all of the above options.

Readers will see this with the instruction mentioned in the original post:

C:\Temp>type p.f90
   character(len=*), parameter :: fmtg = "(*(g0))"
   character(len=*), parameter :: fmth = "(g0,z0)"
   character(len=*), parameter :: fmtb = "(g0,b0)"
   blk1: block
      integer, parameter :: WP = selected_real_kind( p=6 )
      real(WP), parameter :: THIRD = 1.0_wp/3.0_wp
      real(WP) :: x, y
      print fmtg, "blk1: x, y with REAL kind=", kind(x), " and precision=", prec
ision(x)
      x = 1500.0_wp
      y = x**THIRD
      print fmtg, "x = ", x
      print fmtg, "y = x**THIRD = ", y
      print fmth, "y (HEX) = ", y
      print fmtb, "y (BINARY) = ", y
   end block blk1
   print *
   blk2: block
      integer, parameter :: WP = selected_real_kind( p=12 )
      real(WP), parameter :: THIRD = 1.0_wp/3.0_wp
      real(WP) :: x, y
      print fmtg, "blk2: x, y with REAL kind=", kind(x), " and precision=", prec
ision(x)
      x = 1500.0_wp
      y = x**THIRD
      print fmtg, "x = ", x
      print fmtg, "y = x**THIRD = ", y
      print fmth, "y (HEX) = ", y
      print fmtb, "y (BINARY) = ", y
   end block blk2
end

C:\Temp>ifort /fp:consistent p.f90
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R
) 64, Version 18.0.3.210 Build 20180410
Copyright (C) 1985-2018 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.14.26433.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\Temp>p.exe
blk1: x, y with REAL kind=4 and precision=6
x = 1500.000
y = x**THIRD = 11.44714
y (HEX) = 4137277F
y (BINARY) = 1000001001101110010011101111111

blk2: x, y with REAL kind=8 and precision=15
x = 1500.000000000000
y = x**THIRD = 11.44714242553332
y (HEX) = 4026E4EFDA1CA3B2
y (BINARY) = 100000000100110111001001110111111011010000111001010001110110010

C:\Temp>

whereas with Intel Fortran 19.0.1:

C:\Temp>ifort /fp:consistent p.f90
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R
) 64, Version 19.0.0.117 Build 20180804
Copyright (C) 1985-2018 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.14.26433.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\Temp>p.exe
blk1: x, y with REAL kind=4 and precision=6
x = 1500.000
y = x**THIRD = 11.44714
y (HEX) = 41372780
y (BINARY) = 1000001001101110010011110000000

blk2: x, y with REAL kind=8 and precision=15
x = 1500.000000000000
y = x**THIRD = 11.44714242553332
y (HEX) = 4026E4EFDA1CA3B1
y (BINARY) = 100000000100110111001001110111111011010000111001010001110110001

C:\Temp>

OP finds the above shown inconsistency in block blk1 (y=z'41372780' vs y=z'4137277F') between the two Intel Fortran versions unacceptable: the matter of interest to me is what support Intel Fortran team will provide to customers in such a situation!

0 Kudos
jimdempseyatthecove
Honored Contributor III
762 Views

FF, Great presentation. The only thing I can add is:

The one bit difference, accumulated in same magnitude numbers (one example is to add to self), after 1 million iterations will differ by 20 bits from the lsb. IOW the results of a simulation could dieviate significantly. This in no way indicates as which is more correct, rather the results differ. Different results can mean either minor change (e.g. single bit difference) in operation .OR. something wrong with compilation (bug). When results of an operation differ, then it becomes problematic to determine if the difference is due to a bug or not. When the application consists of 100's or 1000's of source files, this places a tremendous burden on the certification of the new compiler.

Jim Dempsey

0 Kudos
Reply