I just upgrade to to compiler version 19.0.0.117 from 18.0.3.210. In reviewing some of my codes I noticed small discrepancies in the results. Tracing these discrepancies led to the evaluation of the cube root function. In version 18, cube root (exponentiation to 1/3) is handled by cbrtf. In version 19 this changes to powf. This appears to be a reversal of what happened between versions 11 and 12 (see: https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/279705). This simple code snippet can be used to observe the change:
real b real ans_b b= 1500. ans_b= b**(1./3.)
In version 18 the result is 11.4471426
In version 19 the result is 11.4471435
Any reason for this change? Especially since the version 18 implementation seems to give the more accurate answer.
Nichols, John wrote:.. Brilliant demonstration
@Nichols, John:
Forgot to include some emoticons to go with your comment?! Because otherwise, it would indicate you really meant it which would be strange because @mecej4 in the suggestion to use Newton correction seems to have intently meant to convey the exact opposite of your comment as (perhaps) "homage" to some of the absurdities in this thread, considering the Newton suggestion.requires 9 additional operations (7 multiplication, 1 division, 1 addition) operations, 3 additional variables, 2 magic numbers, and a sequence of steps, all of which would require encapsulation in a procedure if it is to be reused for the seventh root case but which requires further algebra if it's to be extended generally, and this just so that the coder is advised by @mecej4 not to use an available, robust library function implementation!! and not to overlook the accompanying example was different from the current use case which seems to have settled to the one in Quote #12 (again perhaps as misdirection).
Starting with the original post where no clearly reproducible case was presented (see Quote #9 that was unanswered) and where the numeric analysis offered by Ronald Green at Intel and other effort to describe computations taking into account the accuracy requirements vis-a-vis the floating-point representations and so forth have been ignored, it's a real messy thread.
I'm wondering why the OP is not using 8-byte reals.
gib wrote:
I'm wondering why the OP is not using 8-byte reals.
You are not alone! Whilst there is some interesting discussion in this thread the fact is the compiler is giving good SP results IMO. If this precision is not enough then use DP, simples.
Nichols, John wrote:
..Program took: 0.000000E+00 seconds. Press any key to continue . . .
On modern computers, such single calculation checks can hardly reveal anything useful. Now if one is looking at different solvers for particular calculations/simulations, a systematic approach toward computational analyses as outlined in almost any good book on numerical methods and/or algorithms will be necessary. But when one is considering as basic as options for a cube root calculation, there are few performant alternatives to CBRT the C library function, if any at all. However any modifications/tweaks with the POWF (as with Fortran's '**' operator) are going to result in a few additional floating point operations but they are hardly going to be noticeable unless they are consumed in highly compute intensive circumstances in which case the single calculation check in Quote #20 will not help, one could consider more thorough profiling using tools geared for such analysis or at the very least "brute force" the analysis by "instrumenting" a bunch of code to perform various "runs" of differing problem sizes where each run is combined with a number of trials and some form of a statistical "median" in the measured computational times is captured for a comparative study. See an example below which shows all the trivial but time-consuming wiring one has to do even for a simple-minded CPU timing study:
program p use, intrinsic :: iso_c_binding, only : wp => c_float use, intrinsic :: iso_fortran_env, only : dp => real64 implicit none interface function cbrtf( x ) result(r) bind(C, name="cbrtf") use, intrinsic :: iso_c_binding, only : c_float implicit none real(c_float), intent(in), value :: x ! Function result real(c_float) :: r end function cbrtf function cbrtd( x ) result(r) bind(C, name="cbrt") use, intrinsic :: iso_c_binding, only : c_double implicit none real(c_double), intent(in), value :: x ! Function result real(c_double) :: r end function cbrtd end interface interface cbrt procedure cbrtf procedure cbrtd end interface integer i, j, k integer, parameter :: NVALS(*) = [( 100000*i, i=1,20 )] integer, parameter :: NTRIALS = 20 integer :: NumSmallX real(wp), allocatable :: x(:), y1(:), y2(:) real(dp) :: tim1(NTRIALS), tim2(NTRIALS) logical :: ltim(NTRIALS) real(dp) :: AveTim1(size(NVALS)), AveTim2(size(NVALS)) real(dp) :: START_TIME, STOP_TIME print *, "Cube Root Calculations: Computational Time" print "(g0,t10,g0,t26,g0,t51,g0)", "Run", "Num Calcs", "Exp with Newton Corr.", "CBRT" print "(t26,g0,t51,g0)", "CPU Time (seconds)", "CPU Time (seconds)" print "(t26,g0,t51,g0)", "per 1E6 Calcs", "per 1E6 Calcs" Loop_Nvals: do k = 1, size(NVALS) allocate( x(NVALS(k)), y1(NVALS(k)), y2(NVALS(k)) ) Loop_Ntrials: do j = 1, NTRIALS call loadvals( x, expnt=3 ) ! Cube root with '**' operator plus Newton correction blk_root3: block real(wp), parameter :: ONE = 1.0_wp real(wp), parameter :: TWO = 2.0_wp real(wp), parameter :: THREE = 3.0_wp real(wp), parameter :: THIRD = ONE/THREE real(wp), parameter :: TWOTHIRD = TWO/THREE real(wp), parameter :: EPS_X = epsilon(ONE)**THIRD call cpu_time(START_TIME) do i = 1, size(x) y1(i) = sign(abs(x(i))**THIRD, x(i)) !<-- Allow cube root of negative values ! Apply correction using a single step of Newton's method if ( abs(x(i)) > EPS_X ) then ! y1(i) = TWOTHIRD*y1(i) + x(i)/(THREE*y1(i)*y1(i)) end if end do call cpu_time(STOP_TIME) tim1(j) = STOP_TIME - START_TIME end block blk_root3 ! Cube root using C math library function CBRT call cpu_time(START_TIME) do i = 1, size(x) y2(i) = cbrt(x(i)) end do call cpu_time(STOP_TIME) tim2(j) = STOP_TIME - START_TIME end do Loop_Ntrials ltim = .true. ltim(minloc(tim1)) = .false. ; ltim(maxloc(tim1)) = .false. !<-- Throw out best and worst times AveTim1(k) = sum(tim1, mask=ltim)/real(NTRIALS-2, kind=kind(AveTim1))*1E6_dp/ & real(NVALS(k), kind=kind(AveTim1)) ltim = .true. ltim(minloc(tim2)) = .false. ; ltim(maxloc(tim2)) = .false. !<-- Throw out best and worst times AveTim2(k) = sum(tim2, mask=ltim)/real(NTRIALS-2, kind=kind(AveTim2))*1E6_dp/ & real(NVALS(k), kind=kind(AveTim2)) print "(g0,t10,g0,t25,1pf0.3,t50,1pf0.3)", k, NVALS(k), AveTim1(k), AveTim2(k) ! Print some results if ( k == size(NVALS) ) then print *, "Some Results from last run" print "(g0,t22,g0,t37,g0,t57,g0)", "i", "x", "y (Newton Corr.)", "y (CBRT)" NumSmallX = 0 do i = 1, size(x) if ( mod(i,size(x)/10) == 0 ) then print "(i0,t15,f15.7,t30,f15.7,t50,f15.7)", i, x(i), y1(i), y2(i) else if ( x(i) < 0.1_wp ) then if ( NumSmallX < 10 ) then print "(i0,t15,f15.7,t30,f15.7,t50,f15.7)", i, x(i), y1(i), y2(i) NumSmallX = NumSmallX + 1 end if end if end do end if deallocate( x, y1, y2 ) end do Loop_Nvals stop contains subroutine loadvals( x, expnt ) ! Load values into data array with an arbitrary approach ! Pseudo random number generation plus arbitrary scaling real(wp), intent(inout) :: x(:) integer, intent(in) :: expnt call random_number( x ) where ( x > 0.5_wp ) x = real( int(x*100.0_wp)**expnt, kind=kind(x)) else where ( x > 0.1_wp ) x = -real( int(x*100.0_wp)**expnt, kind=kind(x)) end where return end subroutine end program
Now when this code is run with the 2 versions of Intel Fortran compilers mentioned by OP, the output is:
C:\Temp>ifort /fp:source /O2 /heap-arrays0 p.f90 Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R ) 64, Version 18.0.3.210 Build 20180410 Copyright (C) 1985-2018 Intel Corporation. All rights reserved. Microsoft (R) Incremental Linker Version 14.14.26433.0 Copyright (C) Microsoft Corporation. All rights reserved. -out:p.exe -subsystem:console p.obj C:\Temp>p.exe Cube Root Calculations: Computational Time Run Num Calcs Exp with Newton Corr. CBRT CPU Time (seconds) CPU Time (seconds) per 1E6 Calcs per 1E6 Calcs 1 100000 .087 .000 2 200000 .087 .000 3 300000 .058 .029 4 400000 .065 .022 5 500000 .087 .035 6 600000 .043 .058 7 700000 .111 .000 8 800000 .076 .054 9 900000 .096 .000 10 1000000 .061 .061 11 1100000 .079 .063 12 1200000 .079 .058 13 1300000 .060 .080 14 1400000 .068 .062 15 1500000 .075 .064 16 1600000 .087 .060 17 1700000 .087 .061 18 1800000 .072 .043 19 1900000 .078 .068 20 2000000 .078 .065 Some Results from last run i x y (Newton Corr.) y (CBRT) 1 -1000.0000000 -10.0000000 -10.0000000 2 -8000.0000000 -20.0000000 -20.0000000 6 0.0280244 0.3037472 0.3037472 10 -59319.0000000 -39.0000000 -39.0000000 11 -79507.0000000 -43.0000000 -43.0000000 15 0.0987109 0.4621557 0.4621557 18 0.0132428 0.2365883 0.2365883 21 -39304.0000000 -34.0000000 -34.0000000 23 -9261.0000000 -21.0000000 -21.0000000 24 -4913.0000000 -17.0000000 -17.0000000 200000 571787.0000000 83.0000000 83.0000000 400000 -1000.0000000 -10.0000000 -10.0000000 600000 216000.0000000 60.0000000 60.0000000 800000 -42875.0000000 -35.0000000 -35.0000000 1000000 -17576.0000000 -26.0000000 -26.0000000 1200000 804357.0000000 93.0000000 93.0000000 1400000 -10648.0000000 -22.0000000 -22.0000000 1600000 571787.0000000 83.0000000 83.0000000 1800000 -54872.0000000 -38.0000000 -38.0000000 2000000 -85184.0000000 -44.0000000 -44.0000000 C:\Temp>
C:\Temp>ifort /fp:source /O2 /heap-arrays0 p.f90 Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R ) 64, Version 19.0.1.144 Build 20181018 Copyright (C) 1985-2018 Intel Corporation. All rights reserved. Microsoft (R) Incremental Linker Version 14.14.26433.0 Copyright (C) Microsoft Corporation. All rights reserved. -out:p.exe -subsystem:console p.obj C:\Temp>p.exe Cube Root Calculations: Computational Time Run Num Calcs Exp with Newton Corr. CBRT CPU Time (seconds) CPU Time (seconds) per 1E6 Calcs per 1E6 Calcs 1 100000 .347 .000 2 200000 .217 .043 3 300000 .173 .058 4 400000 .152 .065 5 500000 .173 .087 6 600000 .173 .000 7 700000 .136 .099 8 800000 .173 .065 9 900000 .173 .077 10 1000000 .156 .061 11 1100000 .150 .055 12 1200000 .173 .079 13 1300000 .153 .093 14 1400000 .173 .093 15 1500000 .156 .087 16 1600000 .184 .098 17 1700000 .189 .092 18 1800000 .193 .072 19 1900000 .178 .073 20 2000000 .173 .078 Some Results from last run i x y (Newton Corr.) y (CBRT) 1 -1000.0000000 -10.0000000 -10.0000000 2 -8000.0000000 -20.0000000 -20.0000000 6 0.0280244 0.3037472 0.3037472 10 -59319.0000000 -39.0000000 -39.0000000 11 -79507.0000000 -43.0000000 -43.0000000 15 0.0987109 0.4621557 0.4621557 18 0.0132428 0.2365883 0.2365883 21 -39304.0000000 -34.0000000 -34.0000000 23 -9261.0000000 -21.0000000 -21.0000000 24 -4913.0000000 -17.0000000 -17.0000000 200000 571787.0000000 83.0000000 83.0000000 400000 -1000.0000000 -10.0000000 -10.0000000 600000 216000.0000000 60.0000000 60.0000000 800000 -42875.0000000 -35.0000000 -35.0000000 1000000 -17576.0000000 -26.0000000 -26.0000000 1200000 804357.0000000 93.0000000 93.0000000 1400000 -10648.0000000 -22.0000000 -22.0000000 1600000 571787.0000000 83.0000000 83.0000000 1800000 -54872.0000000 -38.0000000 -38.0000000 2000000 -85184.0000000 -44.0000000 -44.0000000 C:\Temp>
In addition to the obvious point as stated upthread a library function for cube root should be a no-brainer for such a specific computational need compared to the more generic exponentiation operation and so any additional "correction" operations will incur further cost, however its impact is relative to one's needs.
But now if Ronald Green or other Intel staff are looking at this, they can try to explain the considerable differences in CPU times with "**" at larger problem sizes, as shown above, with Intel Fortran compiler 19.0 update 1 relative to 18.0 update 3 when /fp:source compiler option is in effect. Both the cases should be using the same power function call with single precision (C_FLOAT), so why is there such a difference?
Hi FortranFan,
Am I wrong to be surprised by the occurrence of CBRT times of .000? What do you see when you write more decimal places?
Cheers, Gib
gib wrote:Hi FortranFan,
Am I wrong to be surprised by the occurrence of CBRT times of .000? What do you see when you write more decimal places?
Cheers, Gib
Hi Gib,
The zero values are due to limitations with CPU_TIME intrinsic, a better way always is to work with tick rates for CPU timing studies using the SYSTEM_CLOCK intrinsic. See a modified example below, there is also one other change - I'd meant to print more values for when abs(x) is small, say <0.1:
program p use, intrinsic :: iso_c_binding, only : wp => c_float use, intrinsic :: iso_fortran_env, only : dp => real64, I8 => int64 implicit none interface function cbrtf( x ) result(r) bind(C, name="cbrtf") use, intrinsic :: iso_c_binding, only : c_float implicit none real(c_float), intent(in), value :: x ! Function result real(c_float) :: r end function cbrtf function cbrtd( x ) result(r) bind(C, name="cbrt") use, intrinsic :: iso_c_binding, only : c_double implicit none real(c_double), intent(in), value :: x ! Function result real(c_double) :: r end function cbrtd end interface interface cbrt procedure cbrtf procedure cbrtd end interface integer i, j, k integer, parameter :: NVALS(*) = [( 100000*i, i=1,20 )] integer, parameter :: NTRIALS = 20 integer :: NumSmallX real(wp), allocatable :: x(:), y1(:), y2(:) real(dp) :: tim1(NTRIALS), tim2(NTRIALS) logical :: ltim(NTRIALS) real(dp) :: AveTim1(size(NVALS)), AveTim2(size(NVALS)) real(dp) :: START_TIME, STOP_TIME print *, "Cube Root Calculations: Computational Time" print "(g0,t10,g0,t26,g0,t51,g0)", "Run", "Num Calcs", "Exp with Newton Corr.", "CBRT" print "(t26,g0,t51,g0)", "CPU Time (seconds)", "CPU Time (seconds)" print "(t26,g0,t51,g0)", "per 1E6 Calcs", "per 1E6 Calcs" Loop_Nvals: do k = 1, size(NVALS) allocate( x(NVALS(k)), y1(NVALS(k)), y2(NVALS(k)) ) Loop_Ntrials: do j = 1, NTRIALS call loadvals( x, expnt=3 ) ! Cube root with '**' operator plus Newton correction blk_root3: block real(wp), parameter :: ONE = 1.0_wp real(wp), parameter :: TWO = 2.0_wp real(wp), parameter :: THREE = 3.0_wp real(wp), parameter :: THIRD = ONE/THREE real(wp), parameter :: TWOTHIRD = TWO/THREE real(wp), parameter :: EPS_X = epsilon(ONE)**THIRD call cpu_t(START_TIME) do i = 1, size(x) y1(i) = sign(abs(x(i))**THIRD, x(i)) !<-- Allow cube root of negative values ! Apply correction using a single step of Newton's method if ( abs(x(i)) > EPS_X ) then ! y1(i) = TWOTHIRD*y1(i) + x(i)/(THREE*y1(i)*y1(i)) end if end do call cpu_t(STOP_TIME) tim1(j) = STOP_TIME - START_TIME end block blk_root3 ! Cube root using C math library function CBRT call cpu_t(START_TIME) do i = 1, size(x) y2(i) = cbrt(x(i)) end do call cpu_t(STOP_TIME) tim2(j) = STOP_TIME - START_TIME end do Loop_Ntrials ltim = .true. ltim(minloc(tim1)) = .false. ; ltim(maxloc(tim1)) = .false. !<-- Throw out best and worst times AveTim1(k) = sum(tim1, mask=ltim)/real(NTRIALS-2, kind=kind(AveTim1))*1E6_dp/ & real(NVALS(k), kind=kind(AveTim1)) ltim = .true. ltim(minloc(tim2)) = .false. ; ltim(maxloc(tim2)) = .false. !<-- Throw out best and worst times AveTim2(k) = sum(tim2, mask=ltim)/real(NTRIALS-2, kind=kind(AveTim2))*1E6_dp/ & real(NVALS(k), kind=kind(AveTim2)) print "(g0,t10,g0,t25,1pf0.3,t50,1pf0.3)", k, NVALS(k), AveTim1(k), AveTim2(k) ! Print some results if ( k == size(NVALS) ) then print *, "Some Results from last run" print "(g0,t22,g0,t37,g0,t57,g0)", "i", "x", "y (Newton Corr.)", "y (CBRT)" NumSmallX = 0 do i = 1, size(x) if ( mod(i,size(x)/10) == 0 ) then print "(i0,t15,f15.7,t30,f15.7,t50,f15.7)", i, x(i), y1(i), y2(i) else if ( abs(x(i)) < 0.1_wp ) then if ( NumSmallX < 10 ) then print "(i0,t15,f15.7,t30,f15.7,t50,f15.7)", i, x(i), y1(i), y2(i) NumSmallX = NumSmallX + 1 end if end if end do end if deallocate( x, y1, y2 ) end do Loop_Nvals stop contains subroutine loadvals( x, expnt ) ! Load values into data array with an arbitrary approach ! Pseudo random number generation plus arbitrary scaling real(wp), intent(inout) :: x(:) integer, intent(in) :: expnt call random_number( x ) where ( x > 0.5_wp ) x = real( int(x*100.0_wp)**expnt, kind=kind(x)) else where ( x > 0.1_wp ) x = -real( int(x*100.0_wp)**expnt, kind=kind(x)) end where return end subroutine subroutine cpu_t( time ) !.. Argument list real(dp), intent(inout) :: time !.. Local variables integer(I8) :: tick integer(I8) :: rate call system_clock (tick, rate) time = real(tick, kind=kind(time) ) / real(rate, kind=kind(time) ) return end subroutine cpu_t end program
Here's an output of an execution with 18.0.3:
C:\Temp>p.exe Cube Root Calculations: Computational Time Run Num Calcs Exp with Newton Corr. CBRT CPU Time (seconds) CPU Time (seconds) per 1E6 Calcs per 1E6 Calcs 1 100000 .061 .072 2 200000 .081 .072 3 300000 .083 .061 4 400000 .081 .061 5 500000 .078 .064 6 600000 .077 .065 7 700000 .081 .065 8 800000 .079 .064 9 900000 .078 .065 10 1000000 .079 .062 11 1100000 .079 .063 12 1200000 .078 .064 13 1300000 .079 .065 14 1400000 .080 .063 15 1500000 .077 .066 16 1600000 .079 .064 17 1700000 .076 .064 18 1800000 .078 .065 19 1900000 .078 .064 20 2000000 .077 .064 Some Results from last run i x y (Newton Corr.) y (CBRT) 6 0.0280244 0.3037472 0.3037472 15 0.0987109 0.4621557 0.4621557 18 0.0132428 0.2365883 0.2365883 26 0.0402600 0.3427345 0.3427345 40 0.0630255 0.3979595 0.3979595 64 0.0813574 0.4333103 0.4333102 68 0.0097841 0.2138815 0.2138815 69 0.0658390 0.4037952 0.4037952 72 0.0442022 0.3535748 0.3535747 75 0.0902116 0.4484914 0.4484914 200000 571787.0000000 83.0000000 83.0000000 400000 -1000.0000000 -10.0000000 -10.0000000 600000 216000.0000000 60.0000000 60.0000000 800000 -42875.0000000 -35.0000000 -35.0000000 1000000 -17576.0000000 -26.0000000 -26.0000000 1200000 804357.0000000 93.0000000 93.0000000 1400000 -10648.0000000 -22.0000000 -22.0000000 1600000 571787.0000000 83.0000000 83.0000000 1800000 -54872.0000000 -38.0000000 -38.0000000 2000000 -85184.0000000 -44.0000000 -44.0000000 C:\Temp>
andrew_4619 wrote:Quote:
gib wrote:
I'm wondering why the OP is not using 8-byte reals.
You are not alone! Whilst there is some interesting discussion in this thread the fact is the compiler is giving good SP results IMO. If this precision is not enough then use DP, simples.
Using DP doesn't resolve the problem, it just pushes to less significant digits. I am not expecting results to 7 or even 6 significant digits. What I would like is that the internal values (which is why I reported hex values for numbers) don't change between compilers and are accurate. These values are propagated within the code at maximum precision and a 1 to 5 bit change can become noticeable after many many sequential calculations. I shouldn't have to use DP to mitigate this type of behavior, it is an inelegant coding technique reminiscent of two to three decades ago and increases the memory footprint and processor time significantly when used universally.
summary - compiler changes implemented less accurate resolution of floating point math which can propagate through long sequential calculations that can be observed in changes in the final resulted presented in decimal form.
Michael D. wrote:What I would like is that the internal values (which is why I reported hex values for numbers) don't change between compilers and are accurate.
In the famous words of Dread Pirate Roberts, "Learn to live with disappointment.". There are many things that can change computed values, including vectorization and even stack alignment. Also order of operations, which is not completely in your control, can yield small differences. See sc13.supercomputing.org/sites/default/files/WorkshopsArchive/pdfs/wp129s1.pdf for some exploration of this.
Michael D. wrote:
Using DP doesn't resolve the problem, it just pushes to less significant digits. I am not expecting results to 7 or even 6 significant digits..........
Using DP clearly DOES resolve the problem because it would appear to give results consistent to the level of accuracy you state you require.
Steve Lionel (Ret.) wrote:.. In the famous words of Dread Pirate Roberts, "Learn to live with disappointment.". There are many things that can change computed values, including vectorization and even stack alignment. Also order of operations, which is not completely in your control, can yield small differences. See sc13.supercomputing.org/sites/default/files/WorkshopsArchive/pdfs/wp129s1.pdf for some exploration of this.
@Michael D.,
You may be familiar with this article from Intel staff: https://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/
You are making a case for yourself as stated in the article, "it is .. useful to have a degree of reproducibility that goes beyond the inherent accuracy of a computation. Some software .. require .. even bit-for-bit, agreement between results before and after software changes, even though the mathematical uncertainty in the result of the computation may be considerably larger. The right compiler options can deliver consistent, closely reproducible results ..."
Are you following all the recommendations in this article? If yes, are you still noticing differences? If so, see Quote #7 where Steve Lionel advises you to submit a support request with Intel (https://supporttickets.intel.com/?lang=en-US). Have you submitted such a request?
Just out of interest, I tried two other compilers; Silverfrost FTN95 (Win32 and x64) and gFortran x64. To report the value, I used:
write (*,fmt='(1x,a,f0.7)') 'the result is ',ans_b
This was because "write (*,*) ans_b" only reported 4 sig figures (11.4471) for FTN95, but 7 sig figures for gFortran (11.4471436)
In all 3 cases with f0.7, I got " the result is 11.4471436"
This is a different approach to rounding up for 32-bit real !
I suspect the only difference between Ver 18 and Ver 19 is in rounding in "write (*,*)"
Given the difference reported in the OP post is due only to different rounding in write (*,*), how does this affect how the program runs ?
To attribute a change only to the 23rd bit, as causing significant problems for the calculation would imply that 32-bit reals are a poor choice.
John Campbell wrote:.. how does this affect how the program runs ?
To attribute a change only to the 23rd bit, as causing significant problems for the calculation would imply that 32-bit reals are a poor choice.
That's not the issue, the problem is one of consistency and reproducibility and what Intel Fortran compiler team intends to offer its users vis-a-vis the programmer expectations and needs.
OP has stated certain needs in this thread that some Intel staff has separately acknowledged, as in:
Programmers of floating-point applications typically have the following objectives: .. • Reproducibility o Produce consistent results: .. - From one compiler to another .. Some software quality assurance tests may require close, or even bit-for-bit, agreement between results before and after software changes, even though the mathematical uncertainty in the result of the computation may be considerably larger. The right compiler options can deliver consistent, closely reproducible results while preserving good (though not optimal) performance. .. Bottom Line .. In the version 17 compiler or later, best reproducibility may be obtained with the single switch /fp:consistent (Windows) .. which sets all of the above options.
Readers will see this with the instruction mentioned in the original post:
C:\Temp>type p.f90 character(len=*), parameter :: fmtg = "(*(g0))" character(len=*), parameter :: fmth = "(g0,z0)" character(len=*), parameter :: fmtb = "(g0,b0)" blk1: block integer, parameter :: WP = selected_real_kind( p=6 ) real(WP), parameter :: THIRD = 1.0_wp/3.0_wp real(WP) :: x, y print fmtg, "blk1: x, y with REAL kind=", kind(x), " and precision=", prec ision(x) x = 1500.0_wp y = x**THIRD print fmtg, "x = ", x print fmtg, "y = x**THIRD = ", y print fmth, "y (HEX) = ", y print fmtb, "y (BINARY) = ", y end block blk1 print * blk2: block integer, parameter :: WP = selected_real_kind( p=12 ) real(WP), parameter :: THIRD = 1.0_wp/3.0_wp real(WP) :: x, y print fmtg, "blk2: x, y with REAL kind=", kind(x), " and precision=", prec ision(x) x = 1500.0_wp y = x**THIRD print fmtg, "x = ", x print fmtg, "y = x**THIRD = ", y print fmth, "y (HEX) = ", y print fmtb, "y (BINARY) = ", y end block blk2 end C:\Temp>ifort /fp:consistent p.f90 Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R ) 64, Version 18.0.3.210 Build 20180410 Copyright (C) 1985-2018 Intel Corporation. All rights reserved. Microsoft (R) Incremental Linker Version 14.14.26433.0 Copyright (C) Microsoft Corporation. All rights reserved. -out:p.exe -subsystem:console p.obj C:\Temp>p.exe blk1: x, y with REAL kind=4 and precision=6 x = 1500.000 y = x**THIRD = 11.44714 y (HEX) = 4137277F y (BINARY) = 1000001001101110010011101111111 blk2: x, y with REAL kind=8 and precision=15 x = 1500.000000000000 y = x**THIRD = 11.44714242553332 y (HEX) = 4026E4EFDA1CA3B2 y (BINARY) = 100000000100110111001001110111111011010000111001010001110110010 C:\Temp>
whereas with Intel Fortran 19.0.1:
C:\Temp>ifort /fp:consistent p.f90 Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R ) 64, Version 19.0.0.117 Build 20180804 Copyright (C) 1985-2018 Intel Corporation. All rights reserved. Microsoft (R) Incremental Linker Version 14.14.26433.0 Copyright (C) Microsoft Corporation. All rights reserved. -out:p.exe -subsystem:console p.obj C:\Temp>p.exe blk1: x, y with REAL kind=4 and precision=6 x = 1500.000 y = x**THIRD = 11.44714 y (HEX) = 41372780 y (BINARY) = 1000001001101110010011110000000 blk2: x, y with REAL kind=8 and precision=15 x = 1500.000000000000 y = x**THIRD = 11.44714242553332 y (HEX) = 4026E4EFDA1CA3B1 y (BINARY) = 100000000100110111001001110111111011010000111001010001110110001 C:\Temp>
OP finds the above shown inconsistency in block blk1 (y=z'41372780' vs y=z'4137277F') between the two Intel Fortran versions unacceptable: the matter of interest to me is what support Intel Fortran team will provide to customers in such a situation!
FF, Great presentation. The only thing I can add is:
The one bit difference, accumulated in same magnitude numbers (one example is to add to self), after 1 million iterations will differ by 20 bits from the lsb. IOW the results of a simulation could dieviate significantly. This in no way indicates as which is more correct, rather the results differ. Different results can mean either minor change (e.g. single bit difference) in operation .OR. something wrong with compilation (bug). When results of an operation differ, then it becomes problematic to determine if the difference is due to a bug or not. When the application consists of 100's or 1000's of source files, this places a tremendous burden on the certification of the new compiler.
Jim Dempsey
For more complete information about compiler optimizations, see our Optimization Notice.