- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I just upgrade to to compiler version 19.0.0.117 from 18.0.3.210. In reviewing some of my codes I noticed small discrepancies in the results. Tracing these discrepancies led to the evaluation of the cube root function. In version 18, cube root (exponentiation to 1/3) is handled by cbrtf. In version 19 this changes to powf. This appears to be a reversal of what happened between versions 11 and 12 (see: https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/279705). This simple code snippet can be used to observe the change:
real b real ans_b b= 1500. ans_b= b**(1./3.)
In version 18 the result is 11.4471426
In version 19 the result is 11.4471435
Any reason for this change? Especially since the version 18 implementation seems to give the more accurate answer.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ronald W Green (Intel) wrote:As a side note, with the 19 compiler, there is no way to force the compiler to use cbrtf() IF you specify -fp-model source. You asked for it, you got it (power function).
But what if you asked for cbrtf() instead of powf()? Does this still work in 19?
module M
use ISO_C_BINDING
implicit none
interface
function cbrtf(x) bind(C,name='cbrtf')
import
implicit none
real(C_FLOAT) cbrtf
real(C_FLOAT), value :: x
end function cbrtf
end interface
end module M
program P
use M
implicit none
real x, y
x = 1500
y = cbrtf(x)
write(*,'(f10.7)') y
end program P
Link Copied
- « Previous
-
- 1
- 2
- Next »
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I'm wondering why the OP is not using 8-byte reals.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
gib wrote:
I'm wondering why the OP is not using 8-byte reals.
You are not alone! Whilst there is some interesting discussion in this thread the fact is the compiler is giving good SP results IMO. If this precision is not enough then use DP, simples.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Nichols, John wrote:
..Program took: 0.000000E+00 seconds. Press any key to continue . . .
On modern computers, such single calculation checks can hardly reveal anything useful. Now if one is looking at different solvers for particular calculations/simulations, a systematic approach toward computational analyses as outlined in almost any good book on numerical methods and/or algorithms will be necessary. But when one is considering as basic as options for a cube root calculation, there are few performant alternatives to CBRT the C library function, if any at all. However any modifications/tweaks with the POWF (as with Fortran's '**' operator) are going to result in a few additional floating point operations but they are hardly going to be noticeable unless they are consumed in highly compute intensive circumstances in which case the single calculation check in Quote #20 will not help, one could consider more thorough profiling using tools geared for such analysis or at the very least "brute force" the analysis by "instrumenting" a bunch of code to perform various "runs" of differing problem sizes where each run is combined with a number of trials and some form of a statistical "median" in the measured computational times is captured for a comparative study. See an example below which shows all the trivial but time-consuming wiring one has to do even for a simple-minded CPU timing study:
program p
use, intrinsic :: iso_c_binding, only : wp => c_float
use, intrinsic :: iso_fortran_env, only : dp => real64
implicit none
interface
function cbrtf( x ) result(r) bind(C, name="cbrtf")
use, intrinsic :: iso_c_binding, only : c_float
implicit none
real(c_float), intent(in), value :: x
! Function result
real(c_float) :: r
end function cbrtf
function cbrtd( x ) result(r) bind(C, name="cbrt")
use, intrinsic :: iso_c_binding, only : c_double
implicit none
real(c_double), intent(in), value :: x
! Function result
real(c_double) :: r
end function cbrtd
end interface
interface cbrt
procedure cbrtf
procedure cbrtd
end interface
integer i, j, k
integer, parameter :: NVALS(*) = [( 100000*i, i=1,20 )]
integer, parameter :: NTRIALS = 20
integer :: NumSmallX
real(wp), allocatable :: x(:), y1(:), y2(:)
real(dp) :: tim1(NTRIALS), tim2(NTRIALS)
logical :: ltim(NTRIALS)
real(dp) :: AveTim1(size(NVALS)), AveTim2(size(NVALS))
real(dp) :: START_TIME, STOP_TIME
print *, "Cube Root Calculations: Computational Time"
print "(g0,t10,g0,t26,g0,t51,g0)", "Run", "Num Calcs", "Exp with Newton Corr.", "CBRT"
print "(t26,g0,t51,g0)", "CPU Time (seconds)", "CPU Time (seconds)"
print "(t26,g0,t51,g0)", "per 1E6 Calcs", "per 1E6 Calcs"
Loop_Nvals: do k = 1, size(NVALS)
allocate( x(NVALS(k)), y1(NVALS(k)), y2(NVALS(k)) )
Loop_Ntrials: do j = 1, NTRIALS
call loadvals( x, expnt=3 )
! Cube root with '**' operator plus Newton correction
blk_root3: block
real(wp), parameter :: ONE = 1.0_wp
real(wp), parameter :: TWO = 2.0_wp
real(wp), parameter :: THREE = 3.0_wp
real(wp), parameter :: THIRD = ONE/THREE
real(wp), parameter :: TWOTHIRD = TWO/THREE
real(wp), parameter :: EPS_X = epsilon(ONE)**THIRD
call cpu_time(START_TIME)
do i = 1, size(x)
y1(i) = sign(abs(x(i))**THIRD, x(i)) !<-- Allow cube root of negative values
! Apply correction using a single step of Newton's method
if ( abs(x(i)) > EPS_X ) then !
y1(i) = TWOTHIRD*y1(i) + x(i)/(THREE*y1(i)*y1(i))
end if
end do
call cpu_time(STOP_TIME)
tim1(j) = STOP_TIME - START_TIME
end block blk_root3
! Cube root using C math library function CBRT
call cpu_time(START_TIME)
do i = 1, size(x)
y2(i) = cbrt(x(i))
end do
call cpu_time(STOP_TIME)
tim2(j) = STOP_TIME - START_TIME
end do Loop_Ntrials
ltim = .true.
ltim(minloc(tim1)) = .false. ; ltim(maxloc(tim1)) = .false. !<-- Throw out best and worst times
AveTim1(k) = sum(tim1, mask=ltim)/real(NTRIALS-2, kind=kind(AveTim1))*1E6_dp/ &
real(NVALS(k), kind=kind(AveTim1))
ltim = .true.
ltim(minloc(tim2)) = .false. ; ltim(maxloc(tim2)) = .false. !<-- Throw out best and worst times
AveTim2(k) = sum(tim2, mask=ltim)/real(NTRIALS-2, kind=kind(AveTim2))*1E6_dp/ &
real(NVALS(k), kind=kind(AveTim2))
print "(g0,t10,g0,t25,1pf0.3,t50,1pf0.3)", k, NVALS(k), AveTim1(k), AveTim2(k)
! Print some results
if ( k == size(NVALS) ) then
print *, "Some Results from last run"
print "(g0,t22,g0,t37,g0,t57,g0)", "i", "x", "y (Newton Corr.)", "y (CBRT)"
NumSmallX = 0
do i = 1, size(x)
if ( mod(i,size(x)/10) == 0 ) then
print "(i0,t15,f15.7,t30,f15.7,t50,f15.7)", i, x(i), y1(i), y2(i)
else if ( x(i) < 0.1_wp ) then
if ( NumSmallX < 10 ) then
print "(i0,t15,f15.7,t30,f15.7,t50,f15.7)", i, x(i), y1(i), y2(i)
NumSmallX = NumSmallX + 1
end if
end if
end do
end if
deallocate( x, y1, y2 )
end do Loop_Nvals
stop
contains
subroutine loadvals( x, expnt )
! Load values into data array with an arbitrary approach
! Pseudo random number generation plus arbitrary scaling
real(wp), intent(inout) :: x(:)
integer, intent(in) :: expnt
call random_number( x )
where ( x > 0.5_wp )
x = real( int(x*100.0_wp)**expnt, kind=kind(x))
else where ( x > 0.1_wp )
x = -real( int(x*100.0_wp)**expnt, kind=kind(x))
end where
return
end subroutine
end program
Now when this code is run with the 2 versions of Intel Fortran compilers mentioned by OP, the output is:
C:\Temp>ifort /fp:source /O2 /heap-arrays0 p.f90
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R
) 64, Version 18.0.3.210 Build 20180410
Copyright (C) 1985-2018 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.14.26433.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:p.exe
-subsystem:console
p.obj
C:\Temp>p.exe
Cube Root Calculations: Computational Time
Run Num Calcs Exp with Newton Corr. CBRT
CPU Time (seconds) CPU Time (seconds)
per 1E6 Calcs per 1E6 Calcs
1 100000 .087 .000
2 200000 .087 .000
3 300000 .058 .029
4 400000 .065 .022
5 500000 .087 .035
6 600000 .043 .058
7 700000 .111 .000
8 800000 .076 .054
9 900000 .096 .000
10 1000000 .061 .061
11 1100000 .079 .063
12 1200000 .079 .058
13 1300000 .060 .080
14 1400000 .068 .062
15 1500000 .075 .064
16 1600000 .087 .060
17 1700000 .087 .061
18 1800000 .072 .043
19 1900000 .078 .068
20 2000000 .078 .065
Some Results from last run
i x y (Newton Corr.) y (CBRT)
1 -1000.0000000 -10.0000000 -10.0000000
2 -8000.0000000 -20.0000000 -20.0000000
6 0.0280244 0.3037472 0.3037472
10 -59319.0000000 -39.0000000 -39.0000000
11 -79507.0000000 -43.0000000 -43.0000000
15 0.0987109 0.4621557 0.4621557
18 0.0132428 0.2365883 0.2365883
21 -39304.0000000 -34.0000000 -34.0000000
23 -9261.0000000 -21.0000000 -21.0000000
24 -4913.0000000 -17.0000000 -17.0000000
200000 571787.0000000 83.0000000 83.0000000
400000 -1000.0000000 -10.0000000 -10.0000000
600000 216000.0000000 60.0000000 60.0000000
800000 -42875.0000000 -35.0000000 -35.0000000
1000000 -17576.0000000 -26.0000000 -26.0000000
1200000 804357.0000000 93.0000000 93.0000000
1400000 -10648.0000000 -22.0000000 -22.0000000
1600000 571787.0000000 83.0000000 83.0000000
1800000 -54872.0000000 -38.0000000 -38.0000000
2000000 -85184.0000000 -44.0000000 -44.0000000
C:\Temp>
C:\Temp>ifort /fp:source /O2 /heap-arrays0 p.f90
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R
) 64, Version 19.0.1.144 Build 20181018
Copyright (C) 1985-2018 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.14.26433.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:p.exe
-subsystem:console
p.obj
C:\Temp>p.exe
Cube Root Calculations: Computational Time
Run Num Calcs Exp with Newton Corr. CBRT
CPU Time (seconds) CPU Time (seconds)
per 1E6 Calcs per 1E6 Calcs
1 100000 .347 .000
2 200000 .217 .043
3 300000 .173 .058
4 400000 .152 .065
5 500000 .173 .087
6 600000 .173 .000
7 700000 .136 .099
8 800000 .173 .065
9 900000 .173 .077
10 1000000 .156 .061
11 1100000 .150 .055
12 1200000 .173 .079
13 1300000 .153 .093
14 1400000 .173 .093
15 1500000 .156 .087
16 1600000 .184 .098
17 1700000 .189 .092
18 1800000 .193 .072
19 1900000 .178 .073
20 2000000 .173 .078
Some Results from last run
i x y (Newton Corr.) y (CBRT)
1 -1000.0000000 -10.0000000 -10.0000000
2 -8000.0000000 -20.0000000 -20.0000000
6 0.0280244 0.3037472 0.3037472
10 -59319.0000000 -39.0000000 -39.0000000
11 -79507.0000000 -43.0000000 -43.0000000
15 0.0987109 0.4621557 0.4621557
18 0.0132428 0.2365883 0.2365883
21 -39304.0000000 -34.0000000 -34.0000000
23 -9261.0000000 -21.0000000 -21.0000000
24 -4913.0000000 -17.0000000 -17.0000000
200000 571787.0000000 83.0000000 83.0000000
400000 -1000.0000000 -10.0000000 -10.0000000
600000 216000.0000000 60.0000000 60.0000000
800000 -42875.0000000 -35.0000000 -35.0000000
1000000 -17576.0000000 -26.0000000 -26.0000000
1200000 804357.0000000 93.0000000 93.0000000
1400000 -10648.0000000 -22.0000000 -22.0000000
1600000 571787.0000000 83.0000000 83.0000000
1800000 -54872.0000000 -38.0000000 -38.0000000
2000000 -85184.0000000 -44.0000000 -44.0000000
C:\Temp>
In addition to the obvious point as stated upthread a library function for cube root should be a no-brainer for such a specific computational need compared to the more generic exponentiation operation and so any additional "correction" operations will incur further cost, however its impact is relative to one's needs.
But now if Ronald Green or other Intel staff are looking at this, they can try to explain the considerable differences in CPU times with "**" at larger problem sizes, as shown above, with Intel Fortran compiler 19.0 update 1 relative to 18.0 update 3 when /fp:source compiler option is in effect. Both the cases should be using the same power function call with single precision (C_FLOAT), so why is there such a difference?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi FortranFan,
Am I wrong to be surprised by the occurrence of CBRT times of .000? What do you see when you write more decimal places?
Cheers, Gib
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
gib wrote:Hi FortranFan,
Am I wrong to be surprised by the occurrence of CBRT times of .000? What do you see when you write more decimal places?
Cheers, Gib
Hi Gib,
The zero values are due to limitations with CPU_TIME intrinsic, a better way always is to work with tick rates for CPU timing studies using the SYSTEM_CLOCK intrinsic. See a modified example below, there is also one other change - I'd meant to print more values for when abs(x) is small, say <0.1:
program p
use, intrinsic :: iso_c_binding, only : wp => c_float
use, intrinsic :: iso_fortran_env, only : dp => real64, I8 => int64
implicit none
interface
function cbrtf( x ) result(r) bind(C, name="cbrtf")
use, intrinsic :: iso_c_binding, only : c_float
implicit none
real(c_float), intent(in), value :: x
! Function result
real(c_float) :: r
end function cbrtf
function cbrtd( x ) result(r) bind(C, name="cbrt")
use, intrinsic :: iso_c_binding, only : c_double
implicit none
real(c_double), intent(in), value :: x
! Function result
real(c_double) :: r
end function cbrtd
end interface
interface cbrt
procedure cbrtf
procedure cbrtd
end interface
integer i, j, k
integer, parameter :: NVALS(*) = [( 100000*i, i=1,20 )]
integer, parameter :: NTRIALS = 20
integer :: NumSmallX
real(wp), allocatable :: x(:), y1(:), y2(:)
real(dp) :: tim1(NTRIALS), tim2(NTRIALS)
logical :: ltim(NTRIALS)
real(dp) :: AveTim1(size(NVALS)), AveTim2(size(NVALS))
real(dp) :: START_TIME, STOP_TIME
print *, "Cube Root Calculations: Computational Time"
print "(g0,t10,g0,t26,g0,t51,g0)", "Run", "Num Calcs", "Exp with Newton Corr.", "CBRT"
print "(t26,g0,t51,g0)", "CPU Time (seconds)", "CPU Time (seconds)"
print "(t26,g0,t51,g0)", "per 1E6 Calcs", "per 1E6 Calcs"
Loop_Nvals: do k = 1, size(NVALS)
allocate( x(NVALS(k)), y1(NVALS(k)), y2(NVALS(k)) )
Loop_Ntrials: do j = 1, NTRIALS
call loadvals( x, expnt=3 )
! Cube root with '**' operator plus Newton correction
blk_root3: block
real(wp), parameter :: ONE = 1.0_wp
real(wp), parameter :: TWO = 2.0_wp
real(wp), parameter :: THREE = 3.0_wp
real(wp), parameter :: THIRD = ONE/THREE
real(wp), parameter :: TWOTHIRD = TWO/THREE
real(wp), parameter :: EPS_X = epsilon(ONE)**THIRD
call cpu_t(START_TIME)
do i = 1, size(x)
y1(i) = sign(abs(x(i))**THIRD, x(i)) !<-- Allow cube root of negative values
! Apply correction using a single step of Newton's method
if ( abs(x(i)) > EPS_X ) then !
y1(i) = TWOTHIRD*y1(i) + x(i)/(THREE*y1(i)*y1(i))
end if
end do
call cpu_t(STOP_TIME)
tim1(j) = STOP_TIME - START_TIME
end block blk_root3
! Cube root using C math library function CBRT
call cpu_t(START_TIME)
do i = 1, size(x)
y2(i) = cbrt(x(i))
end do
call cpu_t(STOP_TIME)
tim2(j) = STOP_TIME - START_TIME
end do Loop_Ntrials
ltim = .true.
ltim(minloc(tim1)) = .false. ; ltim(maxloc(tim1)) = .false. !<-- Throw out best and worst times
AveTim1(k) = sum(tim1, mask=ltim)/real(NTRIALS-2, kind=kind(AveTim1))*1E6_dp/ &
real(NVALS(k), kind=kind(AveTim1))
ltim = .true.
ltim(minloc(tim2)) = .false. ; ltim(maxloc(tim2)) = .false. !<-- Throw out best and worst times
AveTim2(k) = sum(tim2, mask=ltim)/real(NTRIALS-2, kind=kind(AveTim2))*1E6_dp/ &
real(NVALS(k), kind=kind(AveTim2))
print "(g0,t10,g0,t25,1pf0.3,t50,1pf0.3)", k, NVALS(k), AveTim1(k), AveTim2(k)
! Print some results
if ( k == size(NVALS) ) then
print *, "Some Results from last run"
print "(g0,t22,g0,t37,g0,t57,g0)", "i", "x", "y (Newton Corr.)", "y (CBRT)"
NumSmallX = 0
do i = 1, size(x)
if ( mod(i,size(x)/10) == 0 ) then
print "(i0,t15,f15.7,t30,f15.7,t50,f15.7)", i, x(i), y1(i), y2(i)
else if ( abs(x(i)) < 0.1_wp ) then
if ( NumSmallX < 10 ) then
print "(i0,t15,f15.7,t30,f15.7,t50,f15.7)", i, x(i), y1(i), y2(i)
NumSmallX = NumSmallX + 1
end if
end if
end do
end if
deallocate( x, y1, y2 )
end do Loop_Nvals
stop
contains
subroutine loadvals( x, expnt )
! Load values into data array with an arbitrary approach
! Pseudo random number generation plus arbitrary scaling
real(wp), intent(inout) :: x(:)
integer, intent(in) :: expnt
call random_number( x )
where ( x > 0.5_wp )
x = real( int(x*100.0_wp)**expnt, kind=kind(x))
else where ( x > 0.1_wp )
x = -real( int(x*100.0_wp)**expnt, kind=kind(x))
end where
return
end subroutine
subroutine cpu_t( time )
!.. Argument list
real(dp), intent(inout) :: time
!.. Local variables
integer(I8) :: tick
integer(I8) :: rate
call system_clock (tick, rate)
time = real(tick, kind=kind(time) ) / real(rate, kind=kind(time) )
return
end subroutine cpu_t
end program
Here's an output of an execution with 18.0.3:
C:\Temp>p.exe
Cube Root Calculations: Computational Time
Run Num Calcs Exp with Newton Corr. CBRT
CPU Time (seconds) CPU Time (seconds)
per 1E6 Calcs per 1E6 Calcs
1 100000 .061 .072
2 200000 .081 .072
3 300000 .083 .061
4 400000 .081 .061
5 500000 .078 .064
6 600000 .077 .065
7 700000 .081 .065
8 800000 .079 .064
9 900000 .078 .065
10 1000000 .079 .062
11 1100000 .079 .063
12 1200000 .078 .064
13 1300000 .079 .065
14 1400000 .080 .063
15 1500000 .077 .066
16 1600000 .079 .064
17 1700000 .076 .064
18 1800000 .078 .065
19 1900000 .078 .064
20 2000000 .077 .064
Some Results from last run
i x y (Newton Corr.) y (CBRT)
6 0.0280244 0.3037472 0.3037472
15 0.0987109 0.4621557 0.4621557
18 0.0132428 0.2365883 0.2365883
26 0.0402600 0.3427345 0.3427345
40 0.0630255 0.3979595 0.3979595
64 0.0813574 0.4333103 0.4333102
68 0.0097841 0.2138815 0.2138815
69 0.0658390 0.4037952 0.4037952
72 0.0442022 0.3535748 0.3535747
75 0.0902116 0.4484914 0.4484914
200000 571787.0000000 83.0000000 83.0000000
400000 -1000.0000000 -10.0000000 -10.0000000
600000 216000.0000000 60.0000000 60.0000000
800000 -42875.0000000 -35.0000000 -35.0000000
1000000 -17576.0000000 -26.0000000 -26.0000000
1200000 804357.0000000 93.0000000 93.0000000
1400000 -10648.0000000 -22.0000000 -22.0000000
1600000 571787.0000000 83.0000000 83.0000000
1800000 -54872.0000000 -38.0000000 -38.0000000
2000000 -85184.0000000 -44.0000000 -44.0000000
C:\Temp>
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
andrew_4619 wrote:Quote:
gib wrote:
I'm wondering why the OP is not using 8-byte reals.
You are not alone! Whilst there is some interesting discussion in this thread the fact is the compiler is giving good SP results IMO. If this precision is not enough then use DP, simples.
Using DP doesn't resolve the problem, it just pushes to less significant digits. I am not expecting results to 7 or even 6 significant digits. What I would like is that the internal values (which is why I reported hex values for numbers) don't change between compilers and are accurate. These values are propagated within the code at maximum precision and a 1 to 5 bit change can become noticeable after many many sequential calculations. I shouldn't have to use DP to mitigate this type of behavior, it is an inelegant coding technique reminiscent of two to three decades ago and increases the memory footprint and processor time significantly when used universally.
summary - compiler changes implemented less accurate resolution of floating point math which can propagate through long sequential calculations that can be observed in changes in the final resulted presented in decimal form.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Michael D. wrote:What I would like is that the internal values (which is why I reported hex values for numbers) don't change between compilers and are accurate.
In the famous words of Dread Pirate Roberts, "Learn to live with disappointment.". There are many things that can change computed values, including vectorization and even stack alignment. Also order of operations, which is not completely in your control, can yield small differences. See sc13.supercomputing.org/sites/default/files/WorkshopsArchive/pdfs/wp129s1.pdf for some exploration of this.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Michael D. wrote:
Using DP doesn't resolve the problem, it just pushes to less significant digits. I am not expecting results to 7 or even 6 significant digits..........
Using DP clearly DOES resolve the problem because it would appear to give results consistent to the level of accuracy you state you require.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Steve Lionel (Ret.) wrote:.. In the famous words of Dread Pirate Roberts, "Learn to live with disappointment.". There are many things that can change computed values, including vectorization and even stack alignment. Also order of operations, which is not completely in your control, can yield small differences. See sc13.supercomputing.org/sites/default/files/WorkshopsArchive/pdfs/wp129s1.pdf for some exploration of this.
@Michael D.,
You may be familiar with this article from Intel staff: https://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/
You are making a case for yourself as stated in the article, "it is .. useful to have a degree of reproducibility that goes beyond the inherent accuracy of a computation. Some software .. require .. even bit-for-bit, agreement between results before and after software changes, even though the mathematical uncertainty in the result of the computation may be considerably larger. The right compiler options can deliver consistent, closely reproducible results ..."
Are you following all the recommendations in this article? If yes, are you still noticing differences? If so, see Quote #7 where Steve Lionel advises you to submit a support request with Intel (https://supporttickets.intel.com/?lang=en-US). Have you submitted such a request?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Just out of interest, I tried two other compilers; Silverfrost FTN95 (Win32 and x64) and gFortran x64. To report the value, I used:
write (*,fmt='(1x,a,f0.7)') 'the result is ',ans_b
This was because "write (*,*) ans_b" only reported 4 sig figures (11.4471) for FTN95, but 7 sig figures for gFortran (11.4471436)
In all 3 cases with f0.7, I got " the result is 11.4471436"
This is a different approach to rounding up for 32-bit real !
I suspect the only difference between Ver 18 and Ver 19 is in rounding in "write (*,*)"
Given the difference reported in the OP post is due only to different rounding in write (*,*), how does this affect how the program runs ?
To attribute a change only to the 23rd bit, as causing significant problems for the calculation would imply that 32-bit reals are a poor choice.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
John Campbell wrote:.. how does this affect how the program runs ?
To attribute a change only to the 23rd bit, as causing significant problems for the calculation would imply that 32-bit reals are a poor choice.
That's not the issue, the problem is one of consistency and reproducibility and what Intel Fortran compiler team intends to offer its users vis-a-vis the programmer expectations and needs.
OP has stated certain needs in this thread that some Intel staff has separately acknowledged, as in:
Programmers of floating-point applications typically have the following objectives: .. • Reproducibility o Produce consistent results: .. - From one compiler to another .. Some software quality assurance tests may require close, or even bit-for-bit, agreement between results before and after software changes, even though the mathematical uncertainty in the result of the computation may be considerably larger. The right compiler options can deliver consistent, closely reproducible results while preserving good (though not optimal) performance. .. Bottom Line .. In the version 17 compiler or later, best reproducibility may be obtained with the single switch /fp:consistent (Windows) .. which sets all of the above options.
Readers will see this with the instruction mentioned in the original post:
C:\Temp>type p.f90
character(len=*), parameter :: fmtg = "(*(g0))"
character(len=*), parameter :: fmth = "(g0,z0)"
character(len=*), parameter :: fmtb = "(g0,b0)"
blk1: block
integer, parameter :: WP = selected_real_kind( p=6 )
real(WP), parameter :: THIRD = 1.0_wp/3.0_wp
real(WP) :: x, y
print fmtg, "blk1: x, y with REAL kind=", kind(x), " and precision=", prec
ision(x)
x = 1500.0_wp
y = x**THIRD
print fmtg, "x = ", x
print fmtg, "y = x**THIRD = ", y
print fmth, "y (HEX) = ", y
print fmtb, "y (BINARY) = ", y
end block blk1
print *
blk2: block
integer, parameter :: WP = selected_real_kind( p=12 )
real(WP), parameter :: THIRD = 1.0_wp/3.0_wp
real(WP) :: x, y
print fmtg, "blk2: x, y with REAL kind=", kind(x), " and precision=", prec
ision(x)
x = 1500.0_wp
y = x**THIRD
print fmtg, "x = ", x
print fmtg, "y = x**THIRD = ", y
print fmth, "y (HEX) = ", y
print fmtb, "y (BINARY) = ", y
end block blk2
end
C:\Temp>ifort /fp:consistent p.f90
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R
) 64, Version 18.0.3.210 Build 20180410
Copyright (C) 1985-2018 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.14.26433.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:p.exe
-subsystem:console
p.obj
C:\Temp>p.exe
blk1: x, y with REAL kind=4 and precision=6
x = 1500.000
y = x**THIRD = 11.44714
y (HEX) = 4137277F
y (BINARY) = 1000001001101110010011101111111
blk2: x, y with REAL kind=8 and precision=15
x = 1500.000000000000
y = x**THIRD = 11.44714242553332
y (HEX) = 4026E4EFDA1CA3B2
y (BINARY) = 100000000100110111001001110111111011010000111001010001110110010
C:\Temp>
whereas with Intel Fortran 19.0.1:
C:\Temp>ifort /fp:consistent p.f90 Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R ) 64, Version 19.0.0.117 Build 20180804 Copyright (C) 1985-2018 Intel Corporation. All rights reserved. Microsoft (R) Incremental Linker Version 14.14.26433.0 Copyright (C) Microsoft Corporation. All rights reserved. -out:p.exe -subsystem:console p.obj C:\Temp>p.exe blk1: x, y with REAL kind=4 and precision=6 x = 1500.000 y = x**THIRD = 11.44714 y (HEX) = 41372780 y (BINARY) = 1000001001101110010011110000000 blk2: x, y with REAL kind=8 and precision=15 x = 1500.000000000000 y = x**THIRD = 11.44714242553332 y (HEX) = 4026E4EFDA1CA3B1 y (BINARY) = 100000000100110111001001110111111011010000111001010001110110001 C:\Temp>
OP finds the above shown inconsistency in block blk1 (y=z'41372780' vs y=z'4137277F') between the two Intel Fortran versions unacceptable: the matter of interest to me is what support Intel Fortran team will provide to customers in such a situation!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
FF, Great presentation. The only thing I can add is:
The one bit difference, accumulated in same magnitude numbers (one example is to add to self), after 1 million iterations will differ by 20 bits from the lsb. IOW the results of a simulation could dieviate significantly. This in no way indicates as which is more correct, rather the results differ. Different results can mean either minor change (e.g. single bit difference) in operation .OR. something wrong with compilation (bug). When results of an operation differ, then it becomes problematic to determine if the difference is due to a bug or not. When the application consists of 100's or 1000's of source files, this places a tremendous burden on the certification of the new compiler.
Jim Dempsey
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- « Previous
-
- 1
- 2
- Next »