float point accuracy issue (/fp:fast) or code issue?

a_zhaogtisoft_com · ‎12-13-2013

I have a very simple piece of code:

C=================================================================================================================================
      function CF_ROUGH(eps,diam,Re)
C=================================================================================================================================
! Calculates friction coefficient as a function of roughness, pipe diameter and Reynolds number
C---------------------------------------------------------------------------------------------------------------------------------
      implicit none
      real, intent(in) :: eps, diam, Re ! roughness, diameter, Reynolds number
      real :: CF_ROUGH, temp1, A, B
C---------------------------------------------------------------------------------------------------------------------------------
      temp1 = eps/diam/3.7
      A = -2.*log10(temp1+12./Re)
      B = -2.*log10(temp1+2.51*A/Re)
      CF_ROUGH = 0.25/( 4.781 - (A-4.781)**2/(B-2.*A+4.781) )**2

      end function CF_ROUGH

It get called 8 million+ times throughout our simulation, where eps = 0.0, diam = 5.0E-3 - 16.0E-3, Rd = 4000 - 110000.

Here is my trouble: first of all, the debug build (/Od) and release build (/O2) give different result, and more interestingly, the 32-bit release run and 64-bit release run take vastly different path after 8-million calls of this function. However, the 32-bit and 64-bit debug build will yield exactly the same result! At first glance, it looks like a bug due to compiler optimization.

I could fix this by follow strick:

1) disable the optimization for this function only:

if I use

!DIR$ NOOPTIMIZE

to disable the optimization of this function only, then both 32/64 release build will yield the same result as debug build.

2) use /fp:precision for this source file only

This will also make the release run yield the same result as that of a debug build.

Here is my question:

1) Do you see any code issue?

2) Is there any way to rearrange the code to avoid the compiler optimization caused issue?

3) If there is nothing we can do with the coding, what is the best practice to 'fix' it? "!DIR$ NOOPTIMIZE" or /fp:precise (then we will be hit by performance downgrade), or something else?

Any suggestion?

TimP · ‎12-13-2013

I'm guessing that you are most likely to see differences due to permitting inter-procedural optimization with this function. /Qip- /Qipo- (or equivalents in the project settings) would avoid this.

ifort requires the option /assume:protect_parens (or options implying it, such as /fp:precise) in order to be standards compliant.

Fortran standard (since 1966!) considers eps/diam/3.7 equivalent to eps/(diam*3.7) so you should write it the latter way if you don't want accidental variations. /Qprec-div (implied by /fp:precise) would prevent the latter being replaced by 1/(diam*3.7)*eps in case that matters.

If your results vary with compiler optimizations implying changes in the 6th or 7th decimal place, when your algorithm is implied accurate only to 2 to 4 decimal places, it looks like a possible algorithmic problem. Another possibility is that you have a problem you haven't shown, which happens to appear to be associated with what you have told us.

mecej4 · ‎12-13-2013

If there are any optimization errors, they are probably related to code other than the portion that you showed. Here is my test code, which gives identical results when compiled with Intel-32bit and Intel-64 bit compilers, and CVF 6.6C, with or without optimization requested. The code covers the range e/D = 0.001 to 0.01 and Re = 5,000 to 3,000,000, and shows that the new approximation formula for friction factor agrees with the old Colebrook-White expression, differing from it by no more than 0.04 percent.

[fortran]

program testf
real e,Re
integer i,j
write(*,*)'eps/d Re f f-CW %diff'
do i=1,10
write(*,*)
e=0.001*i
Re=1000
do j=1,5
Re=Re*5
fr=f(e,Re)
rrtf=-2*log10(e/3.7+2.51/(Re*sqrt(fr)))
fr1=1/(rrtf*rrtf)
write(*,10)e,Re,fr,fr1,(fr/fr1-1)*100
end do
end do
10 format(F6.3,F9.0,2F10.6,4x,f7.3)
end program testf

real function f(e,Re)
real e,Re,S1,S2,A
S1 = -2*log10(e/3.7+12/Re)
S2 = -2*log10(e/3.7+2.51/Re*S1)
A = 4.781-(S1-4.781)**2/(S2-2*S1+4.781)
f = 1/(A*A)
return
end

[/fortran]

a_zhaogtisoft_com · ‎12-16-2013

Tim,

We do not use IPO at all (I have a lot of issue with using it, anytime I tried it out with out project for the last 8 years - a fact I never understand it - either the results are all wrong, or simply crash, btw, IPO linking is taking forever).

We already have the /assume:protect_parens for all of project. A complete compiler option line is as following:

/nologo /O2 /assume:buffered_io /fpp /assume:nosource_include /extend_source:132 /Qvec-report0 /real_size:64 /align:rec16byte /align:qcommons /align:sequence /fpconstant /names:lowercase /iface:cref /assume:underscore /libs:static /threads /Qmkl:parallel /c /assume:protect_parens

Adding /Qprec-div does not help either.

Several other observation

1) order of writing code: (In our application, temp1 is always 0.0, because eps = 0.0)

B = -2.*log10(temp1+2.51/Re*A) gives different result from B = -2.*log10(temp1+2.51*A/Re)

2) inline call optimization?

Using /Ob0 is different from /Ob2 (I think /Ob2 is default).

I found the first observation very strange.

Martyn_C_Intel · ‎12-18-2013

As a general rule, floating-point results are expected to vary very slightly between release and debug builds, due to variations in the order of operations and other reasons related to optimization. If you don't want this to happen, build the function or program with /fp:precise, that's what it is designed for. The impact on performance should be much less than disabling optimization altogether.

If /fp:precise doesn't take care of it, please let us know.

For (a lot of) general background, see the article attached at http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/