Different returning values from the same subroutine

Sung_H_ · ‎10-30-2013

Hello all,

I found some abnormal behaviors of my LBM program. The snippet is as follows, and i paste the code here:http://pastebin.com/mD3TFVhc

[fortran]

    do j = 1,ly
        do i = 1,lx
            call collision(i,j,omega,eul_f,p, f,     fnew)
            call collision(i,j,omega,eul_f,p, n_hlp, node)
        end do
    end do

[/fortran]

Here, four arrays 'f,fnew,n_hlp,node' have the same dimensions. 'f' and 'h_hlp' are initialized to be the same and 'fnew' and 'node' store the returning values. If compiling the code with the following command

[bash]

ifort snip.f90

./a.out

[/bash]

the data file shows difference between 'fnew' and 'n_hlp', about the order of 1e-17. But uncommenting the four lines 87-90 or adding '-g' to the compiler option will results the same 'fnew' and 'n_hlp'. Is this reasonable?

Thanks in advance.

Sung Hui

Casey · ‎10-30-2013

Try using the compile flag "-fp-model source" and see if that difference goes away. In my experience small magnitude error is due to optimizing floating point math and vecorization. Whether that is reasonable depends on your requirements.

TimP · ‎10-30-2013

As Casey pointed out, you can disable optimizations which are expected to cause variations from IEEE standard and with vector alignment. -g causes a default to -O0 which will disable those and many other optimizations.

In your code, the frequent division of vector by scalar would be optimized under the default -fp-model fast by inverting the divisor outside the loop and multiplying within the loop. With ifort, this will also prevent totally safe optimizations such as replacing /2.0 by *0.5.

Several of your expressions could be grouped to produce more consistent accuracy; normally, lack of care can be excused by the use of double precision. Activating the WRITE statements in that loop may have a similar effect to compiling at -O0 or even putting !dir$ novector ahead of the loop.

TimP · ‎10-31-2013

For example, (ef -uf) would be more accurately calculated as

efmuf = (e(k,1)- u(1)) * eul_f(i,j,1) + (e(k,2)- u(2)) * eul_f(i,j,2)

an alternative evaluation explicitly encouraged by Fortran rules, but unlikely to happen consistently in your context.

Then (efmuf + (eu/Cs2)*ef)/Cs2 also encouraged by Fortran rules would be a small further improvement in accuracy, as well as eliminating one of the divisions in a better way than the compiler is likely to try without -fp-model source.

eliminating those +1 terms, e.g. (w(k)-.5*w(k)*omega) could be another accuracy improvement, although not permitted as a standard alternate.

then (w(k)*rho + (w(k)*rho)* ((eu/Cs2) +.5*(eu/Cs2)**2)) to force the compiler to repeat the same expressions (so not so many repeated divides required) as well as improving accuracy.

These could be more important to accuracy than forcing IEEE divide (-prec-div, implied by -fp-model source).

In your case, I wonder if you are getting inconsistent partial in-lining of your collision function, possibly avoidable by -fno-inline-functions. What did -opt-report show?

Sung_H_ · ‎11-01-2013

opt-report is here: http://pastebin.com/z7WZ1pmC, i'm not quite understand it. Thanks TimP for your experienced skills, i will change my codes according to your advice.

TimP · ‎11-01-2013

I guess the report shows complete in-lining of the internal subroutine, which makes it difficult to follow, as well as opening the possibility that the 2 in-lined copies aren't identical.

Normally, in-line expansion of internal subroutines is desirable, but it seems not when investigating your questions.

With in-lining, there are no opt-report references to the original source line numbers in the internal procedures. As I mentioned, I would try -fno-inline-functions to try to avoid this with ifort (or gfortran).