Nguyen__Tyler · ‎11-16-2017

!bug.f90
program NN
    implicit none

    integer, parameter     :: input_size  = 2
    integer, parameter     :: hidden_size = 6
    integer, parameter     :: num_layer   = 9

    real*8                 :: LN  (hidden_size, hidden_size, num_layer)
    real*8                 :: LN0 (hidden_size, input_size)
    real*8                 :: LNL (1, hidden_size)
    real*8                 :: BL  (1,1)
    real*8                 :: RL  (1,1)
    real*8                 :: B   (hidden_size, 1, 0:num_layer)

    print *, BB([-2.2d0, 0.4d0])

contains

    function BB(x)
        real*8, intent(in)  :: x(input_size)
        real*8              :: BB
        integer             :: i
        real*8              :: inp (hidden_size, 1)
        real*8              :: tmp (hidden_size, 1)

        LN  = 1
        B   = 1
        LN0 = 1
        BL  = 1
        LNL = 1
        tmp = matmul(LN0, reshape(x, [input_size, 1])) + B(:,:, 0)
        inp = sin(tmp)

        do i = 1, num_layer
            tmp             = matmul(LN(:, :, i), inp) + B(:,:,i)
            inp             = tanh(tmp)
        end do

        RL = tanh( matmul(LNL, inp) + BL)
        BB = RL(1,1)
    end function

end program NN

Compiled with and without -fast, the program prints two different results:

 ifort-18.0.1.126 bug.f90 -o bug ; ./bug
 -0.999909105178721

and

ifort-18.0.1.126 bug.f90 -fast -o bug ; ./bug
  0.761594155955765

Interestingly, when num_layer < 9, there is no such difference. Moreover, when I disable loop unrolling, the result is correct again:

ifort-18.0.1.126 bug.f90 -fast -o bug -unroll0; ./bug
 -0.999909105178721

My CPU is a 3.69 GHz quad-core Intel Xeon E5.

Steve_Lionel · ‎11-17-2017

-fast implies -xHost, -O3 and -Qipo, all of which can create a different instruction sequence.

See my 2013 presentation on the topic of numerical reproducibility.

mecej4 · ‎11-24-2017

I think that there is an optimization bug here, as we can see by adding

		write (*, '(1x,6ES12.4)' ) inp

after Line-41. The output with -fast alone:

   0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
  0.761594155955765

whereas, with -fast -unroll0 we get:

  -9.9991E-01 -9.9991E-01 -9.9991E-01 -9.9991E-01 -9.9991E-01 -9.9991E-01
 -0.999909105178721

Note that 0.76159.. is simply the value of tanh(1).
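
To double-check which of the two values is right, here is a minimal sketch that evaluates the same network with explicit loops instead of matmul. It is not from the original report: the file name check.f90, the program name nn_check and the variables s and out are made up for illustration, and it hard-codes the assumption that every weight and bias equals 1, as set inside BB. If inp is all zeros, the output layer reduces to tanh(0 + 1), which is the second value printed.

```fortran
! check.f90: hypothetical cross-check, not part of the original report.
! Assumes all weights and biases are 1, as assigned inside BB.
program nn_check
    implicit none

    integer, parameter :: dp          = kind(1.0d0)
    integer, parameter :: input_size  = 2
    integer, parameter :: hidden_size = 6
    integer, parameter :: num_layer   = 9

    real(dp) :: x(input_size), inp(hidden_size), tmp(hidden_size)
    real(dp) :: s, out
    integer  :: i, j

    x = [-2.2d0, 0.4d0]

    ! Input layer: with unit weights, every hidden unit sees sum(x) + 1.
    s = 0.0d0
    do j = 1, input_size
        s = s + x(j)
    end do
    inp = sin(s + 1.0d0)

    ! Hidden layers: every unit sees the sum of the previous activations + 1.
    do i = 1, num_layer
        s = 0.0d0
        do j = 1, hidden_size
            s = s + inp(j)
        end do
        tmp = s + 1.0d0
        inp = tanh(tmp)
    end do

    ! Output layer, again with unit weights and unit bias.
    s = 0.0d0
    do j = 1, hidden_size
        s = s + inp(j)
    end do
    out = tanh(s + 1.0d0)

    print *, 'explicit loops      :', out
    print *, 'tanh(1), inp zeroed :', tanh(1.0d0)
end program nn_check
```

Built without optimization, the first value should come out near -0.99991, agreeing with the -unroll0 result above. The last digits may differ from the matmul version because the summation order is not necessarily the same.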

Inconsistent results with -fast flag