Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Inconsistent results with -fast flag

Nguyen__Tyler
Beginner
!bug.f90
program NN
    implicit none

    integer, parameter     :: input_size  = 2
    integer, parameter     :: hidden_size = 6
    integer, parameter     :: num_layer   = 9

    real*8                 :: LN  (hidden_size, hidden_size, num_layer)
    real*8                 :: LN0 (hidden_size, input_size)
    real*8                 :: LNL (1, hidden_size)
    real*8                 :: BL  (1,1)
    real*8                 :: RL  (1,1)
    real*8                 :: B   (hidden_size, 1, 0:num_layer)

print *, BB([-2.2d0, 0.4d0])

contains

    function BB(x)
        real*8, intent(in)  :: x(input_size)
        real*8              :: BB
        integer             :: i
        real*8              :: inp (hidden_size, 1)
        real*8              :: tmp (hidden_size, 1)

        LN  = 1
        B   = 1
        LN0 = 1
        BL  = 1
        LNL = 1
        tmp = matmul(LN0, reshape(x, [input_size, 1])) + B(:,:, 0)
        inp = sin(tmp)

        do i = 1, num_layer
            tmp             = matmul(LN(:, :, i), inp) + B(:,:,i)
            inp             = tanh(tmp)
        end do

        RL = tanh( matmul(LNL, inp) + BL)
        BB = RL(1,1)
    end function

end program NN

 

Compiled with and without -fast, the program prints two different results:

 ifort-18.0.1.126 bug.f90 -o bug ; ./bug
 -0.999909105178721

and

ifort-18.0.1.126 bug.f90 -fast -o bug ; ./bug
  0.761594155955765

Interestingly, when num_layer < 9 there is no such difference. Moreover, when I disable loop unrolling, the result is correct again:

ifort-18.0.1.126 bug.f90 -fast -o bug -unroll0; ./bug
 -0.999909105178721

My CPU is a 3.69 GHz quad-core Intel Xeon E5.

 

 

 

 

Steve_Lionel
Black Belt Retired Employee

-fast implies -xHost, -O3, and -ipo (/Qipo on Windows), each of which can produce a different instruction sequence.

See my 2013 presentation on the topic of numerical reproducibility.
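
As a small illustration of why a different instruction sequence can change a result: floating-point addition is not associative, so an optimizer that reorders a sum may alter the last bits. The sketch below is a generic example and is not taken from the presentation mentioned above. The program name and variables are hypothetical, and it assumes the compiler honors the parentheses as written (with ifort that may require an option such as -assume protect_parens).

```fortran
program sum_order
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    ! volatile discourages the compiler from folding the sums at compile time
    real(dp), volatile :: a, b, c
    real(dp)           :: left, right

    a = 0.1_dp
    b = 0.2_dp
    c = 0.3_dp

    ! Same three terms, two groupings
    left  = (a + b) + c
    right = a + (b + c)

    print '(a, es24.16)', ' (a + b) + c = ', left
    print '(a, es24.16)', ' a + (b + c) = ', right
    print '(a, es10.2)',  ' difference  = ', left - right
end program sum_order
```

Any difference of this kind should appear around the 16th significant digit, which is far smaller than the gap between -0.9999 and 0.7616 reported in the question.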

mecej4
Black Belt

I think that there is an optimization bug here, as we can see by adding

        write (*, '(1x,6ES12.4)' ) inp

after line 41. The output with -fast alone:

   0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
  0.761594155955765

whereas, with -fast -unroll0 we get:

  -9.9991E-01 -9.9991E-01 -9.9991E-01 -9.9991E-01 -9.9991E-01 -9.9991E-01
 -0.999909105178721

Note that 0.76159... is simply the value of tanh(1): with inp all zeros, the last assignment reduces to RL = tanh(0 + BL) = tanh(1).
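
Both printed values can be checked independently, because every weight and bias in the test program is set to 1. All six hidden units then carry the same value s, and each layer reduces to s = tanh(6*s + 1), starting from s = sin(x1 + x2 + 1). The following is a minimal scalar reference under that assumption. The names nn_reference, collapsed_network, and check_reference are hypothetical and do not appear in the original program.

```fortran
module nn_reference
    implicit none
    integer, parameter :: dp = kind(1.0d0)
contains
    ! Scalar equivalent of BB(x) when all weights and biases equal 1.
    pure function collapsed_network(x1, x2, hidden_size, num_layer) result(r)
        real(dp), intent(in) :: x1, x2
        integer,  intent(in) :: hidden_size, num_layer
        real(dp) :: r, s
        integer  :: i

        ! Input layer: each row of LN0 sums the inputs, then B(:,:,0) adds 1.
        s = sin(x1 + x2 + 1.0_dp)

        ! Hidden layers: each row of LN(:,:,i) sums hidden_size copies of s.
        do i = 1, num_layer
            s = tanh(real(hidden_size, dp) * s + 1.0_dp)
        end do

        ! Output layer: LNL sums the hidden units and BL adds 1.
        r = tanh(real(hidden_size, dp) * s + 1.0_dp)
    end function collapsed_network
end module nn_reference

program check_reference
    use nn_reference
    implicit none

    ! Reference value for the inputs used in bug.f90
    print *, collapsed_network(-2.2_dp, 0.4_dp, 6, 9)

    ! Value obtained when the hidden activations are all zero: tanh(0 + 1)
    print *, tanh(1.0_dp)
end program check_reference
```

If the sketch is right, the first line should agree with the -unroll0 result (-0.999909105178721) and the second with the -fast result (0.761594155955765), up to rounding in the last digits.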
