Intel® Fortran Compiler

Inconsistent results with -fast flag

Beginner
```
! bug.f90
program NN
implicit none

integer, parameter     :: input_size  = 2
integer, parameter     :: hidden_size = 6
integer, parameter     :: num_layer   = 9

real*8                 :: LN  (hidden_size, hidden_size, num_layer)
real*8                 :: LN0 (hidden_size, input_size)
real*8                 :: LNL (1, hidden_size)
real*8                 :: BL  (1,1)
real*8                 :: RL  (1,1)
real*8                 :: B   (hidden_size, 1, 0:num_layer)

print *, BB([-2.2d0, 0.4d0])

contains

function BB(x)
real*8, intent(in)  :: x(input_size)
real*8              :: BB
integer             :: i
real*8              :: inp (hidden_size, 1)
real*8              :: tmp (hidden_size, 1)

LN  = 1
B   = 1
LN0 = 1
BL  = 1
LNL = 1
tmp = matmul(LN0, reshape(x, [input_size, 1])) + B(:,:, 0)
inp = sin(tmp)

do i = 1, num_layer
tmp             = matmul(LN(:, :, i), inp) + B(:,:,i)
inp             = tanh(tmp)
end do

RL = tanh( matmul(LNL, inp) + BL)
BB = RL(1,1)
end function

end program NN
```

Compiled with and without -fast, the program prints two different results:

```
ifort-18.0.1.126 bug.f90 -o bug ; ./bug
-0.999909105178721
```

and

```
ifort-18.0.1.126 bug.f90 -fast -o bug ; ./bug
0.761594155955765
```

Interestingly, when num_layer < 9, there is no such difference. Moreover, when I disable loop unrolling with -unroll0, the result is correct again.

```
ifort-18.0.1.126 bug.f90 -fast -o bug -unroll0; ./bug
-0.999909105178721
```

My CPU is 3.69 GHz Quad-Core Intel Xeon E5.

2 Replies
Black Belt Retired Employee

-fast implies -xHost, -O3, and -Qipo, all of which can produce a different instruction sequence.

Black Belt

I think that there is an optimization bug here, as we can see by adding

`write (*, '(1x,6ES12.4)' ) inp`

after line 41 of bug.f90, that is, after the layer loop has finished. The output with -fast alone:

```
   0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
0.761594155955765
```

whereas, with -fast -unroll0 we get:

```
  -9.9991E-01 -9.9991E-01 -9.9991E-01 -9.9991E-01 -9.9991E-01 -9.9991E-01
-0.999909105178721
```

Note that 0.76159... is simply the value of tanh(1): with inp zeroed, the last assignment reduces to RL = tanh(matmul(LNL, inp) + BL) = tanh(0 + 1).
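
To cross-check which of the two printed values is the right one, here is a minimal sketch that avoids matmul and arrays entirely. It relies on the fact that bug.f90 sets every weight and bias to 1, so all six hidden units carry the same value and the network collapses to a scalar recurrence. The file name check.f90, the program name, and the variable `s` are made up for illustration.

```fortran
! check.f90: scalar cross-check of bug.f90 (hypothetical file name)
program check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: hidden_size = 6
  integer, parameter :: num_layer   = 9
  real(dp) :: s
  integer  :: i

  ! Input layer: LN0 is all ones, so every unit sees x(1) + x(2) + bias
  ! = -2.2 + 0.4 + 1.
  s = sin(-2.2_dp + 0.4_dp + 1.0_dp)

  ! Hidden layers: each row of LN(:,:,i) sums hidden_size identical
  ! values and adds a bias of 1.
  do i = 1, num_layer
    s = tanh(real(hidden_size, dp) * s + 1.0_dp)
  end do

  ! Output layer: LNL sums the hidden_size units and adds BL = 1.
  print *, tanh(real(hidden_size, dp) * s + 1.0_dp)
end program check
```

Built without aggressive optimization, this should print a value close to -0.9999, agreeing up to rounding with the result reported without -fast and with -fast -unroll0. That supports reading the 0.7615... output as the miscompiled one.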