Inconsistent results with -fast flag

```fortran
! bug.f90
program NN
implicit none

integer, parameter     :: input_size  = 2
integer, parameter     :: hidden_size = 6
integer, parameter     :: num_layer   = 9

real*8                 :: LN  (hidden_size, hidden_size, num_layer)
real*8                 :: LN0 (hidden_size, input_size)
real*8                 :: LNL (1, hidden_size)
real*8                 :: BL  (1,1)
real*8                 :: RL  (1,1)
real*8                 :: B   (hidden_size, 1, 0:num_layer)

print *, BB([-2.2d0, 0.4d0])

contains

function BB(x)
real*8, intent(in)  :: x(input_size)
real*8              :: BB
integer             :: i
real*8              :: inp (hidden_size, 1)
real*8              :: tmp (hidden_size, 1)

LN  = 1
B   = 1
LN0 = 1
BL  = 1
LNL = 1
tmp = matmul(LN0, reshape(x, [input_size, 1])) + B(:,:, 0)
inp = sin(tmp)

do i = 1, num_layer
tmp             = matmul(LN(:, :, i), inp) + B(:,:,i)
inp             = tanh(tmp)
end do

RL = tanh( matmul(LNL, inp) + BL)
BB = RL(1,1)
end function

end program NN
```

Compiled with and without -fast, the program prints two different results:

```
ifort-18.0.1.126 bug.f90 -o bug ; ./bug
-0.999909105178721
```

and

```
ifort-18.0.1.126 bug.f90 -fast -o bug ; ./bug
0.761594155955765
```

Interestingly, when num_layer < 9 there is no such difference. Moreover, when I disable loop unrolling, the result is correct again.

```
ifort-18.0.1.126 bug.f90 -fast -o bug -unroll0; ./bug
-0.999909105178721
```
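If switching off unrolling for the whole file is too blunt, Intel Fortran also accepts a `!DIR$ NOUNROLL` directive placed directly before a `do` loop. The sketch below applies it to the layer loop of a flattened copy of the reproducer (the program name `nn_nounroll` and the inlining of `BB` into the main program are my own changes). Whether the directive alone avoids the wrong result has not been checked, so treat it as something to try, not a confirmed workaround. Compilers that do not recognise the directive treat it as a comment.

```fortran
program nn_nounroll
  implicit none
  integer, parameter :: input_size = 2, hidden_size = 6, num_layer = 9
  real*8  :: LN(hidden_size, hidden_size, num_layer)
  real*8  :: LN0(hidden_size, input_size), LNL(1, hidden_size)
  real*8  :: B(hidden_size, 1, 0:num_layer), BL(1,1), RL(1,1)
  real*8  :: inp(hidden_size, 1), tmp(hidden_size, 1)
  real*8  :: x(input_size)
  integer :: i

  ! Same all-ones weights and biases as the original reproducer
  LN = 1; B = 1; LN0 = 1; BL = 1; LNL = 1
  x = [-2.2d0, 0.4d0]

  tmp = matmul(LN0, reshape(x, [input_size, 1])) + B(:, :, 0)
  inp = sin(tmp)

  ! Assumption: asking the compiler not to unroll this loop may sidestep
  ! the problem seen with -fast; this has not been verified.
  !DIR$ NOUNROLL
  do i = 1, num_layer
     tmp = matmul(LN(:, :, i), inp) + B(:, :, i)
     inp = tanh(tmp)
  end do

  RL = tanh(matmul(LNL, inp) + BL)
  print *, RL(1, 1)
end program nn_nounroll
```

If the result is still wrong with the directive in place, the culprit is more likely the code generated for the inlined `matmul` than the outer loop itself.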

My CPU is a 3.69 GHz quad-core Intel Xeon E5.

-fast implies -xHost, -O3 and -ipo (spelled -Qipo on Windows), among other options, any of which can produce a different instruction sequence.

I think there is an optimization bug here, as we can see by adding

`write (*, '(1x,6ES12.4)' ) inp`

after Line-41, that is, right after the `end do` that closes the layer loop. The output with -fast alone:

```
   0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00
0.761594155955765
```

whereas, with -fast -unroll0 we get:

```
  -9.9991E-01 -9.9991E-01 -9.9991E-01 -9.9991E-01 -9.9991E-01 -9.9991E-01
-0.999909105178721
```

Note that 0.76159... is simply the value of tanh(1): with inp wrongly zeroed, matmul(LNL, inp) vanishes and RL reduces to tanh(BL) = tanh(1).
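Because every weight and bias in the reproducer is 1, all six hidden units always hold the same value, so the network collapses to the scalar recurrence v = tanh(6 v + 1). That gives an independent check on which of the two outputs is right. A minimal sketch follows; the program name `check_scalar` and the variable `v` are illustrative and not part of the original post.

```fortran
program check_scalar
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: hidden_size = 6, num_layer = 9
  real(dp) :: v
  integer  :: i

  ! Input layer: each hidden unit sees x(1) + x(2) + bias = -2.2 + 0.4 + 1
  v = sin(-2.2_dp + 0.4_dp + 1.0_dp)

  ! Hidden layers: a row of ones times six equal entries is 6*v, plus bias 1
  do i = 1, num_layer
     v = tanh(real(hidden_size, dp) * v + 1.0_dp)
  end do

  ! Output layer uses the same sum over six equal entries, plus BL = 1
  print *, 'scalar reference :', tanh(real(hidden_size, dp) * v + 1.0_dp)

  ! What the output layer yields if inp is all zeros
  print *, 'tanh(1)          :', tanh(1.0_dp)
end program check_scalar
```

Built without aggressive optimization, the first value should come out close to -0.99991, matching the -unroll0 result, and the second close to 0.76159, matching the plain -fast result.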