Optimizer bug -O2 in ifort 16?

Christoph_F_ · 02-09-2018

Dear all.

I have a problem with the Intel Fortran compiler (version 16.0.2 20160204) combined with IntelMPI (version 5.1.3 Build 20160120). The error does not appear without MPI.

The code below runs correctly when compiled with -O0, -O1, and -O3, but not with -O2. With "-fno-inline-functions", the problem does not occur even with -O2. Also, the compiler versions 18.0.0 20170811 (with IntelMPI 2018 Build 20170713) and 12.1.3 20120212 (with MPICH 3.2) do not show this problem. Unfortunately, I was not able to create a test case, because a small change to the code (e.g., adding a write statement or removing an if statement, even one with a false condition) often makes the error disappear.

I should also mention that the code runs correctly in most cases, and the error appears only in very few cases. When it does appear, however, it is reproducible.

As you can see in the (simplified) code snippet below, the subroutine "qp_adiabatic" modifies the array "hqp". It should increase the diagonal elements by 1 and set the off-diagonal elements to zero. In the problematic case, all elements of "hqp" (including the diagonal ones) are nonzero on entry. After the modification, all diagonal elements except the first are exactly 1 instead of their old value plus 1; only the first is incremented correctly. See the output file "fort.123" below.

Of course, I could simply use a different compiler, but there is always the nagging suspicion that the error might be caused by wrong coding (perhaps by an error elsewhere in the code). So, my questions are the following:
- Is there an obvious breach of the Fortran standard? (Obviously, I modify an array that is not passed to the subroutine explicitly. In fact, the error goes away if I pass it as an argument declared with INTENT(INOUT). However, the subroutine is contained in a parent routine, and in my understanding all variables and arrays of the parent routine should be accessible.)
- Is this a known problem of the present compiler and MPI versions?
- Would it be possible that this error is caused by wrong coding elsewhere in the code? (Since it seems to be quite clear-cut: the array is obviously modified incorrectly.)
- Is there something in the code that would suggest that the -O2 optimization might create a wrong executable (by optimizing something away or so)?
- Is there a possibility to analyze this problem in a more detailed way to see what the computer does with the array exactly?

This issue might be related to:
https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/606074
https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/685354

The code has already been simplified a bit:

     integer :: n
     complex(8), allocatable :: hqp(:,:)
     minstep=10
     n=60
     allocate(hqp(n,n))
     call qp_adiabatic(21,0)
     ...

contains

     subroutine qp_adiabatic(i,istep)
     implicit none
     integer, intent(in) :: i,istep
     integer             :: j,k
     if(istep.lt.minstep) then
       if(istep.eq.0.and.i.eq.21) then
         write(123,*) (hqp(j,j),j=1,n)
       endif
       if(istep.lt.0) call fatal('qp_scale: istep<0. (bug?)')
       do k = 1,n ; do j = 1,n
         if(j.eq.k) then
           hqp(j,j) = hqp(j,j) + 1d0
         else
           hqp(j,k) = 0
         endif
       enddo ; enddo
       if(istep.eq.0.and.i.eq.21) then
         write(123,*)
         write(123,*) (hqp(j,j),j=1,n)
       endif
     endif
     end subroutine qp_adiabatic

The file fort.123 looks like this if the error occurs:

(-0.156621693392732,1.681941124276988E-003)
(-0.156621693392732,1.681941124276986E-003)
(-0.156040111750621,1.717955673865940E-003)
... nonzero complex values ...
(0.253993187912219,6.920763917916860E-004)
(0.262222706526087,7.572720081067049E-004)
(0.262222706526087,7.572720081067215E-004)

(0.843378306607268,1.681941124276988E-003)
(1.00000000000000,0.000000000000000E+000)
(1.00000000000000,0.000000000000000E+000)
(1.00000000000000,0.000000000000000E+000)
... all values are 1 ...
(1.00000000000000,0.000000000000000E+000)
(1.00000000000000,0.000000000000000E+000)
(1.00000000000000,0.000000000000000E+000)

Thank you.

Best regards
Christoph

mecej4 · 02-11-2018

At least for the code that you showed, here is the explanation:

When the statement hqp(j,j) = hqp(j,j) + 1d0 is executed, the value of hqp(j,j) on the right hand side is undefined, because you just allocated hqp and immediately after that called the subroutine.

The WRITE statement at the beginning of the subroutine will just print garbage for the same reason.

Christoph_F_ · 02-12-2018

Ok, good point. Sorry, I did not make that clear enough. The code outside the subroutine should be understood as pseudocode. In the real code, the array hqp is actually calculated before qp_adiabatic is called. I should have written:

     allocate(hqp(n,n))
     ... array hqp is calculated ...
     call qp_adiabatic(21,0)

So, the array is completely defined, and the diagonal elements are written to fort.123. Then each diagonal element should be increased by 1, but instead the value 1d0 is written into the array elements.
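To make the expected behavior explicit, here is a minimal, self-contained sketch of a check that saves the diagonal before the update and compares afterwards. It is not the real code: the program name, the helper array "diag_before", the routine name "shift_diagonal", the fill values, and the tolerance are all assumptions for illustration, and this small program is not expected to reproduce the -O2 problem.

```fortran
program check_diagonal_update
  implicit none
  integer, parameter :: dp = kind(1d0)
  integer, parameter :: n = 60
  complex(dp), allocatable :: hqp(:,:)
  complex(dp), allocatable :: diag_before(:)   ! hypothetical helper, not in the real code
  integer :: j, k

  allocate(hqp(n,n), diag_before(n))

  ! Placeholder values (assumption); the real code calculates hqp elsewhere.
  do k = 1, n
    do j = 1, n
      hqp(j,k) = cmplx(0.1d0*j, 0.01d0*k, kind=dp)
    end do
  end do

  ! Save the diagonal before the contained routine modifies hqp.
  do j = 1, n
    diag_before(j) = hqp(j,j)
  end do

  call shift_diagonal()

  ! Each diagonal element must equal its old value plus 1,
  ! and each off-diagonal element must be zero.
  do k = 1, n
    do j = 1, n
      if (j == k) then
        if (abs(hqp(j,j) - (diag_before(j) + 1d0)) > 1d-12) then
          write(*,*) 'diagonal mismatch at j =', j, hqp(j,j), diag_before(j)
          error stop 1
        end if
      else if (hqp(j,k) /= (0d0, 0d0)) then
        write(*,*) 'off-diagonal element not zero at', j, k, hqp(j,k)
        error stop 1
      end if
    end do
  end do

  write(*,*) 'all elements as expected'

contains

  ! Same update as in qp_adiabatic; hqp and n are accessed by host association.
  subroutine shift_diagonal()
    integer :: j, k   ! local loop indices, separate from the host's j and k
    do k = 1, n
      do j = 1, n
        if (j == k) then
          hqp(j,j) = hqp(j,j) + 1d0
        else
          hqp(j,k) = 0
        end if
      end do
    end do
  end subroutine shift_diagonal

end program check_diagonal_update
```

A comparison like this could be placed around the real loop instead of the two write statements, although any such change may also make the error disappear, as noted in my first post.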

By the way, as written in my first posting, the computer does it correctly in most cases. The error appears only occasionally.

In any case, thank you for your reply.