Compiler vectorization bug

Dimaleks · 02-21-2022

Hello!
I'm facing what I believe is a vectorization bug in the Intel Fortran compiler, in both 2021.4 and 2022.1. Here is the relevant loop:

    do iflav = 1, nflav
      igases(:) = flavor(:,iflav)
      do ilay = 1, nlay
        do icol = 1, ncol
          ! itropo = 1 lower atmosphere; itropo = 2 upper atmosphere
          itropo = merge(1,2,tropo(icol,ilay))
          ! loop over implemented combinations of major species
          do itemp = 1, 2
            ! compute interpolation fractions needed for lower, then upper reference temperature level
            ! compute binary species parameter (eta) for flavor and temperature and
            !  associated interpolation index and factors
            ratio_eta_half = vmr_ref(itropo,igases(1),(jtemp(icol,ilay)+itemp-1)) / &
                             vmr_ref(itropo,igases(2),(jtemp(icol,ilay)+itemp-1))
            col_mix(itemp,icol,ilay,iflav) = col_gas(icol,ilay,igases(1)) + ratio_eta_half * col_gas(icol,ilay,igases(2))
            eta = merge(col_gas(icol,ilay,igases(1)) / col_mix(itemp,icol,ilay,iflav), 0.5_wp, &
                        col_mix(itemp,icol,ilay,iflav) > 2._wp * tiny(col_mix))
            loceta = eta * float(neta-1)
            jeta(itemp,icol,ilay,iflav) = min(int(loceta)+1, neta-1)

            ! if (jeta(itemp,icol,ilay,iflav) < 0) then
            !   print *, "Disable vectorization here"
            ! end if

            feta = mod(loceta, 1.0_wp)
            ! compute interpolation fractions needed for minor species
            ! ftemp_term = (1._wp-ftemp(icol,ilay)) for itemp = 1, ftemp(icol,ilay) for itemp=2
            ftemp_term = (real(2-itemp, wp) + real(2*itemp-3, wp) * ftemp(icol,ilay))
            fminor(1,itemp,icol,ilay,iflav) = (1._wp-feta) * ftemp_term
            fminor(2,itemp,icol,ilay,iflav) =        feta  * ftemp_term
            ! compute interpolation fractions needed for major species
            fmajor(1,1,itemp,icol,ilay,iflav) = (1._wp-fpress(icol,ilay)) * fminor(1,itemp,icol,ilay,iflav)
            fmajor(2,1,itemp,icol,ilay,iflav) = (1._wp-fpress(icol,ilay)) * fminor(2,itemp,icol,ilay,iflav)
            fmajor(1,2,itemp,icol,ilay,iflav) =        fpress(icol,ilay)  * fminor(1,itemp,icol,ilay,iflav)
            fmajor(2,2,itemp,icol,ilay,iflav) =        fpress(icol,ilay)  * fminor(2,itemp,icol,ilay,iflav)
          end do ! reference temperatures
        end do ! icol
      end do ! ilay
    end do ! iflav

If I leave the if block with the print commented out, almost all elements of jeta(2,:,:,:) are large negative integers, which shouldn't happen. The elements of jeta(1,:,:,:) are fine. When compiling with -O0, or after uncommenting the if block with the print, the code works as intended and the values are correct.

jimdempseyatthecove · 02-21-2022

Try adding !DIR$ NOVECTOR in front of DO itemp = 1, 2

and/or

!DIR$ NOFUSION

and/or

!DIR$ NOUNROLL

You may have loop-order dependencies that the compiler failed to detect.
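
For reference, a minimal sketch of where such a directive would sit, shown on a made-up loop. The program name, array sizes and eta values are assumptions for illustration only; they are not taken from the original code. The directive goes on the line directly before the DO statement it should affect, and compilers that do not recognize it see a plain comment.

```fortran
program novector_placement
  implicit none
  integer, parameter :: wp = kind(1.0d0)
  integer, parameter :: ncol = 4, neta = 9   ! made-up sizes (assumption)
  real(wp) :: eta(ncol), loceta
  integer  :: jeta(2, ncol)
  integer  :: icol, itemp

  ! Synthetic eta values spread over [0, 1]
  do icol = 1, ncol
    eta(icol) = real(icol - 1, wp) / real(ncol - 1, wp)
  end do

  do icol = 1, ncol
    ! Ask the Intel compiler not to vectorize the loop that follows.
    ! Other compilers treat the next line as an ordinary comment.
    !DIR$ NOVECTOR
    do itemp = 1, 2
      loceta = eta(icol) * real(neta - 1, wp)
      jeta(itemp, icol) = min(int(loceta) + 1, neta - 1)
    end do
  end do

  print *, jeta
end program novector_placement
```

If the wrong values disappear with the directive in place, that narrows the problem to the vectorized version of that particular loop, but it does not by itself show whether the compiler or the code is at fault.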

Also, if the large negative numbers are all the same, change the debugger's numeric view to hex.

If the values are CCCCCCCC,

then they are what a Debug build uses to fill uninitialized memory.

Have you run with runtime diagnostics for out-of-bounds access enabled?

Note that this diagnostic won't work with assumed-size arrays (real :: foo(*)).

Jim Dempsey

Dimaleks · 02-21-2022

Thanks Jim, I'll try your suggestions.

In the meantime I've checked the actual values, which are -2147483647, or MIN_INT+1, possibly due to the int(loceta)+1 expression.
The code itself has been compiled and checked with the GCC, Cray and PGI compilers on several different machines, and it works correctly with Intel at -O0, so I'm pretty confident there are no out-of-bounds accesses or similar bugs.
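
As a side note on that value: on x86, the float-to-integer conversion instructions return 0x80000000 (-2147483648, the "integer indefinite" value) when the input is NaN or out of range, and adding 1 to that gives exactly -2147483647. Whether loceta really is NaN or out of range in the vectorized code is not established here; the sketch below only shows the arithmetic, plus a hypothetical guard (checked_int is a made-up helper, not part of the original code) that could be used to test the idea.

```fortran
program min_int_plus_one
  use, intrinsic :: iso_fortran_env, only: int32
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite, ieee_value, ieee_quiet_nan
  implicit none
  integer, parameter :: wp = kind(1.0d0)
  integer(int32) :: indefinite
  real(wp) :: nan

  ! Build MIN_INT (-2147483648) in two steps to stay inside the int32 range
  indefinite = -huge(0_int32)
  indefinite = indefinite - 1_int32

  ! MIN_INT + 1 is the value reported for jeta(2,:,:,:)
  print *, indefinite + 1_int32        ! prints -2147483647

  ! Guarded conversion: a finite, in-range input converts normally,
  ! anything else is mapped to the sentinel -1
  nan = ieee_value(1.0_wp, ieee_quiet_nan)
  print *, checked_int(3.7_wp)         ! prints 3
  print *, checked_int(nan)            ! prints -1

contains

  ! Hypothetical helper (assumption): convert only when the result is defined
  pure function checked_int(x) result(i)
    real(wp), intent(in) :: x
    integer(int32) :: i
    if (ieee_is_finite(x) .and. abs(x) < real(huge(0_int32), wp)) then
      i = int(x, int32)
    else
      i = -1_int32
    end if
  end function checked_int

end program min_int_plus_one
```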

I'm actually OK for now with the "if" workaround, but it would be great if the compiler were fixed in the future.

jimdempseyatthecove · 02-22-2022

If you are compiling with MS Visual Studio, the "hammer" method is:

Select the Release build.

Then, in Solution Explorer, locate the source file containing your problematic code,

right-click that file,

click Properties,

then find and change the optimization settings, for that file alone, until the code works.

While you could disable optimizations entirely, other settings may also produce the desired results (e.g. Floating Point Model: Precise).

If you are building with makefiles, use a special rule for that source file.

Jim Dempsey

Steve_Lionel · 02-21-2022

You still have no evidence of a compiler bug. Can you provide a minimal, reproducible example that shows the issue?
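
A possible starting point for such a reproducer is sketched below. It keeps only the jeta part of the posted loop and fills the inputs with synthetic data. The program name, the array sizes, the random col_gas values and the ratio array (a stand-in for the vmr_ref ratio) are all assumptions, and this skeleton is not known to trigger the reported behavior. The idea is to compile it at -O0 and at the failing optimization level, then add pieces of the original loop back until the results differ.

```fortran
program jeta_repro
  implicit none
  integer, parameter :: wp = kind(1.0d0)
  integer, parameter :: ncol = 16, nlay = 8, neta = 9   ! made-up sizes
  real(wp) :: col_gas(ncol, nlay, 2)
  real(wp) :: col_mix(2, ncol, nlay)
  real(wp) :: ratio(2)          ! stand-in for the vmr_ref ratio (assumption)
  integer  :: jeta(2, ncol, nlay)
  real(wp) :: eta, loceta
  integer  :: icol, ilay, itemp, nbad

  ! Synthetic inputs: random column amounts in [0, 1)
  call random_number(col_gas)
  ratio = [0.5_wp, 2.0_wp]

  do ilay = 1, nlay
    do icol = 1, ncol
      do itemp = 1, 2
        col_mix(itemp,icol,ilay) = col_gas(icol,ilay,1) + ratio(itemp) * col_gas(icol,ilay,2)
        eta = merge(col_gas(icol,ilay,1) / col_mix(itemp,icol,ilay), 0.5_wp, &
                    col_mix(itemp,icol,ilay) > 2._wp * tiny(col_mix))
        ! float() is kept as in the posted loop so the expression types match
        loceta = eta * float(neta-1)
        jeta(itemp,icol,ilay) = min(int(loceta)+1, neta-1)
      end do
    end do
  end do

  ! With eta in [0, 1], every index should lie between 1 and neta-1
  nbad = count(jeta < 1)
  print *, 'entries below 1:', nbad
  if (nbad > 0) error stop 'unexpected jeta values'
end program jeta_repro
```

A single file like this, together with the exact compiler version and command line, is something others can build and run directly.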