Solved: Optimization bug in IFX 2024.1.0

mecej4 · ‎04-08-2024

Test 52 of the venerable SLATEC library (www.netlib.org/slatec) fits a piecewise polynomial to data. The Fortran 77 sources for this test are contained in about 40 files, with about 20,000 lines of code. When one of the source files (BNDACC.f) is compiled with the current version of IFX (2024.1.0) with options /Qmkl /MD /standard-semantics /Ot, the test program fails.

The following reproducer is a drastically shortened derivative from those 40 files, and is less than 200 lines long. It can be built using IFORT, IFX, Gfortran, etc., and uses some BLAS routines.

program tbndacc
   implicit none
   integer, parameter :: Mdg = 12, Nb = 4, Mt = 5, Jt =3
   real :: G(Mdg,6) = 0.0
   integer :: Ip = 2, Ir = 4

   G(1,1:5) = (/   0.2,   21.0,   40.8,    4.7,    1.7 /)
   G(2,1:5) = (/  -3.1,  -41.2,  -30.9,   -0.9,   -4.6 /)
   G(3,1:5) = (/   0.5,   19.3,   33.3,    2.9,    8.0 /)
   G(4,1:5) = (/  11.1,   44.4,   11.1,    0.01,   18.0 /)
   G(5,1:5) = (/   6.8,   43.1,   16.7,    0.05,   24.7 /)

   call BndAcc(G,Mdg,Nb,Ip,Ir,Mt,Jt)
CONTAINS
!
   subroutine bndacc(G,Mdg,Nb,Ip,Ir,Mt,Jt)
!
      implicit none
      integer Ip, Ir, Jt, Mdg, Mt, Nb
      real G(Mdg,*), rho, zero
      integer i, ie, ig, k, kh, l, lp1, mh, mu, nbp1
!
      zero = 0.
      nbp1 = Nb + 1
      if ( Jt /= Ip ) then
         if ( Jt > Ir ) then
            do i = 1, Mt
               G(Jt + Mt - i,1:nbp1) = G(Ir + Mt - i,1:nbp1)
            end do
            ie = Jt - Ir
            do i = 1, ie
               G(Ir + i - 1,1:nbp1) = zero
            end do
            Ir = Jt
         end if
         mu = min(Nb-1,Ir-Ip-1)
         if ( mu /= 0 ) then
            do l = 1, mu
               k = min(L,Jt-Ip)
               lp1 = l + 1
               ig = Ip + l
               do i = lp1, Nb
                  G(ig,i - k) = G(ig,i)
               end do
               do i = 1, k
                  G(ig,nbp1 - i) = zero
               end do
            end do
         end if
         Ip = Jt
      end if
      mh = Ir + Mt - Ip
      kh = min(nbp1,mh)
      do i = 1, kh
         call h12(1,i,max(i+1,Ir-Ip+1),mh,G(Ip,i),1,rho,G(Ip,i+1),1,Mdg,nbp1-i)
      end do
      Ir = Ip + kh
      if ( kh >= nbp1 ) G(Ir-1,i:Nb) = zero
      print '(i3,4x,5es12.4)',(i,G(i,1:Nb+1), i = 3, 5)
      return
   end subroutine bndacc

   subroutine h12(Mode,Lpivot,L1,M,U,Iue,Up,C,Ice,Icv,Ncv)
      implicit none
!
      integer Ice, Icv, Iue, L1, Lpivot, M, Mode, Ncv
      real Up
      real C(*), U(Iue,*)
!
      real b, cl, clinv, one, sm, ul1m1
      integer i, i2, i3, i4, incr, j, kl1, kl2, klp, l1m1, mml1p2
      real, external :: sdot
      external saxpy, sswap
      one = 1.
!
      if ( 0 >= Lpivot .or. Lpivot >= L1 .or. L1 > M ) return
      cl = abs(U(1,Lpivot))
      do j = L1, M
         cl = max(abs(U(1,j)),cl)
      end do
      if ( cl <= 0 ) return
      clinv = one/cl
      sm = (U(1,Lpivot)*clinv)**2
      do j = L1, M
         sm = sm + (U(1,j)*clinv)**2
      end do
      cl = cl*sqrt(sm)
      if ( U(1,Lpivot) > 0 ) cl = -cl
      Up = U(1,Lpivot) - cl
      U(1,Lpivot) = cl
      if ( Ncv <= 0 ) return
      b = Up*U(1,Lpivot)
      if ( b < 0 ) then
         b = one/b
         mml1p2 = M - L1 + 2
         if ( mml1p2 > 20 ) then
            l1m1 = L1 - 1
            kl1 = 1 + (l1m1-1)*Ice
            kl2 = kl1
            klp = 1 + (Lpivot-1)*Ice
            ul1m1 = U(1,l1m1)
            U(1,l1m1) = Up
            if ( Lpivot /= l1m1 ) call sswap(Ncv,C(kl1),Icv,C(klp),Icv)
            do j = 1, Ncv
               sm = sdot(mml1p2,U(1,l1m1),Iue,C(kl1),Ice) + b
               call saxpy(mml1p2,sm,U(1,l1m1),Iue,C(kl1),Ice)
               kl1 = kl1 + Icv
            end do
            U(1,l1m1) = ul1m1
            if ( Lpivot == l1m1 ) return
            kl1 = kl2
            call sswap(Ncv,C(kl1),Icv,C(klp),Icv)
         else
            i2 = 1 - Icv + Ice*(Lpivot-1)
            incr = Ice*(L1-Lpivot)
            do j = 1, Ncv
               i2 = i2 + Icv
               i3 = i2 + incr
               i4 = i3
               sm = C(i2)*Up
               do i = L1, M
                  sm = sm + C(i3)*U(1,i)
                  i3 = i3 + Ice
               end do
               if ( sm /= 0 ) then
                  sm = sm*b
                  C(i2) = C(i2) + sm*Up
                  do i = L1, M
                     C(i4) = C(i4) + sm*U(1,i)
                     i4 = i4 + Ice
                  end do
               end if
            end do
         end if
      end if
      return
   end subroutine h12
end program

The incorrect results, from IFX with some level of optimization enabled:

ifx /Qmkl /standard-semantics /nologo /Ot tbndacc.f90 & tbndacc
  3     -1.3336E+01 -5.9561E+01 -1.8384E+01 -3.3817E-02 -2.9315E+01
  4      1.1100E+01 -1.7026E+01 -7.4028E+00 -3.4349E-02 -8.2774E+00
  5      6.8000E+00  1.6941E+01 -4.2105E+00 -1.6629E-02 -8.3781E+00

The correct results from IFX (no optimization), Gfortran, etc.:

ifx /Qmkl /standard-semantics /Od /nologo tbndacc.f90 & tbndacc
  3     -2.3280E+01 -6.1367E+01 -1.2575E+01 -1.9373E-02 -2.2430E+01
  4      1.1100E+01 -3.4233E+01 -1.5701E+01 -4.1192E-02 -2.2017E+01
  5      6.8000E+00  2.7982E+01 -2.4215E+00 -2.2976E-02 -3.2010E+00

Ron_Green · ‎04-08-2024

I reduced this down considerably. And found the DO loop that was unrolled improperly and causes the error. Again, turning off unrolling OR using the DIR$ NOUNROLL directive on this 1 loop will work around it while I get a bug report going and get us a fix.

I also got MKL out of the free variables, so compilation is quite simple

program tbndacc
   implicit none
   integer, parameter ::  Nb = 4, Jt =3
   real :: G(12,6) = 0.0
   integer :: Ip = 2, Ir = 4

   G(1,1:5) = (/   0.2,   21.0,   40.8,    4.7,    1.7 /)
   G(2,1:5) = (/  -3.1,  -41.2,  -30.9,   -0.9,   -4.6 /)
   G(3,1:5) = (/   0.5,   19.3,   33.3,    2.9,    8.0 /)
   G(4,1:5) = (/  11.1,   44.4,   11.1,    0.01,   18.0 /)
   G(5,1:5) = (/   6.8,   43.1,   16.7,    0.05,   24.7 /)

   call BndAcc(G,Nb,Ip,Ir,Jt)
CONTAINS
!
   subroutine bndacc(G,Nb,Ip,Ir,Jt)
!
      implicit none
      integer Ip, Ir, Jt, Nb
      real :: G(12,*), zero
      integer i,ig, k, l, lp1, mu, nbp1
!
      integer :: temp_max

      zero = 0.
      nbp1 = Nb + 1
         mu = min(Nb-1,Ir-Ip-1) !..mu will equal 1

            do l = 1, mu
               k = min(L,Jt-Ip)
               lp1 = l + 1
               ig = Ip + l

! dir NOUNROLL will prevent unroll error dir$ NOUNROLL
               do i = lp1, Nb
                  G(ig,i - k) = G(ig,i)
               end do

               do i = 1, k
                  G(ig,nbp1 - i) = zero
               end do
            end do
      print*, 'G(3,1:5)'
      print*, G(3,1:5)
      print*, "Should be"
      print*, "  19.30000       33.30000       2.900000      0.0000000E+00   8.000000"
      STOP

      return
   end subroutine bndacc

end program

and here is the build/run at O2 to show the error

 rm a.out ; ifx  -O2  repro.f90 ; ./a.out
 G(3,1:5)
   2.900000       2.900000       2.900000      0.0000000E+00   8.000000    
 Should be
   19.30000       33.30000       2.900000      0.0000000E+00   8.000000

and no error at O1

rm a.out ; ifx  -O1  repro.f90 ; ./a.out
 G(3,1:5)
   19.30000       33.30000       2.900000      0.0000000E+00   8.000000    
 Should be
   19.30000       33.30000       2.900000      0.0000000E+00   8.000000

and no error at O2 if -unroll0

rm a.out ; ifx  -O2 -unroll0 repro.f90 ; ./a.out
 G(3,1:5)
   19.30000       33.30000       2.900000      0.0000000E+00   8.000000    
 Should be
   19.30000       33.30000       2.900000      0.0000000E+00   8.000000

and no error if I use DIR$ NOUNROLL on that loop

!dir$ NOUNROLL
               do i = lp1, Nb
                  G(ig,i - k) = G(ig,i)
               end do

rm a.out ; ifx  -O2 repro.f90 ; ./a.out
 G(3,1:5)
   19.30000       33.30000       2.900000      0.0000000E+00   8.000000    
 Should be
   19.30000       33.30000       2.900000      0.0000000E+00   8.000000

View solution in original post

mecej4 · ‎04-08-2024

If the argument lists are removed from the CALL to and the first line of subroutine BNDACC, and the corresponding local variable declarations in BNDACC are removed, the bug is no longer seen. All the current arguments are then available through host association.

Ron_Green · ‎04-08-2024

using llvm opt-bisect-limit I found the error is tied to loop unrolling optimization. Now it could be that @mecej4 has found an error in our references to the arguments. Try this,

/Qunroll:0

On my linux system when I disabled loop unrolling to 2 or less then I didn't see the error.

Ron_Green · ‎04-08-2024

I reduced this down considerably. And found the DO loop that was unrolled improperly and causes the error. Again, turning off unrolling OR using the DIR$ NOUNROLL directive on this 1 loop will work around it while I get a bug report going and get us a fix.

I also got MKL out of the free variables, so compilation is quite simple

program tbndacc
   implicit none
   integer, parameter ::  Nb = 4, Jt =3
   real :: G(12,6) = 0.0
   integer :: Ip = 2, Ir = 4

   G(1,1:5) = (/   0.2,   21.0,   40.8,    4.7,    1.7 /)
   G(2,1:5) = (/  -3.1,  -41.2,  -30.9,   -0.9,   -4.6 /)
   G(3,1:5) = (/   0.5,   19.3,   33.3,    2.9,    8.0 /)
   G(4,1:5) = (/  11.1,   44.4,   11.1,    0.01,   18.0 /)
   G(5,1:5) = (/   6.8,   43.1,   16.7,    0.05,   24.7 /)

   call BndAcc(G,Nb,Ip,Ir,Jt)
CONTAINS
!
   subroutine bndacc(G,Nb,Ip,Ir,Jt)
!
      implicit none
      integer Ip, Ir, Jt, Nb
      real :: G(12,*), zero
      integer i,ig, k, l, lp1, mu, nbp1
!
      integer :: temp_max

      zero = 0.
      nbp1 = Nb + 1
         mu = min(Nb-1,Ir-Ip-1) !..mu will equal 1

            do l = 1, mu
               k = min(L,Jt-Ip)
               lp1 = l + 1
               ig = Ip + l

! dir NOUNROLL will prevent unroll error dir$ NOUNROLL
               do i = lp1, Nb
                  G(ig,i - k) = G(ig,i)
               end do

               do i = 1, k
                  G(ig,nbp1 - i) = zero
               end do
            end do
      print*, 'G(3,1:5)'
      print*, G(3,1:5)
      print*, "Should be"
      print*, "  19.30000       33.30000       2.900000      0.0000000E+00   8.000000"
      STOP

      return
   end subroutine bndacc

end program

and here is the build/run at O2 to show the error

 rm a.out ; ifx  -O2  repro.f90 ; ./a.out
 G(3,1:5)
   2.900000       2.900000       2.900000      0.0000000E+00   8.000000    
 Should be
   19.30000       33.30000       2.900000      0.0000000E+00   8.000000

and no error at O1

rm a.out ; ifx  -O1  repro.f90 ; ./a.out
 G(3,1:5)
   19.30000       33.30000       2.900000      0.0000000E+00   8.000000    
 Should be
   19.30000       33.30000       2.900000      0.0000000E+00   8.000000

and no error at O2 if -unroll0

rm a.out ; ifx  -O2 -unroll0 repro.f90 ; ./a.out
 G(3,1:5)
   19.30000       33.30000       2.900000      0.0000000E+00   8.000000    
 Should be
   19.30000       33.30000       2.900000      0.0000000E+00   8.000000

and no error if I use DIR$ NOUNROLL on that loop

!dir$ NOUNROLL
               do i = lp1, Nb
                  G(ig,i - k) = G(ig,i)
               end do

rm a.out ; ifx  -O2 repro.f90 ; ./a.out
 G(3,1:5)
   19.30000       33.30000       2.900000      0.0000000E+00   8.000000    
 Should be
   19.30000       33.30000       2.900000      0.0000000E+00   8.000000

Ron_Green · ‎04-08-2024

Bug ID is CMPLRLLVM-57523. Thank you for isolating this to a short reproducer. I helped me to quickly get to the root cause.

mecej4 · ‎04-09-2024

Test 52 of SLATEC runs and reports "passed" if bndacc.f is compiled with IFX /Qunroll:2 . Similarly, Test 53 (the double precision version of Test52) runs correctly if dbndac.f is compiled with /Qunroll:2.

Thanks for preparing and filing the IFX compiler error report.

Devorah_H_Intel · ‎08-27-2024

This issue has been fixed in the recently released Intel Fortran Compiler, available for download as part of Intel HPC Toolkit 2024.2.1