Re: Wrong results with ifx 2024.0.1 - Page 2

Jellby1 · ‎12-15-2023

This is adapted from OpenMolcas and causes a failure. There are other failures.

program test                                                                                                                                                              [40/42934]
                                             
call Get_Subblock()                 
                                             
end program test                                                                                                                                                                    
                                             
!=======================================================================
                                             
module nq_Info             
                                             
use, intrinsic :: iso_fortran_env, only: int64

integer(kind=int64) :: nx = 10, ny = 4, nz = 10

end module nq_Info

!=======================================================================

subroutine Get_Subblock()

use, intrinsic :: iso_fortran_env, only: int64, real64
use nq_Info, only: nx, ny, nz

implicit none
integer(kind=int64), parameter :: nNQ = 1
integer(kind=int64) :: ix, ixyz, iy, iyz, iz
real(kind=real64) :: R2, t1, t2, t3, x_max_, x_min, x_min_, x_NQ, y_max_, y_min, y_min_, y_NQ, z_max_, z_min, z_min_, z_NQ
logical :: done = .false.

if (done) return
done = .true.

ixyz = 6
x_min = -8.0_real64
y_min = -2.0_real64                 
z_min = -7.826364311586_real64               
                                                                                                                                                                                    
iyz = 1+(ixyz-1)/nx                          
ix = ixyz-(iyz-1)*nx                                                                      
iz = 1+(iyz-1)/ny                            
iy = iyz-(iz-1)*ny         
                                             
x_min_ = x_min+(ix-2)*2.0_real64                                                          
x_max_ = x_min_+2.0_real64
y_min_ = y_min+(iy-2)*2.0_real64                                                          
y_max_ = y_min_+2.0_real64
z_min_ = z_min+(iz-2)*2.0_real64
z_max_ = z_min_+2.0_real64
if (ix == 1) x_min_ = -1.0e99_real64                                                      
if (ix == nx) x_max_ = 1.0e99_real64
if (iy == 1) y_min_ = -1.0e99_real64
if (iy == ny) y_max_ = 1.0e99_real64
if (iz == 1) z_min_ = -1.0e99_real64                                                      
if (iz == nz) z_max_ = 1.0e99_real64

x_NQ = 1.40421662117370_real64
y_NQ = 0.000000000000000_real64          
z_NQ = 1.02874664437177_real64              
                                                                                                                                                                                    
t1 = (x_NQ-x_min_)/(x_max_-x_min_)
if (t1 < 0.0_real64) t1 = 0.0_real64
if (t1 > 1.0_real64) t1 = 1.0_real64
t2 = (y_NQ-y_min_)/(y_max_-y_min_)
if (t2 < 0.0_real64) t2 = 0.0_real64
if (t2 > 1.0_real64) t2 = 1.0_real64
t3 = (z_NQ-z_min_)/(z_max_-z_min_)
if (t3 < 0.0_real64) t3 = 0.0_real64                                                                                                                                                
if (t3 > 1.0_real64) t3 = 1.0_real64
R2 = (x_NQ-(x_max_-x_min_)*t1-x_min_)**2+(y_NQ-(y_max_-y_min_)*t2-y_min_)**2+(z_NQ-(z_max_-z_min_)*t3-z_min_)**2
                                             
write(6,*) R2                     
                                             
end subroutine Get_Subblock

Compiling with -O0 and -O2 gives wildly different results:

$ ifx -O0 a.f90  && ./a.out
  0.000000000000000E+000
$ ifx -O2 a.f90  && ./a.out
   1.05831965830618
$ ifx --version
ifx (IFX) 2024.0.1 20231122
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

andrew_4619 · ‎12-21-2023

I am inclined to agree with Jim.....

Steve_Lionel · ‎12-21-2023

Jim, I am with you on this. It was an argument that the Fortran compiler team had with the optimizer team and lost. Perhaps with ifx and LLVM there's an opportunity to do it right. Or, and I like this better, have the compiler default to standard-semantics. I got tired of explaining to users why ifort didn't follow the standard in a variety of cases, though some of this shifted over the years (such as realloc_lhs.)

mecej4 · ‎01-10-2024

I think that the focus on compiler options, bugs in code generation, etc., is off the mark. I have pared down the posted code to the following test program:

program jellby0
implicit none
double precision znq,zmax,zmin,t3,vz
znq  =  1.0D0
zmax =  znq+1.0D0
zmin = -1.00000000000000D99
t3   =  1.00000000000000D0
vz   =  zNQ-(zmax-zmin)*t3-zmin
print '(A,4x,2es20.12)','vz = ',vz, (zNQ-zmax)
! both should be -1 (if calculated with infinite precision arithmetic)
end program

The output of the program, one may expect, to be -1.0, -1.0, because, when t3 = 1, the expression for vz may be simplified by inspection to zNQ - zmax. However, since zmin has a huge magnitude, numbers of widely different magnitudes are involved and subtraction (if both are of the same sign) or addition (if the two have different signs) will introduce errors.

You are welcome to try out different compiler options related to floating-point operations.

In your large application (OpenMolcas) where this issue arises, the use of 1D99 or -1D99 as sentinel values should be critically and carefully reviewed, especially in places where addition/subtraction of sentinel values may be performed.

mecej4 · ‎01-10-2024

I have to eat some of my words. It turns out that there is a somewhat delicate code generation/optimization bug in IFX, and here is a short reproducer. I am running IFX 2024.0.0 Build 20231017 on Windows 11.

program sjelb
implicit none
integer, parameter :: nMult = 8
real :: B

call InitB2(nMult, B)

CONTAINS
   subroutine InitB2(nMult, B)

   implicit none
   integer, parameter :: natom = 2, nAtQM = 2
   integer, intent(in) :: nMult
   integer IsMM(natom), iQM, jAt, nOrd, it, k
   real :: B, Ext(4,natom), TTT(8),trm, vsav(4)

   IsMM(:) = 0

   TTT(:) = (/ 0.1,  -0.7, -0.3,  -0.8, &
              -0.1,  -0.2, -0.5,  -0.3 /)

   Ext(1,:) = (/ 0.5, -0.1/)
   Ext(2,:) = (/ 0.2,  0.4/)
   Ext(3,:) = (/ 0.6,  0.1/)
   Ext(4,:) = (/-0.1,  0.1/)

   nOrd = nMult/nAtQM              ! nOrd = 8/2 = 4

   iQM = 0
   B = 0.0; k = 0
   do jAt=1,natom
     if (IsMM(jAt) == 1) cycle     ! ISMM is 0 and never changed
     iQM = iQM+1
     it = nOrd*(iQM-1)+1
     trm = Ext(1,jAt)*TTT(it)
     B = B+trm
     k = k+1
     vsav(k) = B
     if(nOrd <= 1)cycle            ! nOrd is always 4
     trm = Ext(2,jAt)*TTT(it+1)+ &
           Ext(3,jAt)*TTT(it+2)+ &
           Ext(4,jAt)*TTT(it+3)
     B = B+trm
     k = k+1
     vsav(k) = B
   end do
   print '(A,4x,ES18.10)', ' B = ', B
   print '(4ES13.5)',Vsav
   end subroutine InitB2

end PROGRAM

When compiled and run with /Od, (and also with other compilers, including IFort) this program give the output

 B =      -3.3999997400E-01
  5.00000E-02 -1.90000E-01 -1.80000E-01 -3.40000E-01

and with /O2, the output is

 B =      -5.0000000000E-01
 -5.00000E-02 -2.90000E-01 -3.40000E-01 -5.00000E-01

The bug is rather fragile. For instance, the variable nOrd is always 4, but if line-39 is commented out, the bug goes away. Similarly, on line-32, ISMM is always 0, but if the line is removed or commented out the bug disappears. Furthermore, I saved the B values in a new array vsav(:) and printed the values after the DO loop because putting a simple "print *, B" inside the loop made the bug disappear.

Jellby1 · ‎01-10-2024

That was more or less what I found in the "simplified" version above:

https://community.intel.com/t5/Intel-Fortran-Compiler/Wrong-results-with-ifx-2024-0-1/m-p/1561431/highlight/true#M170327

mecej4 · ‎01-10-2024

What are your thoughts regarding the role of guard values such as +1d99 and -1d99 in evaluating expressions and the problems that they bring, as I pointed out in my first post?

Jellby1 · ‎01-10-2024

My thoughts? I agree that relying on specific floating-point values and behavior is not desirable. But the application is large and with lots of legacy and heterogeneous code, so reviewing and cleaning up all these possible cases is not realistic. As long as "-standard-semantics -fltconsistency" gives the "right" results (i.e., the same as ifort, gfortran, nvfortran, nagfor), I'm satisfied, although as others have said perhaps that should be the default.

Ron_Green · ‎01-10-2024

Thank you @Jellby1 and @mecej4 for the 2 examples.

I isolated this to a particular optimization step. BOTH examples, same opt step ( there are ~572 steps).

I am writing up a bug report and I will push this issue for a fast resolution.

Ron_Green · ‎01-10-2024

The bug ID for these 2 reproducers is CMPLRLLVM-55163

I'm going to push to get this fixed quickly. It's in the backend optimizer.

Ron_Green · ‎01-26-2024

We have a fix. It is going into the 2024.1.0 release. This release will occur roughly end of March or early April.

The 2 testcases are attached. Both return PASS with a pre-release build of the ifx 2024.1.0 compiler.

$ ifx -O2 test1.f90
$ ./a.out
 B should be 3.4000E-1 approx.
 B =      -3.4000000358E-01
 PASS

$ ifx -O2 test2.f90
$ ./a.out
 B should be 26.00, B is:    26.00000    
 PASS

mecej4 · ‎01-11-2024

So far, all the short test programs that we tried in this thread appeared to work with with the Classic compiler, and the optimizer bugs were noticed only with IFX and only when some level of optimization was requested or was supplied by default. While poking around with the reproducers, I cooked one up that raises suspicion of an optimizer bug with every one of the versions of the Ifort Classic compiler that I tried. That is worrisome, because it shows that there may be code segments that will not be compiled correctly with either IFort or IFx unless optimization is completely disabled.

It is not clear to me what the programmer expected from the parentheses in the expressions that combine very large and small values with addition/subtraction, so it is hard to select FP options when compiling that will support those expectations. May I suggest to @Jellby1 that some reference runs be made with OpenMolcas on a few test problems with (a) Ifx -O2 -fp:strict and (b) Ifx -O2 ?

program stest

   implicit none
   integer, parameter :: nx = 10, ny = 4, nz = 10
   integer :: ix, ixyz, iy, iyz, iz, ip(3), np(3)
   real :: R2
   real, dimension(3) :: t,p_min,p_min_,p_max_,p_NQ,dp

   ixyz = 6
   p_min = (/-8.0,-2.0,-7.0/)
   p_NQ  = (/ 1.4, 0.0, 1.0/)

   iyz = 1+(ixyz-1)/nx
   ix = ixyz-(iyz-1)*nx
   iz = 1+(iyz-1)/ny
   iy = iyz-(iz-1)*ny

   ip = (/ix,iy,iz/)
   np = (/nx,ny,nz/)
   p_min_ = p_min+(ip-2)*2
   p_max_ = p_min_ + 2
   where(ip == 1)p_min_  = -1.0e18
   where(ip == np)p_max_ = +1.0e18
   !print '(1x,A,4x,1p,3e12.4)','p_min_',p_min_,'p_max_',p_max_

   t = min(max((p_NQ-p_min_)/(p_max_-p_min_),0.0),1.0)
   !print '(1x,A,4x,1p,3e12.4)','t     ',t
   dp = p_NQ-((p_max_-p_min_)*t+p_min_)
   R2 = sum(dp*dp)
   print '(1x,A,2x,1p,3e11.3,2x,e11.4)','dp, R2 = ',dp,R2

end program

Here are some outputs from various versions of the Classic compiler, all from runs on Windows 10.

Gfortran 11.4    -O2      dp, R2 =     0.000E+00  0.000E+00  1.000E+00   1.0000E+00

Ifort 7.10.3052  /Od      dp, R2 =     0.000E+00  0.000E+00  1.000E+00   1.0000E+00
                 /O2      dp, R2 =     0.000E+00  1.280E+02  1.290E+02   3.3025E+04

Ifort 9.1        /Od      dp, R2 =     0.000E+00  0.000E+00  1.000E+00   1.0000E+00
                 /O2      dp, R2 =     0.000E+00  0.000E+00  0.000E+00   0.0000E+00

IFort 11.1.72    /Od      dp, R2 =     0.000E+00  0.000E+00  1.000E+00   1.0000E+00
                 /O2      dp, R2 =     0.000E+00  0.000E+00  0.000E+00   0.0000E+00

IFort 15.0.7.287 /Od      dp, R2 =     0.000E+00  0.000E+00  1.000E+00   1.0000E+00
                 /O2      dp, R2 =     1.192E-07  0.000E+00  0.000E+00   1.4211E-14
                 /fast    dp, R2 =     0.000E+00  0.000E+00  0.000E+00   0.0000E+00

IFort 16.0       /Od      dp, R2 =     0.000E+00  0.000E+00  1.000E+00   1.0000E+00
                 /O2      dp, R2 =     0.000E+00  0.000E+00  0.000E+00   0.0000E+00

IFort 19.1U2     /Od      dp, R2 =     0.000E+00  0.000E+00  1.000E+00   1.0000E+00
                 /O2      dp, R2 =     0.000E+00  0.000E+00  0.000E+00   0.0000E+00

IFort 2021.9     /Od      dp, R2 =     0.000E+00  0.000E+00  1.000E+00   1.0000E+00
                 /O2      dp, R2 =     1.192E-07  0.000E+00  0.000E+00   1.4211E-14

JohnNichols · ‎01-11-2024

IFORT /O2 - with latest

mecej4 · ‎01-13-2024

Here is a modified version of your sub_block() test program that works correctly with IFX 2024.0.2 on Windows 11, without needing to turn off optimization and without needing FP:options such as /fp:strict. It does this by avoiding the use of guard values such as 1E99_real64, which give trouble when used in expressions. Please consider whether this approach can be used in file subblock.F90 .

program stest
implicit none
integer, parameter :: dbl = kind(0.0d0)

integer :: nx = 10, ny = 4, nz = 10
integer :: ix, ixyz, iy, iyz, iz, ip(3), np(3)
real(dbl) :: R2, dp(3), t(3), p_min(3), p_max_(3), p_min_(3), p_NQ(3)
integer :: clipped(3), gclipped(3)

ixyz = 6
p_min = (/ -8.0_dbl, -2.0_dbl, -7.826364311586_dbl /)

iyz = 1+(ixyz-1)/nx
ix = ixyz-(iyz-1)*nx
iz = 1+(iyz-1)/ny
iy = iyz-(iz-1)*ny
ip = (/ ix,iy,iz /)
np = (/ nx,ny,nz /)
p_min_ = p_min+(ip-2)*2.0_dbl
p_max_ = p_min_+2.0_dbl

! Clip at borders, based on [ix, iy, iz]

where (ip == 1 )
   t = 1.0_dbl
   gclipped = -1
elsewhere (ip == np)
   t = 0.0_dbl
   gclipped = +1
elsewhere
   gclipped = 0
end where

open(10,file='pnq.dat',status='old')
rewind(10)

do
   read (10,*,end=100)p_nq

   clipped = gclipped
!more clipping, now based on p_NQ values
   t = 0.0_dbl
   where(clipped == 0)
      t = (p_NQ-p_min_)/(p_max_-p_min_)
      where (t < 0.0_dbl)
         t = 0.0_dbl
         clipped = -1
      end where
      where (t > 1.0_dbl)
         t = 1.0_dbl
         clipped = +1
      end where
   end where
   print '(1x,A,5x,3(I4:,9x))','Clipped',clipped

   where (t > 0.0_dbl)
      dp = p_NQ-(p_max_-p_min_)*t-p_min_
   elsewhere
      dp = 0.0_dbl
   end where
   print '(1x,A,2x,1p,3e13.5)','p_min_',p_min_,'p_max_',p_max_, &
      'p_NQ  ',p_NQ, 't     ',t,'dp    ',dp
   R2 = sum(dp*dp)

   print '(1x,A,2x,1p,e13.5,//)','R2 = ',R2
end do
100 close(10)

end program

The input data file contains ten points, plus the one in your test program as the eleventh, for [x_nq, y_nq, z_nq].

1.960E-06  1.274E-01  1.763E+00
3.335E+00  4.815E+00  4.191E+00
1.677E+00  4.577E+00  3.979E+00
4.163E+00  1.725E+00  4.356E+00
4.496E-01  4.441E+00  3.505E+00
3.673E+00  1.501E+00  2.486E-01
4.541E+00  4.883E-01  2.016E-01
4.251E-01  2.794E+00  4.632E+00
3.782E-01  4.559E+00  4.591E-01
3.189E+00  4.261E+00  6.054E-01
    1.404        0.0     1.0287

JohnNichols · ‎01-13-2024

I broke

dp = p_NQ-(p_max_-p_min_)*t-p_min_

into a set of individual steps, and you cannot get the correct answer with /O2 for dp(1) no matter how granular you made the equations, but you can get the correct answer for dp(3).

It is all related the guard value.

jimdempseyatthecove · ‎01-13-2024

>> dp = p_NQ-(p_max_-p_min_)*t-p_min_

Caution, you can get numerically different results between:

dp = p_NQ - ((p_max_ - p_min_ ) * t - p_min_)

and

dp = (p_NQ - (p_max_ - p_min_) * t) - p_min_

If (when) the (potential) difference is significant enough, you must choose one, and, assure the compiler obeys the ()'s.

IMHO - A pet pieve of mine.... I really wish that regardless of optimization level that parenthesis are obeyed unless told otherwise (e.g. no-protect parns). The programmer (possibly 60 years ago) may have put them there for the specific purpose of avoiding ambiguity in order of operation.

Hint, perhaps there should be an option -ambiguousOrder that could be added to override the parenthesis.

Jim Dempsey

mecej4 · ‎01-14-2024

I agree broadly with what you wrote regarding the effect of order of evaluation on the results, but in this example code, the following three versions give the same results (up to the four significant digits that are output) when I compiled using ifx /O2 /MD /assume:protect-parens :

   where (t > 0.0_dbl)
      !dp = p_NQ-(p_max_-p_min_)*t-p_min_   ! version-1
      !dp = (p_NQ-p_min_)-(p_max_-p_min_)*t ! version-2
      dp = (to-t)*(p_max_-p_min_)           ! version-3
   elsewhere
      dp = 0.0_dbl
   end where

where to is the value of t computed on line 44, i.e., before clipping to a value between 0 and 1.

In the large application (OpenMolCAS), the ranges of p_NQ, p_max_, p_min_ would need to be considered before choosing one of these alternatives. Casual inspection of the file suggests that slight variations may not matter, because when the computed R2 exceeds a set limit R2_max, the point in question is discarded.

jimdempseyatthecove · ‎01-14-2024

>> the following three versions give the same results (up to the four significant digits that are output)

and the remaining digits....?

In your test case, were you are interested in 4 or 5 significant digits out of 15 of _dbl, it would seem to not make a difference if a few bits were lost here and there (that would otherwise be maintained).

Many computational problems involve high iteration counts and/or high/large data pools. In these cases, the order of the parenthesis are the programmer's means to control the computation order such that the computation segregates (as much as possible) the small magnitude values from the large magnitude values. This is done to prevent the loss of bits when performing addition or subtraction.

Jim

JohnNichols · ‎01-15-2024

program stest

   implicit none
   integer, parameter :: nx = 10, ny = 4, nz = 10
   integer :: ix, ixyz, iy, iyz, iz, ip(3), np(3), dum
   real :: R2, dum1(3)
   real, dimension(3) :: t,p_min,p_min_,p_max_,p_NQ,dp

   ixyz = 6
   p_min = (/-8.0,-2.0,-7.0/)
   p_NQ  = (/ 1.4, 0.0, 1.0/)
   dum = (ixyz-1)
   write(*,100)"A   ", dum
100 format(A,I5)   
   dum = dum /nx   
   write(*,100)"B    ",dum
   iyz = 1 + dum
   write(*,100)"C     ",iyz
   iyz = 1+(ixyz-1)/nx
   write(*,10)iyz
10 format("iyz should be :: 1.0  and is :: ",F10.3)   
   
   ix = ixyz-(iyz-1)*nx
   iz = 1+(iyz-1)/ny
   iy = iyz-(iz-1)*ny

   ip = (/ix,iy,iz/)
   np = (/nx,ny,nz/)
   p_min_ = p_min+(ip-2)*2
   p_max_ = p_min_ + 2
   where(ip == 1)p_min_  = -1.0e18
   where(ip == np)p_max_ = +1.0e18
   print '(1x,A,4x,1p,3e12.4)','p_min_',p_min_,'p_max_',p_max_

   t = min(max((p_NQ-p_min_)/(p_max_-p_min_),0.0),1.0)
   print '(1x,A,4x,1p,3e12.4)','t     ',t
   
   write(*,*)1,dp(3),p_NQ(3), p_max_(3),p_min_(3),t(3),P_min_(3)
   dum1 = p_max_-p_min_
   write(*,*)"           ASD10  ",dum1
   dum1 = dum1*t
   write(*,102),dum1
   dum1 = dum1+p_min_
   write(*,*)"           ASD11  ",dum1
   dum1 = p_nQ - dum1
   
   write(*,103)dum1
102 format("           ASD12  ",E24.18)
103 format("           ASD14  ",E24.18)
   dp = p_NQ-((p_max_-p_min_)*t+p_min_)
   write(*,*)100000,dp(3),p_NQ(3), p_max_(3),p_min_(3),t(3),P_min_(3)
   R2 = sum(dp*dp)
   print '(1x,A,2x,1p,3e11.3,2x,e11.4)','dp, R2 = ',dp,R2
   write(*,*)"AAA",dp(3)

end program

This is a variant on the original program, but breaking the steps of the main equation into single operators.

It shows as noted in an earlier post on RNG's today that you cannot use O2 with IFX. There is no doubt.

I played with this for a few days, with all sorts of variants, but you cannot fix the base error.