 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
This is adapted from OpenMolcas and causes a failure. There are other failures.
program test [40/42934]
call Get_Subblock()
end program test
!=======================================================================
module nq_Info
use, intrinsic :: iso_fortran_env, only: int64
integer(kind=int64) :: nx = 10, ny = 4, nz = 10
end module nq_Info
!=======================================================================
subroutine Get_Subblock()
use, intrinsic :: iso_fortran_env, only: int64, real64
use nq_Info, only: nx, ny, nz
implicit none
integer(kind=int64), parameter :: nNQ = 1
integer(kind=int64) :: ix, ixyz, iy, iyz, iz
real(kind=real64) :: R2, t1, t2, t3, x_max_, x_min, x_min_, x_NQ, y_max_, y_min, y_min_, y_NQ, z_max_, z_min, z_min_, z_NQ
logical :: done = .false.
if (done) return
done = .true.
ixyz = 6
x_min = 8.0_real64
y_min = 2.0_real64
z_min = 7.826364311586_real64
iyz = 1+(ixyz1)/nx
ix = ixyz(iyz1)*nx
iz = 1+(iyz1)/ny
iy = iyz(iz1)*ny
x_min_ = x_min+(ix2)*2.0_real64
x_max_ = x_min_+2.0_real64
y_min_ = y_min+(iy2)*2.0_real64
y_max_ = y_min_+2.0_real64
z_min_ = z_min+(iz2)*2.0_real64
z_max_ = z_min_+2.0_real64
if (ix == 1) x_min_ = 1.0e99_real64
if (ix == nx) x_max_ = 1.0e99_real64
if (iy == 1) y_min_ = 1.0e99_real64
if (iy == ny) y_max_ = 1.0e99_real64
if (iz == 1) z_min_ = 1.0e99_real64
if (iz == nz) z_max_ = 1.0e99_real64
x_NQ = 1.40421662117370_real64
y_NQ = 0.000000000000000_real64
z_NQ = 1.02874664437177_real64
t1 = (x_NQx_min_)/(x_max_x_min_)
if (t1 < 0.0_real64) t1 = 0.0_real64
if (t1 > 1.0_real64) t1 = 1.0_real64
t2 = (y_NQy_min_)/(y_max_y_min_)
if (t2 < 0.0_real64) t2 = 0.0_real64
if (t2 > 1.0_real64) t2 = 1.0_real64
t3 = (z_NQz_min_)/(z_max_z_min_)
if (t3 < 0.0_real64) t3 = 0.0_real64
if (t3 > 1.0_real64) t3 = 1.0_real64
R2 = (x_NQ(x_max_x_min_)*t1x_min_)**2+(y_NQ(y_max_y_min_)*t2y_min_)**2+(z_NQ(z_max_z_min_)*t3z_min_)**2
write(6,*) R2
end subroutine Get_Subblock
Compiling with O0 and O2 gives wildly different results:
$ ifx O0 a.f90 && ./a.out
0.000000000000000E+000
$ ifx O2 a.f90 && ./a.out
1.05831965830618
$ ifx version
ifx (IFX) 2024.0.1 20231122
Copyright (C) 19852023 Intel Corporation. All rights reserved.
Link Copied
 « Previous

 1
 2
 Next »
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
I am inclined to agree with Jim.....
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
Jim, I am with you on this. It was an argument that the Fortran compiler team had with the optimizer team and lost. Perhaps with ifx and LLVM there's an opportunity to do it right. Or, and I like this better, have the compiler default to standardsemantics. I got tired of explaining to users why ifort didn't follow the standard in a variety of cases, though some of this shifted over the years (such as realloc_lhs.)
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
I think that the focus on compiler options, bugs in code generation, etc., is off the mark. I have pared down the posted code to the following test program:
program jellby0
implicit none
double precision znq,zmax,zmin,t3,vz
znq = 1.0D0
zmax = znq+1.0D0
zmin = 1.00000000000000D99
t3 = 1.00000000000000D0
vz = zNQ(zmaxzmin)*t3zmin
print '(A,4x,2es20.12)','vz = ',vz, (zNQzmax)
! both should be 1 (if calculated with infinite precision arithmetic)
end program
The output of the program, one may expect, to be 1.0, 1.0, because, when t3 = 1, the expression for vz may be simplified by inspection to zNQ  zmax. However, since zmin has a huge magnitude, numbers of widely different magnitudes are involved and subtraction (if both are of the same sign) or addition (if the two have different signs) will introduce errors.
You are welcome to try out different compiler options related to floatingpoint operations.
In your large application (OpenMolcas) where this issue arises, the use of 1D99 or 1D99 as sentinel values should be critically and carefully reviewed, especially in places where addition/subtraction of sentinel values may be performed.
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
I have to eat some of my words. It turns out that there is a somewhat delicate code generation/optimization bug in IFX, and here is a short reproducer. I am running IFX 2024.0.0 Build 20231017 on Windows 11.
program sjelb
implicit none
integer, parameter :: nMult = 8
real :: B
call InitB2(nMult, B)
CONTAINS
subroutine InitB2(nMult, B)
implicit none
integer, parameter :: natom = 2, nAtQM = 2
integer, intent(in) :: nMult
integer IsMM(natom), iQM, jAt, nOrd, it, k
real :: B, Ext(4,natom), TTT(8),trm, vsav(4)
IsMM(:) = 0
TTT(:) = (/ 0.1, 0.7, 0.3, 0.8, &
0.1, 0.2, 0.5, 0.3 /)
Ext(1,:) = (/ 0.5, 0.1/)
Ext(2,:) = (/ 0.2, 0.4/)
Ext(3,:) = (/ 0.6, 0.1/)
Ext(4,:) = (/0.1, 0.1/)
nOrd = nMult/nAtQM ! nOrd = 8/2 = 4
iQM = 0
B = 0.0; k = 0
do jAt=1,natom
if (IsMM(jAt) == 1) cycle ! ISMM is 0 and never changed
iQM = iQM+1
it = nOrd*(iQM1)+1
trm = Ext(1,jAt)*TTT(it)
B = B+trm
k = k+1
vsav(k) = B
if(nOrd <= 1)cycle ! nOrd is always 4
trm = Ext(2,jAt)*TTT(it+1)+ &
Ext(3,jAt)*TTT(it+2)+ &
Ext(4,jAt)*TTT(it+3)
B = B+trm
k = k+1
vsav(k) = B
end do
print '(A,4x,ES18.10)', ' B = ', B
print '(4ES13.5)',Vsav
end subroutine InitB2
end PROGRAM
When compiled and run with /Od, (and also with other compilers, including IFort) this program give the output
B = 3.3999997400E01
5.00000E02 1.90000E01 1.80000E01 3.40000E01
and with /O2, the output is
B = 5.0000000000E01
5.00000E02 2.90000E01 3.40000E01 5.00000E01
The bug is rather fragile. For instance, the variable nOrd is always 4, but if line39 is commented out, the bug goes away. Similarly, on line32, ISMM is always 0, but if the line is removed or commented out the bug disappears. Furthermore, I saved the B values in a new array vsav(:) and printed the values after the DO loop because putting a simple "print *, B" inside the loop made the bug disappear.
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
That was more or less what I found in the "simplified" version above:
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
What are your thoughts regarding the role of guard values such as +1d99 and 1d99 in evaluating expressions and the problems that they bring, as I pointed out in my first post?
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
My thoughts? I agree that relying on specific floatingpoint values and behavior is not desirable. But the application is large and with lots of legacy and heterogeneous code, so reviewing and cleaning up all these possible cases is not realistic. As long as "standardsemantics fltconsistency" gives the "right" results (i.e., the same as ifort, gfortran, nvfortran, nagfor), I'm satisfied, although as others have said perhaps that should be the default.
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
The bug ID for these 2 reproducers is CMPLRLLVM55163
I'm going to push to get this fixed quickly. It's in the backend optimizer.
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
We have a fix. It is going into the 2024.1.0 release. This release will occur roughly end of March or early April.
The 2 testcases are attached. Both return PASS with a prerelease build of the ifx 2024.1.0 compiler.
$ ifx O2 test1.f90
$ ./a.out
B should be 3.4000E1 approx.
B = 3.4000000358E01
PASS
$ ifx O2 test2.f90
$ ./a.out
B should be 26.00, B is: 26.00000
PASS
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
So far, all the short test programs that we tried in this thread appeared to work with with the Classic compiler, and the optimizer bugs were noticed only with IFX and only when some level of optimization was requested or was supplied by default. While poking around with the reproducers, I cooked one up that raises suspicion of an optimizer bug with every one of the versions of the Ifort Classic compiler that I tried. That is worrisome, because it shows that there may be code segments that will not be compiled correctly with either IFort or IFx unless optimization is completely disabled.
It is not clear to me what the programmer expected from the parentheses in the expressions that combine very large and small values with addition/subtraction, so it is hard to select FP options when compiling that will support those expectations. May I suggest to @Jellby1 that some reference runs be made with OpenMolcas on a few test problems with (a) Ifx O2 fp:strict and (b) Ifx O2 ?
program stest
implicit none
integer, parameter :: nx = 10, ny = 4, nz = 10
integer :: ix, ixyz, iy, iyz, iz, ip(3), np(3)
real :: R2
real, dimension(3) :: t,p_min,p_min_,p_max_,p_NQ,dp
ixyz = 6
p_min = (/8.0,2.0,7.0/)
p_NQ = (/ 1.4, 0.0, 1.0/)
iyz = 1+(ixyz1)/nx
ix = ixyz(iyz1)*nx
iz = 1+(iyz1)/ny
iy = iyz(iz1)*ny
ip = (/ix,iy,iz/)
np = (/nx,ny,nz/)
p_min_ = p_min+(ip2)*2
p_max_ = p_min_ + 2
where(ip == 1)p_min_ = 1.0e18
where(ip == np)p_max_ = +1.0e18
!print '(1x,A,4x,1p,3e12.4)','p_min_',p_min_,'p_max_',p_max_
t = min(max((p_NQp_min_)/(p_max_p_min_),0.0),1.0)
!print '(1x,A,4x,1p,3e12.4)','t ',t
dp = p_NQ((p_max_p_min_)*t+p_min_)
R2 = sum(dp*dp)
print '(1x,A,2x,1p,3e11.3,2x,e11.4)','dp, R2 = ',dp,R2
end program
Here are some outputs from various versions of the Classic compiler, all from runs on Windows 10.
Gfortran 11.4 O2 dp, R2 = 0.000E+00 0.000E+00 1.000E+00 1.0000E+00
Ifort 7.10.3052 /Od dp, R2 = 0.000E+00 0.000E+00 1.000E+00 1.0000E+00
/O2 dp, R2 = 0.000E+00 1.280E+02 1.290E+02 3.3025E+04
Ifort 9.1 /Od dp, R2 = 0.000E+00 0.000E+00 1.000E+00 1.0000E+00
/O2 dp, R2 = 0.000E+00 0.000E+00 0.000E+00 0.0000E+00
IFort 11.1.72 /Od dp, R2 = 0.000E+00 0.000E+00 1.000E+00 1.0000E+00
/O2 dp, R2 = 0.000E+00 0.000E+00 0.000E+00 0.0000E+00
IFort 15.0.7.287 /Od dp, R2 = 0.000E+00 0.000E+00 1.000E+00 1.0000E+00
/O2 dp, R2 = 1.192E07 0.000E+00 0.000E+00 1.4211E14
/fast dp, R2 = 0.000E+00 0.000E+00 0.000E+00 0.0000E+00
IFort 16.0 /Od dp, R2 = 0.000E+00 0.000E+00 1.000E+00 1.0000E+00
/O2 dp, R2 = 0.000E+00 0.000E+00 0.000E+00 0.0000E+00
IFort 19.1U2 /Od dp, R2 = 0.000E+00 0.000E+00 1.000E+00 1.0000E+00
/O2 dp, R2 = 0.000E+00 0.000E+00 0.000E+00 0.0000E+00
IFort 2021.9 /Od dp, R2 = 0.000E+00 0.000E+00 1.000E+00 1.0000E+00
/O2 dp, R2 = 1.192E07 0.000E+00 0.000E+00 1.4211E14
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
Here is a modified version of your sub_block() test program that works correctly with IFX 2024.0.2 on Windows 11, without needing to turn off optimization and without needing FP:options such as /fp:strict. It does this by avoiding the use of guard values such as 1E99_real64, which give trouble when used in expressions. Please consider whether this approach can be used in file subblock.F90 .
program stest
implicit none
integer, parameter :: dbl = kind(0.0d0)
integer :: nx = 10, ny = 4, nz = 10
integer :: ix, ixyz, iy, iyz, iz, ip(3), np(3)
real(dbl) :: R2, dp(3), t(3), p_min(3), p_max_(3), p_min_(3), p_NQ(3)
integer :: clipped(3), gclipped(3)
ixyz = 6
p_min = (/ 8.0_dbl, 2.0_dbl, 7.826364311586_dbl /)
iyz = 1+(ixyz1)/nx
ix = ixyz(iyz1)*nx
iz = 1+(iyz1)/ny
iy = iyz(iz1)*ny
ip = (/ ix,iy,iz /)
np = (/ nx,ny,nz /)
p_min_ = p_min+(ip2)*2.0_dbl
p_max_ = p_min_+2.0_dbl
! Clip at borders, based on [ix, iy, iz]
where (ip == 1 )
t = 1.0_dbl
gclipped = 1
elsewhere (ip == np)
t = 0.0_dbl
gclipped = +1
elsewhere
gclipped = 0
end where
open(10,file='pnq.dat',status='old')
rewind(10)
do
read (10,*,end=100)p_nq
clipped = gclipped
!more clipping, now based on p_NQ values
t = 0.0_dbl
where(clipped == 0)
t = (p_NQp_min_)/(p_max_p_min_)
where (t < 0.0_dbl)
t = 0.0_dbl
clipped = 1
end where
where (t > 1.0_dbl)
t = 1.0_dbl
clipped = +1
end where
end where
print '(1x,A,5x,3(I4:,9x))','Clipped',clipped
where (t > 0.0_dbl)
dp = p_NQ(p_max_p_min_)*tp_min_
elsewhere
dp = 0.0_dbl
end where
print '(1x,A,2x,1p,3e13.5)','p_min_',p_min_,'p_max_',p_max_, &
'p_NQ ',p_NQ, 't ',t,'dp ',dp
R2 = sum(dp*dp)
print '(1x,A,2x,1p,e13.5,//)','R2 = ',R2
end do
100 close(10)
end program
The input data file contains ten points, plus the one in your test program as the eleventh, for [x_nq, y_nq, z_nq].
1.960E06 1.274E01 1.763E+00
3.335E+00 4.815E+00 4.191E+00
1.677E+00 4.577E+00 3.979E+00
4.163E+00 1.725E+00 4.356E+00
4.496E01 4.441E+00 3.505E+00
3.673E+00 1.501E+00 2.486E01
4.541E+00 4.883E01 2.016E01
4.251E01 2.794E+00 4.632E+00
3.782E01 4.559E+00 4.591E01
3.189E+00 4.261E+00 6.054E01
1.404 0.0 1.0287
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
I broke
dp = p_NQ(p_max_p_min_)*tp_min_
into a set of individual steps, and you cannot get the correct answer with /O2 for dp(1) no matter how granular you made the equations, but you can get the correct answer for dp(3).
It is all related the guard value.
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
>> dp = p_NQ(p_max_p_min_)*tp_min_
Caution, you can get numerically different results between:
dp = p_NQ  ((p_max_  p_min_ ) * t  p_min_)
and
dp = (p_NQ  (p_max_  p_min_) * t)  p_min_
If (when) the (potential) difference is significant enough, you must choose one, and, assure the compiler obeys the ()'s.
IMHO  A pet pieve of mine.... I really wish that regardless of optimization level that parenthesis are obeyed unless told otherwise (e.g. noprotect parns). The programmer (possibly 60 years ago) may have put them there for the specific purpose of avoiding ambiguity in order of operation.
Hint, perhaps there should be an option ambiguousOrder that could be added to override the parenthesis.
Jim Dempsey
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
I agree broadly with what you wrote regarding the effect of order of evaluation on the results, but in this example code, the following three versions give the same results (up to the four significant digits that are output) when I compiled using ifx /O2 /MD /assume:protectparens :
where (t > 0.0_dbl)
!dp = p_NQ(p_max_p_min_)*tp_min_ ! version1
!dp = (p_NQp_min_)(p_max_p_min_)*t ! version2
dp = (tot)*(p_max_p_min_) ! version3
elsewhere
dp = 0.0_dbl
end where
where to is the value of t computed on line 44, i.e., before clipping to a value between 0 and 1.
In the large application (OpenMolCAS), the ranges of p_NQ, p_max_, p_min_ would need to be considered before choosing one of these alternatives. Casual inspection of the file suggests that slight variations may not matter, because when the computed R2 exceeds a set limit R2_max, the point in question is discarded.
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
>> the following three versions give the same results (up to the four significant digits that are output)
and the remaining digits....?
In your test case, were you are interested in 4 or 5 significant digits out of 15 of _dbl, it would seem to not make a difference if a few bits were lost here and there (that would otherwise be maintained).
Many computational problems involve high iteration counts and/or high/large data pools. In these cases, the order of the parenthesis are the programmer's means to control the computation order such that the computation segregates (as much as possible) the small magnitude values from the large magnitude values. This is done to prevent the loss of bits when performing addition or subtraction.
Jim
 Mark as New
 Bookmark
 Subscribe
 Mute
 Subscribe to RSS Feed
 Permalink
 Report Inappropriate Content
program stest
implicit none
integer, parameter :: nx = 10, ny = 4, nz = 10
integer :: ix, ixyz, iy, iyz, iz, ip(3), np(3), dum
real :: R2, dum1(3)
real, dimension(3) :: t,p_min,p_min_,p_max_,p_NQ,dp
ixyz = 6
p_min = (/8.0,2.0,7.0/)
p_NQ = (/ 1.4, 0.0, 1.0/)
dum = (ixyz1)
write(*,100)"A ", dum
100 format(A,I5)
dum = dum /nx
write(*,100)"B ",dum
iyz = 1 + dum
write(*,100)"C ",iyz
iyz = 1+(ixyz1)/nx
write(*,10)iyz
10 format("iyz should be :: 1.0 and is :: ",F10.3)
ix = ixyz(iyz1)*nx
iz = 1+(iyz1)/ny
iy = iyz(iz1)*ny
ip = (/ix,iy,iz/)
np = (/nx,ny,nz/)
p_min_ = p_min+(ip2)*2
p_max_ = p_min_ + 2
where(ip == 1)p_min_ = 1.0e18
where(ip == np)p_max_ = +1.0e18
print '(1x,A,4x,1p,3e12.4)','p_min_',p_min_,'p_max_',p_max_
t = min(max((p_NQp_min_)/(p_max_p_min_),0.0),1.0)
print '(1x,A,4x,1p,3e12.4)','t ',t
write(*,*)1,dp(3),p_NQ(3), p_max_(3),p_min_(3),t(3),P_min_(3)
dum1 = p_max_p_min_
write(*,*)" ASD10 ",dum1
dum1 = dum1*t
write(*,102),dum1
dum1 = dum1+p_min_
write(*,*)" ASD11 ",dum1
dum1 = p_nQ  dum1
write(*,103)dum1
102 format(" ASD12 ",E24.18)
103 format(" ASD14 ",E24.18)
dp = p_NQ((p_max_p_min_)*t+p_min_)
write(*,*)100000,dp(3),p_NQ(3), p_max_(3),p_min_(3),t(3),P_min_(3)
R2 = sum(dp*dp)
print '(1x,A,2x,1p,3e11.3,2x,e11.4)','dp, R2 = ',dp,R2
write(*,*)"AAA",dp(3)
end program
This is a variant on the original program, but breaking the steps of the main equation into single operators.
It shows as noted in an earlier post on RNG's today that you cannot use O2 with IFX. There is no doubt.
I played with this for a few days, with all sorts of variants, but you cannot fix the base error.
 Subscribe to RSS Feed
 Mark Topic as New
 Mark Topic as Read
 Float this Topic for Current User
 Bookmark
 Subscribe
 Printer Friendly Page
 « Previous

 1
 2
 Next »