Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Suspected optimizer bug

mecej4
Black Belt
1,528 Views

Here is a very short program that uses two routines from Lapack:

 

 

program tdlamc2
   implicit none
   integer beta, it, imin, imax, irnd
   double precision eps, rmin, rmax
   logical lrnd

   call dlamc2( beta, it, lrnd, eps, imin, rmin, imax, rmax )
   if( lrnd ) then
      irnd = 1
      eps = ( beta**( 1d0-it ) ) / 2
   else
      irnd = 0
      eps = beta**( 1d0-it )
   end if
   print 77, beta,it,irnd,imin,imax
77 format('base,it,irnd,  imin,imax: ',5I8)
   print 78,eps,rmin,rmax
78 format('eps,rmin,rmax: ',3ES20.12)
end program

 

 

The two  Lapack files needed are: dlamch.f and lsame.f  . Building and running with optimization disabled produces the correct result (with ifx as well as ifort):

 

 

ifx /MD /Od /traceback tdlamc2.f90 dlamch.f lsame.f
>tdlamc2
base,it,irnd,  imin,imax:        2      53       1   -1021    1024
eps,rmin,rmax:   1.110223024625E-16  2.225073858507-308  1.797693134862+308

 

 

Changing the optimization level to /O2 produces EXEs that hang (with ifx as well as ifort).

I used the recently released version of the compiler (20230320) on Windows 11 64-bit.

[NOTE: The routine DLAMC2 is available in MKL, but using that gives incorrect results, as I have reported in a separate post on the MKL forum.]

0 Kudos
1 Solution
mecej4
Black Belt
1,393 Views

@Ron_Green, it's my turn to apologize! While searching for more information about DLAMCH and its siblings, I found the following warning in the Lapack FAQ:

The routine dlamch.f (and its dependent subroutines dlamc1, dlamc2, dlamc3, dlamc4, dlamc5) MUST be compiled without optimization. If you downloaded the entire lapack distribution this will be taken care of by the LAPACK/SRC/Makefile. However, if you downloaded a specific LAPACK routine plus dependencies, you need to take care that slamch.f (if you downloaded a single precision real or single precision complex routine) or dlamch.f (if you downloaded a double precision real or double precision complex routine) has been included.

Perhaps this is the same caution that, not having been placed in the source file in plain sight, was not recognized when MKL was built, and that lapse caused the DLAMCH in MKL to give incorrect results. My attempt to mend matters by downloading individual Lapack source files and compiling them with optimizations turned on probably made things worse.

I did file a bug report in the MKL forum, before I started this thread, but it has not been answered yet. Some coordination between the Intel compiler and MKL teams seems to be required to make the necessary corrections to the DLAMCH within MKL.

View solution in original post

20 Replies
Ron_Green
Moderator
1,475 Views

add

/warn:interfaces  /standard-semantics /fp-model:precise

 

and try again.  Also, so I don't have to waste 10 minutes finding dlamch.f and lsame.f, if the above does not help please attach those 2 source files.

0 Kudos
mecej4
Black Belt
1,468 Views

I provided links to the two Netlib source files concerned (need to copy/paste, but that is how they provide the Lapack sources of individual routines), and I have attached a Zip file containing all three files below.

The bug (program hangs) does not occur if the /fp:precise option is used. Thanks.

0 Kudos
Ron_Green
Moderator
1,466 Views

@mecej4 I apologize, I didn't read your first post carefully.  I'll try to unpeel the PRECISE options to find the specific options needed.  I suspect flush to zero, maybe precise-divide.  Usually one of those 2.  NO-FTZ if denormals are needed to converge.  precise divide because non-precise is sloppy IMHO.   Just my first 2 guesses.

 

0 Kudos
Ron_Green
Moderator
1,457 Views

The plot thickens - I didn't expect floating overflow.  Yes, this is going to need a deeper look.

 

$ ifort -g -O2 -xhost -fpe0 -traceback tdlamc2.f90 dlamchf77.f lsame.f -o tdlamc2

rwgreen@orcsle153:~/quad/tnew$ ./tdlamc2

forrtl: error (72): floating overflow

Image              PC                Routine            Line        Source             

libc.so.6          00007F6CFB23EA30  Unknown               Unknown  Unknown

tdlamc2            0000000000404870  dlamc2_                   329  dlamchf77.f

tdlamc2            0000000000404258  MAIN__                      7  tdlamc2.f90

tdlamc2            00000000004041BD  Unknown               Unknown  Unknown

libc.so.6          00007F6CFB229510  Unknown               Unknown  Unknown

libc.so.6          00007F6CFB2295C9  __libc_start_main     Unknown  Unknown

tdlamc2            00000000004040D5  Unknown               Unknown  Unknown

Aborted (core dumped)

0 Kudos
mecej4
Black Belt
1,444 Views

The line number reported as the location of the FP overflow is not plausible, since the code on line 330 just increments an integer variable, "lt".

0 Kudos
mecej4
Black Belt
1,457 Views

No problem Ron, and I wish to point out that the reason that I chose to report the issue as a suspected optimizer bug is that it took a considerable effort to isolate the cause of the program hangs.

The actual code where the issue occurred is a numerical nonlinear optimization program with about 18,000 lines of code plus Lapack and BLAS. I tried other compilers, and there were no hangs, but their runs were rather slow (about 30 seconds instead of less than 10 seconds with Ifort when it did not cause a program hang). One expects the possibility of incorrect/inaccurate results when optimization options are on, but hangs are tough to overcome -- no hang occurred if /Zi or /Zd options were used in an attempt to narrow down the area of code causing the hang.

0 Kudos
Ron_Green
Moderator
1,454 Views

OK, so you can use

-fp-model source

 

along with optimizations.  I would not call this an optimizer bug.  Rather, looks like a recurring issue we see with legacy code.  This loop, see what you think:

 

326 *+       WHILE( C.EQ.ONE )LOOP
327    30    CONTINUE
328          IF( c.EQ.one ) THEN
329             lt = lt + 1
330             a = a*lbeta
331             c = dlamc3( a, one )
332             c = dlamc3( c, -a )
333             GO TO 30
334          END IF
335 *+       END WHILE

 

I think you may see the problem with comparing C to ONE with .EQ..  Probably should be a check if they are close near epsilon.  Also the initializations of these double precision vars.  Right above this loop, look how REAL(8) C, A, LT are initialized:

322          lt = 0
323          a = 1
324          c = 1
325 *        
326 *+       WHILE( C.EQ.ONE )LOOP
327    30    CONTINUE
328          IF( c.EQ.one ) THEN
329             lt = lt + 1
330             a = a*lbeta
331             c = dlamc3( a, one )
332             c = dlamc3( c, -a )
333             GO TO 30
334          END IF
335 *+       END WHILE

You can backtrace ONE to see it's initialization also.  a = 1 as an example: 
With -fpe0 I saw LT overflow.  obviously the c.eq.one never happened, hence the "hang"

 

I think the -fp-model source is a good fix without rewriting this routine to be numerically safe to modern standards.

jimdempseyatthecove
Black Belt
1,425 Views

With the original code (using WHILE), compiled with optimizations .and. debug info...

When the loop hangs, issue a Pause (break into the program).

Then open a disassembly window, focus to that window, and step through the loop.

This should expose the issue as to if it is within function dlamc3 or if the C in the WHILE test is different from the C=.

Now I know the source code shows that they ought to be the same, but we are looking for (eliminating) potential optimization bug.

Jim Dempsey

 

0 Kudos
mecej4
Black Belt
1,434 Views

@Ron wrote:

        obviously the c.eq.one never happened, hence the "hang"

??? From the Fortran code, I have to conclude that the double precision variable C was initialized to 1.0D0. For the loop to run for ever, C would have to stay unchanged, at 1.0D0, i.e. c.eq.one remained .TRUE. all the time.

0 Kudos
JohnNichols
Valued Contributor III
1,281 Views

I have never knowingly used LAPACK,  but looking at the code for this routine, the use of FIRST shows the writers made a clever use of the save function, it was not obvious on the first 20 reads, but I got it, but then used C eq to one as a true statement.  

The first was not obvious to a casual user and the second could have used logicals and not relied on one being one in Fortran.  I would suggest it was not written by one person. 

 

0 Kudos
mecej4
Black Belt
1,272 Views

In Fortran 77, given the limitations of the DATA statement, the use of a saved logical or integer variable to control an IF..ENDIF block was a standard (the only?) way to initialize a variable with the value of an expression with an exact and unchanging value and avoid calculating that expression more than once. Here is an example that you are likely to find even in modern codes (F95, F2003, etc.)

logical, save :: begin = .true.
double precision, save :: pi
..
if(begin)then
   begin = .false.
   pi = 4d0*atan(1d0)
endif

More recent revisions of the Fortran standard have allowed more intrinsic functions to be used in initialization expressions, so you are now allowed to write

double precision, save :: pi = 4d0*atan(1d0)

There are also newer intrinsic modules that provide definitions for constants such as π, so all that is needed is a USE statement.

0 Kudos
Ron_Green
Moderator
1,426 Views

Ah, another mistake on my part.  I'll revisit this later in the week when I have less interrupts.  For now -fp-model precise or source is a reasonable work around.  I'll need time to try gfortran with max optimizations and see what it does with this code.  Maybe something in our optimizer.  Maybe.  just may be overly aggressive.  I can try some older iforts as well. 

0 Kudos
Ron_Green
Moderator
1,424 Views

Interesting , gfortran with -ffast-math also hangs

gfortran -O2 tdlamc2.f90 dlamchf77.f lsame.f -o tdlamc2 -ffast-math
~/quad/tnew$ ./tdlamc2
^C
~/quad/tnew$ gfortran --version
GNU Fortran (GCC) 12.0.1 20220413 (Red Hat 12.0.1-0)
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

0 Kudos
Ron_Green
Moderator
1,412 Views

FMA is at play here as well, at least for the hang.  Note that RMAX is infinity however.  So other things at play in the optimizations.

 

ifort -g -O2 -xhost -traceback tdlamc2.f90 dlamchf77.f lsame.f -o tdlamc2 -no-fma
rwgreen@orcsle153:~/quad/tnew$ ./tdlamc2 
base,it,irnd,  imin,imax:        2      53       0   -1021    1025
eps,rmin,rmax:   2.220446049250E-16  2.225073858507-308            Infinity

 

0 Kudos
mecej4
Black Belt
1,394 Views

@Ron_Green, it's my turn to apologize! While searching for more information about DLAMCH and its siblings, I found the following warning in the Lapack FAQ:

The routine dlamch.f (and its dependent subroutines dlamc1, dlamc2, dlamc3, dlamc4, dlamc5) MUST be compiled without optimization. If you downloaded the entire lapack distribution this will be taken care of by the LAPACK/SRC/Makefile. However, if you downloaded a specific LAPACK routine plus dependencies, you need to take care that slamch.f (if you downloaded a single precision real or single precision complex routine) or dlamch.f (if you downloaded a double precision real or double precision complex routine) has been included.

Perhaps this is the same caution that, not having been placed in the source file in plain sight, was not recognized when MKL was built, and that lapse caused the DLAMCH in MKL to give incorrect results. My attempt to mend matters by downloading individual Lapack source files and compiling them with optimizations turned on probably made things worse.

I did file a bug report in the MKL forum, before I started this thread, but it has not been answered yet. Some coordination between the Intel compiler and MKL teams seems to be required to make the necessary corrections to the DLAMCH within MKL.

Ron_Green
Moderator
1,385 Views

Like you, it would not occur to me to check the LAPACK faqs.  Indeed that should be in the source header.

 

I will check in with a colleague on MKL to follow up on your post.  Their forum is not as well monitored as ours.

0 Kudos
Ron_Green
Moderator
1,381 Views

The MKL team is actually investigating already - good!  Bug ID with them is MKLD-15217

I added your find in the FAQs to the bug report.  Thank you for finding this, it's a huge help!

David_Billinghurst
New Contributor III
1,353 Views

This brings back memories.  I recall issues with the optimization of DLAMCH with early releases (c1993) of LAPACK on VAXen and DECstation 3100 workstations.

0 Kudos
mecej4
Black Belt
1,336 Views

The version of dlamch.f included in Lapack 3.11 gets all the information it needs using the standard intrinsic functions EPSILON(), TINY(), DIGITS(), etc., instead of tricky code that does not work well with optimization .

0 Kudos
David_Billinghurst
New Contributor III
1,325 Views

Yes.  The version using intrinsic functions version is much better - eliminating recurring issues.  I had forgotten I had switched to that code eons ago.

0 Kudos
Reply