- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Here is a very short program that uses two routines from Lapack:

```
program tdlamc2
implicit none
integer beta, it, imin, imax, irnd
double precision eps, rmin, rmax
logical lrnd
call dlamc2( beta, it, lrnd, eps, imin, rmin, imax, rmax )
if( lrnd ) then
irnd = 1
eps = ( beta**( 1d0-it ) ) / 2
else
irnd = 0
eps = beta**( 1d0-it )
end if
print 77, beta,it,irnd,imin,imax
77 format('base,it,irnd, imin,imax: ',5I8)
print 78,eps,rmin,rmax
78 format('eps,rmin,rmax: ',3ES20.12)
end program
```

The two Lapack files needed are: dlamch.f and lsame.f . Building and running with optimization disabled produces the correct result (with ifx as well as ifort):

```
ifx /MD /Od /traceback tdlamc2.f90 dlamch.f lsame.f
>tdlamc2
base,it,irnd, imin,imax: 2 53 1 -1021 1024
eps,rmin,rmax: 1.110223024625E-16 2.225073858507-308 1.797693134862+308
```

Changing the optimization level to /O2 produces EXEs that hang (with ifx as well as ifort).

I used the recently released version of the compiler (20230320) on Windows 11 64-bit.

[NOTE: The routine DLAMC2 is available in MKL, but using that gives incorrect results, as I have reported in a separate post on the MKL forum.]

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

@Ron_Green, it's my turn to apologize! While searching for more information about DLAMCH and its siblings, I found the following warning in the Lapack FAQ:

*The routine dlamch.f (and its dependent subroutines dlamc1, dlamc2, dlamc3, dlamc4, dlamc5) MUST be compiled without optimization. If you downloaded the entire lapack distribution this will be taken care of by the LAPACK/SRC/Makefile. However, if you downloaded a specific LAPACK routine plus dependencies, you need to take care that slamch.f (if you downloaded a single precision real or single precision complex routine) or dlamch.f (if you downloaded a double precision real or double precision complex routine) has been included.*

Perhaps this is the same caution that, not having been placed in the source file in plain sight, was not recognized when MKL was built, and that lapse caused the DLAMCH in MKL to give incorrect results. My attempt to mend matters by downloading individual Lapack source files and compiling them with optimizations turned on probably made things worse.

I did file a bug report in the MKL forum, before I started this thread, but it has not been answered yet. Some coordination between the Intel compiler and MKL teams seems to be required to make the necessary corrections to the DLAMCH within MKL.

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

add

/warn:interfaces /standard-semantics /fp-model:precise

and try again. Also, so I don't have to waste 10 minutes finding dlamch.f and lsame.f, if the above does not help please attach those 2 source files.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

I provided links to the two Netlib source files concerned (need to copy/paste, but that is how they provide the Lapack sources of individual routines), and I have attached a Zip file containing all three files below.

The bug (program hangs) does not occur if the /fp:precise option is used. Thanks.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

@mecej4 I apologize, I didn't read your first post carefully. I'll try to unpeel the PRECISE options to find the specific options needed. I suspect flush to zero, maybe precise-divide. Usually one of those 2. NO-FTZ if denormals are needed to converge. precise divide because non-precise is sloppy IMHO. Just my first 2 guesses.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

The plot thickens - I didn't expect floating overflow. Yes, this is going to need a deeper look.

$ ifort -g -O2 -xhost -fpe0 -traceback tdlamc2.f90 dlamchf77.f lsame.f -o tdlamc2

rwgreen@orcsle153:~/quad/tnew$ ./tdlamc2

forrtl: error (72): floating overflow

Image PC Routine Line Source

libc.so.6 00007F6CFB23EA30 Unknown Unknown Unknown

tdlamc2 0000000000404870 dlamc2_ 329 dlamchf77.f

tdlamc2 0000000000404258 MAIN__ 7 tdlamc2.f90

tdlamc2 00000000004041BD Unknown Unknown Unknown

libc.so.6 00007F6CFB229510 Unknown Unknown Unknown

libc.so.6 00007F6CFB2295C9 __libc_start_main Unknown Unknown

tdlamc2 00000000004040D5 Unknown Unknown Unknown

Aborted (core dumped)

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

The line number reported as the location of the FP overflow is not plausible, since the code on line 330 just increments an integer variable, "lt".

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

No problem Ron, and I wish to point out that the reason that I chose to report the issue as a suspected optimizer bug is that it took a considerable effort to isolate the cause of the program hangs.

The actual code where the issue occurred is a numerical nonlinear optimization program with about 18,000 lines of code plus Lapack and BLAS. I tried other compilers, and there were no hangs, but their runs were rather slow (about 30 seconds instead of less than 10 seconds with Ifort when it did not cause a program hang). One expects the possibility of incorrect/inaccurate results when optimization options are on, but hangs are tough to overcome -- no hang occurred if /Zi or /Zd options were used in an attempt to narrow down the area of code causing the hang.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

OK, so you can use

-fp-model source

along with optimizations. I would not call this an optimizer bug. Rather, looks like a recurring issue we see with legacy code. This loop, see what you think:

```
326 *+ WHILE( C.EQ.ONE )LOOP
327 30 CONTINUE
328 IF( c.EQ.one ) THEN
329 lt = lt + 1
330 a = a*lbeta
331 c = dlamc3( a, one )
332 c = dlamc3( c, -a )
333 GO TO 30
334 END IF
335 *+ END WHILE
```

I think you may see the problem with comparing C to ONE with .EQ.. Probably should be a check if they are close near epsilon. Also the initializations of these double precision vars. Right above this loop, look how REAL(8) C, A, LT are initialized:

```
322 lt = 0
323 a = 1
324 c = 1
325 *
326 *+ WHILE( C.EQ.ONE )LOOP
327 30 CONTINUE
328 IF( c.EQ.one ) THEN
329 lt = lt + 1
330 a = a*lbeta
331 c = dlamc3( a, one )
332 c = dlamc3( c, -a )
333 GO TO 30
334 END IF
335 *+ END WHILE
```

You can backtrace ONE to see it's initialization also. a = 1 as an example:

With -fpe0 I saw LT overflow. obviously the c.eq.one never happened, hence the "hang"

I think the -fp-model source is a good fix without rewriting this routine to be numerically safe to modern standards.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

With the original code (using WHILE), compiled with optimizations .and. debug info...

When the loop hangs, issue a Pause (break into the program).

Then open a disassembly window, focus to that window, and step through the loop.

This should expose the issue as to if it is within function dlamc3 or if the C in the WHILE test is different from the C=.

Now I know the source code shows that they ought to be the same, but we are looking for (eliminating) potential optimization bug.

Jim Dempsey

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

@Ron wrote:

obviously the c.eq.one never happened, hence the "hang"

??? From the Fortran code, I have to conclude that the double precision variable C was initialized to 1.0D0. For the loop to run for ever, C would have to stay unchanged, at 1.0D0, i.e. c.eq.one remained .TRUE. all the time.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

I have never knowingly used LAPACK, but looking at the code for this routine, the use of FIRST shows the writers made a clever use of the save function, it was not obvious on the first 20 reads, but I got it, but then used C eq to one as a true statement.

The first was not obvious to a casual user and the second could have used logicals and not relied on one being one in Fortran. I would suggest it was not written by one person.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

In Fortran 77, given the limitations of the DATA statement, the use of a saved logical or integer variable to control an IF..ENDIF block was a standard (the only?) way to initialize a variable with the value of an expression with an exact and unchanging value and avoid calculating that expression more than once. Here is an example that you are likely to find even in modern codes (F95, F2003, etc.)

```
logical, save :: begin = .true.
double precision, save :: pi
..
if(begin)then
begin = .false.
pi = 4d0*atan(1d0)
endif
```

More recent revisions of the Fortran standard have allowed more intrinsic functions to be used in initialization expressions, so you are now allowed to write

`double precision, save :: pi = 4d0*atan(1d0)`

There are also newer intrinsic modules that provide definitions for constants such as **π**, so all that is needed is a USE statement.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Ah, another mistake on my part. I'll revisit this later in the week when I have less interrupts. For now -fp-model precise or source is a reasonable work around. I'll need time to try gfortran with max optimizations and see what it does with this code. Maybe something in our optimizer. Maybe. just may be overly aggressive. I can try some older iforts as well.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Interesting , gfortran with -ffast-math also hangs

```
gfortran -O2 tdlamc2.f90 dlamchf77.f lsame.f -o tdlamc2 -ffast-math
~/quad/tnew$ ./tdlamc2
^C
~/quad/tnew$ gfortran --version
GNU Fortran (GCC) 12.0.1 20220413 (Red Hat 12.0.1-0)
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

FMA is at play here as well, at least for the hang. Note that RMAX is infinity however. So other things at play in the optimizations.

```
ifort -g -O2 -xhost -traceback tdlamc2.f90 dlamchf77.f lsame.f -o tdlamc2 -no-fma
rwgreen@orcsle153:~/quad/tnew$ ./tdlamc2
base,it,irnd, imin,imax: 2 53 0 -1021 1025
eps,rmin,rmax: 2.220446049250E-16 2.225073858507-308 Infinity
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

@Ron_Green, it's my turn to apologize! While searching for more information about DLAMCH and its siblings, I found the following warning in the Lapack FAQ:

*The routine dlamch.f (and its dependent subroutines dlamc1, dlamc2, dlamc3, dlamc4, dlamc5) MUST be compiled without optimization. If you downloaded the entire lapack distribution this will be taken care of by the LAPACK/SRC/Makefile. However, if you downloaded a specific LAPACK routine plus dependencies, you need to take care that slamch.f (if you downloaded a single precision real or single precision complex routine) or dlamch.f (if you downloaded a double precision real or double precision complex routine) has been included.*

Perhaps this is the same caution that, not having been placed in the source file in plain sight, was not recognized when MKL was built, and that lapse caused the DLAMCH in MKL to give incorrect results. My attempt to mend matters by downloading individual Lapack source files and compiling them with optimizations turned on probably made things worse.

I did file a bug report in the MKL forum, before I started this thread, but it has not been answered yet. Some coordination between the Intel compiler and MKL teams seems to be required to make the necessary corrections to the DLAMCH within MKL.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Like you, it would not occur to me to check the LAPACK faqs. Indeed that should be in the source header.

I will check in with a colleague on MKL to follow up on your post. Their forum is not as well monitored as ours.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

The MKL team is actually investigating already - good! Bug ID with them is MKLD-15217

I added your find in the FAQs to the bug report. Thank you for finding this, it's a huge help!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

This brings back memories. I recall issues with the optimization of DLAMCH with early releases (c1993) of LAPACK on VAXen and DECstation 3100 workstations.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

The version of dlamch.f included in Lapack 3.11 gets all the information it needs using the standard intrinsic functions EPSILON(), TINY(), DIGITS(), etc., instead of tricky code that does not work well with optimization .

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Yes. The version using intrinsic functions version is much better - eliminating recurring issues. I had forgotten I had switched to that code eons ago.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page