I wrote a driver program to solve the SNLS (Separable Nonlinear Least Squares) problems at http://www.itl.nist.gov/div898/strd/nls/nls_main.shtml using the JPL math77 library (high quality, recently made available free of cost at NetLib, see http://www.netlib.org/math/index.html ). When testing this driver with Intel Fortran, I discovered an optimizer bug.
The optimizer bug occurs with the IFort 15 and 16.0.1 Linux compilers when generating 32-bit targets with -O2. When one of the math77 library files (divset.f) is split into separate files using fsplit, and the split source files are used instead of divset.f, the bug goes away. I have not been able to create a short reproducer that retains the optimizer bug, but to make the problem simple to reproduce I have collected all the files needed in the attached archive. I have included a copy of the math77 copyright notice in the archive; you can also see the notice at http://www.netlib.org/math/license.html .
Here are the steps to reproduce the problem (I used an OpenSuse 13.2 system with a T4300 CPU). Extract the archive in a test directory in an IFort 32-bit shell window. Download the test data file http://www.itl.nist.gov/div898/strd/nls/data/LINKS/DATA/BoxBOD.dat . Build and run as follows:
$ ifort -O2 *.f90 *.f
$ echo BoxBOD.dat | ./a.out
The output shows a failure:
...
***** SINGULAR CONVERGENCE *****
Termination code 7
...
If we compile instead using the split files as follows:
$ ifort -O2 *.f90 dnlsfu.f drn2g.f drnsg.f idsm.f dq7rfh.f DIVS/*.f
the output is correct and as expected:
...
Termination code 4
Final parameters
...
The output is also correct if -O0 is used or if 64-bit target mode is used.
I have been able to narrow down the problem a bit.
The file divset.f contains 28 subprograms (22 subroutines and 6 functions), one of which is an 8-line subroutine, D2AXY, which performs the well-known Y := a*X + Y operation.
If the source lines that make up D2AXY are moved into a separate file, the subroutine in divset.f is renamed to xD2AXY, and the program is rebuilt using the extra file just created, the optimizer bug goes away. The subroutine D2AXY is called from other routines contained within divset.f, so I wonder if something goes wrong during the IPO phase.
Perhaps you could argue that -check or -warn interfaces should alert you when aliased arguments are passed and you haven't set -assume dummy_aliases. I've heard long-time Fortran users argue that this violation of the Fortran standard shouldn't break their code, regardless of their option settings. Since -warn interfaces may not catch anything that the Fortran standard doesn't require an explicit interface to catch, it may take a language lawyer to assess that.
Tim P., thanks for your response, and I appreciate your advice, since you are a known expert on optimization and processor options.
Indeed, I was concerned about aliasing and had added -falias to -O2 before you replied, and found no effect on the output. There are several compiler options related to aliasing, and it is clear that any option telling the compiler to assume that aliasing may be present inhibits some optimizations. However, the IFort manual entries for -falias, -assume dummy_aliases, -ansi-alias, -fno-fnalias and -common-args tell me little about what assumptions are made when optimization is requested.
The code at issue is just this:
      SUBROUTINE DV2AXY(P, W, A, X, Y)
C
C  ***  SET W = A*X + Y  --  W, X, Y = P-VECTORS, A = SCALAR  ***
C
c  ------------------------------------------------------------------
      INTEGER P, I
      DOUBLE PRECISION W(P), X(P), Y(P), A
c
      DO 10 I = 1, P
 10      W(I) = A*X(I) + Y(I)
      RETURN
      END
I instrumented the routine to check for overlaps between the output vector W and any of A, X and Y. Only a few calls had no overlap; in all the other cases, W coincided with X or Y, and I told myself that there was no harmful loop-carried dependency here, since each W(I) depends only on X(I) and Y(I). In other words, there is aliasing (and a violation of the strict F77 "ANSI" aliasing rule), but whenever it occurs it is "benign aliasing". In none of the calls did A overlap W.
Interestingly, adding lines of code to check addresses for overlap always inhibited optimizations to some extent, so there is a small probability that the uninstrumented code had harmful overlaps that the instrumented build did not reveal.
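One way to do this kind of overlap check is to compare the byte ranges occupied by two array arguments. The sketch below is an illustration under assumptions, not the instrumentation actually used: `OVERLAP_SKETCH` and `OVERLAPS` are hypothetical names (not part of math77), it needs a Fortran 2008 compiler for `C_SIZEOF`, and converting a `C_PTR` to an integer with `TRANSFER` is processor dependent, although common compilers accept it. As noted above, adding such checks can itself change what the optimizer does.

```fortran
      MODULE OVERLAP_SKETCH
      USE, INTRINSIC :: ISO_C_BINDING
      IMPLICIT NONE
      CONTAINS
!     Hypothetical helper: returns .TRUE. if the P-element arrays U
!     and V share any storage. Both are INTENT(IN), so passing the
!     same array twice is allowed by the standard.
      LOGICAL FUNCTION OVERLAPS(P, U, V)
      INTEGER, INTENT(IN) :: P
      DOUBLE PRECISION, INTENT(IN), TARGET :: U(P), V(P)
      INTEGER(C_INTPTR_T) :: UA, VA, NB
      OVERLAPS = .FALSE.
!     Empty arrays cannot overlap; also avoids referencing U(1)
      IF (P <= 0) RETURN
!     Start addresses as integers (processor-dependent conversion)
      UA = TRANSFER(C_LOC(U(1)), UA)
      VA = TRANSFER(C_LOC(V(1)), VA)
!     Extent of each array in bytes
      NB = INT(P, C_INTPTR_T) * INT(C_SIZEOF(U(1)), C_INTPTR_T)
!     Half-open ranges [UA, UA+NB) and [VA, VA+NB) intersect?
      OVERLAPS = UA < VA + NB .AND. VA < UA + NB
      END FUNCTION OVERLAPS
      END MODULE OVERLAP_SKETCH
```

Inside DV2AXY, calls such as `OVERLAPS(P, W, X)` and `OVERLAPS(P, W, Y)` placed before the loop would flag the two aliasing patterns described above; the scalar A would need a separate one-element check.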
I read http://www.cs.uofs.edu/~mccloske/courses/se504/subprograms_denman_lec.html, and based on all this I still suspect an optimizer bug. Further comments and suggestions (or explanations of why loop optimization is not possible) will be appreciated.
You've verified that the aliasing doesn't break the code, even under optimization, until inlining occurs. It seems the compiler may reorder array accesses in the caller on the assumption that x and y are intent(in) and not modified, even though it would be possible to check.
The Intel compiler does not generate code to check aliasing. It assumes that you have followed the standard's rules regarding aliasing unless you say -assume dummy_aliases. You can use TARGET selectively to prevent the compiler from assuming lack of aliasing. Tim is probably right that inlining opened up additional optimization opportunities.
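Since the compiler assumes the standard's aliasing rules are followed, one conforming alternative to passing the same array as both W and Y (or W and X) is to give those particular call sites an update-in-place routine that has a single modified argument. This is a sketch assuming the two aliasing patterns reported earlier in the thread; `AXPY_INPLACE_SKETCH`, `DV2AXY_UPD_Y` and `DV2AXY_UPD_X` are hypothetical names, not part of math77. Callers must still not pass overlapping arrays for X and Y.

```fortran
      MODULE AXPY_INPLACE_SKETCH
      IMPLICIT NONE
      CONTAINS
!     Hypothetical replacement for CALL DV2AXY(P, Y, A, X, Y),
!     i.e. W aliased with Y. Computes Y := A*X + Y; Y is the only
!     modified argument, so no aliasing rule is violated as long
!     as X and Y do not overlap.
      SUBROUTINE DV2AXY_UPD_Y(P, A, X, Y)
      INTEGER, INTENT(IN) :: P
      DOUBLE PRECISION, INTENT(IN) :: A, X(P)
      DOUBLE PRECISION, INTENT(INOUT) :: Y(P)
      INTEGER :: I
      DO I = 1, P
         Y(I) = A*X(I) + Y(I)
      END DO
      END SUBROUTINE DV2AXY_UPD_Y
!     Hypothetical replacement for CALL DV2AXY(P, X, A, X, Y),
!     i.e. W aliased with X. Computes X := A*X + Y.
      SUBROUTINE DV2AXY_UPD_X(P, A, X, Y)
      INTEGER, INTENT(IN) :: P
      DOUBLE PRECISION, INTENT(IN) :: A, Y(P)
      DOUBLE PRECISION, INTENT(INOUT) :: X(P)
      INTEGER :: I
      DO I = 1, P
         X(I) = A*X(I) + Y(I)
      END DO
      END SUBROUTINE DV2AXY_UPD_X
      END MODULE AXPY_INPLACE_SKETCH
```

This keeps DV2AXY itself fully optimizable for the non-aliased calls and only changes the call sites that relied on benign aliasing.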
Steve Lionel (Intel) wrote:
You can use TARGET selectively to prevent the compiler from assuming lack of aliasing.
Would that be specified in the caller rather than in the called subroutine? I ask for confirmation because I suspect that inlining blurs the distinction between the two in the compiled code, and you used the word "selectively".
I agree that the descriptions of -falias and -ansi-alias are inadequate. These really come from the C world (especially -ansi-alias) and deal with pointers. As best as I can tell, only -assume dummy_aliases affects how the compiler treats dummy arguments.
A couple of days later, I stumbled on a twisted method for trapping calls with aliased input and output arguments. Using the code in #4 as an example, note that if the output array W is aliased with the input array X or Y, erroneous results can occur. I wanted to find the places where the aliased calls were made and fix the problem there, rather than make the subroutine sub-optimal for all calls.
The trick is to add the INTENT(OUT) attribute to the output array W in the subroutine and compile with -ftrapuv, etc. As the standard states, W becomes undefined when DV2AXY is entered. If W is aliased to X, then when values of X are used they will be trapped as uninitialized, because the combination of aliasing and -ftrapuv causes X also to be undefined. In other words, as soon as the subroutine is entered, the aliased input array becomes undefined and the actual contents of X are completely overwritten. At this point a traceback is printed, so the specific caller is quickly tracked down.
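A sketch of the declaration change this trick relies on, written as a debug-only variant of the DV2AXY routine shown earlier in the thread. Only the INTENT attributes (and the label-free loop) are new. Whether an aliased read is actually trapped depends on the compiler version and the debug options used (the post mentions -ftrapuv), so treat that behavior as something to verify on your own setup rather than a guarantee. For non-aliased calls the result is the same as the original routine.

```fortran
!     Debug-only variant of DV2AXY: W = A*X + Y.
!     W is INTENT(OUT), so its contents are undefined on entry; per
!     the approach described above, debug options such as -ftrapuv
!     may then expose an X or Y that is aliased with W.
      SUBROUTINE DV2AXY(P, W, A, X, Y)
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: P
      DOUBLE PRECISION, INTENT(OUT) :: W(P)
      DOUBLE PRECISION, INTENT(IN) :: A, X(P), Y(P)
      INTEGER :: I
      DO I = 1, P
         W(I) = A*X(I) + Y(I)
      END DO
      END SUBROUTINE DV2AXY
```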
This method could be used to provide a "-check aliased_arguments" compiler option to users who need it.