Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Strange optimization behavior

netphilou31
New Contributor II
543 Views

Dear All,

I am currently working on an existing code (generating a dll) to make it thread safe. I have removed all existing common and global variables by moving them into an external structure which is systematically passed as an argument from external calls. During this work, I found a strange behavior when switching from the default local storage option to /Qauto for the Release configuration (with default optimizations settings). The code does not produce the expected results whereas /Qauto in Debug configuration works. In my tries to solve the problem, I found that switching from /fpsource to /fp:fast=1 or /fp:fast=2 in the floating-point settings seems to solve the issue even if there are small numerical differences in the results with the original code.

My compiler version is Intel® Fortran Compiler Classic 2021.4.0 [IA-32]

My original settings are (who don't produce a working dll):

/nologo /O2 /fpp /DWIN32 /DFRENCH /reentrancy:threaded /extend-source:132 /warn:interfaces /Qauto /assume:byterecl /fpe:1 /fp:source /fpconstant /Qfp-speculation=strict /module:"Win32\Release\\" /object:"Win32\Release\\" /Fd"Win32\Release\vc160.pdb" /libs:static /threads /Qmkl:sequential /assume:noold_maxminloc /c

and the new ones:

/nologo /O2 /fpp /DWIN32 /DFRENCH /reentrancy:threaded /extend-source:132 /warn:interfaces /Qauto /assume:byterecl /fpe:1 /fpconstant /Qfp-speculation=strict /module:"Win32\Release\\" /object:"Win32\Release\\" /Fd"Win32\Release\vc160.pdb" /libs:static /threads /Qmkl:sequential /assume:noold_maxminloc /c

Removing the optimization (i.e. switrching fro /O2 flag to /Od) seems also to work with the original floating point settings.

Even if I think that I probably need also to add the /assume:recursion and/or /Qopenmp flags to make it fully thread safe, any idea about this problem will be welcomed.

Best regards,

P.S: I think I used the /fp:source /fpconstant /Qfp-speculation=strict after reading a paper written by Steve Lionel about the accuracy and reproductivility of results in a floating point environment.

0 Kudos
17 Replies
netphilou31
New Contributor II
493 Views

Hi,

Just to give some complementary information, it seems that the compiler behaves differently in the following situation:

let's assume a calculation routine with the following signature:

subroutine BGGP(T,P,Z,I,GGPI,HGPI,SGPI,DELHVI,VAR,NMAX,IERR)
    integer(4), intent(in)    :: I, NMAX
    real(8),    intent(in)    :: T, P, Z(*)
    real(8),    intent(inout) :: GGPI(*), HGPI(*), SGPI(*), DELHVI(*)
    real(8),    intent(inout) :: VAR(NMAX,*)
    integer(4), intent(out)   :: IERR

and let's define GGPI(), HGPI(), SGPI() and DELHVI() as subarrays of array VAR() (respectively VAR(:,42), VAR(:,12), etc...)

Normally, calling the procedure with:

call BGGP(T,P,Z,I,VARI(1:N,42),VARI(1:N,12),VARI(1:N,17),VARI(1:N,6),VAR,NMAX,IERR)

Should produce the same results in debug and release mode (I am using the default calling convention).

However, using for example both SGPI(I) and VAR(I,17) in the called subroutine to refer the same quantity, seems to generate different results between Debug and Release configurations with /Qauto and /Qopenmp along with /fp:fast=2 settings.

For example (even if I agree it is a strange way of coding)

SGPI(I) = IdealGasEntropy(T,P,Z,...)
VARI(I,17) = VARI(I,17) + S0F(I)

However, calling the routine with:

call BGGP(T,P,Z,I,VARI(1,42),VARI(1,12),VARI(1,17),VARI(1,6),VAR,NMAX,IERR)

seems to produce the same answer.
Unfortunately, until now I was obliged to use the first form of call, because the VAR() array is declared with the pointer attribute:

    real(8), pointer :: VAR(:,:)

I will probably need to change the way I declare the VAR array in order to use an assumed-size array along with a fortran pointer declaration like:

    real(8) :: VAR(NMAX,*)
    pointer(ptrVAR, VAR)

to be able to pass VAR directly as VAR(1,12).

Any idea or explanation of this issue?

Best regards,

mecej4
Black Belt
481 Views

Your call to the subroutine may be violating the anti-aliasing rules, since the argument arrays GGPIHGPISGPIDELHVI are sections of the array VAR, which is also an argument, and all these arrays have INTENT(IN OUT). Without seeing how these arrays are accessed and used in the subroutine, it is not possible to diagnose the problem, but here is what can go wrong. 

Suppose that you change some elements of array GGPI in the subroutine. That will cause the corresponding elements of VAR to change, but "behind the compiler's back". If, within the subroutine, those elements of VAR are used in expressions, the code may use stale values instead. Or, if you intended the original values of VAR to be used, the code may use the new values, instead, against your intentions.

Similarly, changes to elements of VAR may cause corresponding portions of GGPI, etc., to change.

Such errors are more likely to bite when higher levels of optimization are used. They are also unpredictable as to their behavior patterns, which is why the language rules prohibit "changing the values in an argument variable except through its own name".

jimdempseyatthecove
Black Belt
476 Views

It appears that the problem you have is you have an alias between two dummy arguments (one output)...

.AND. the optimized code is (in an offending section of code) vectorizing a section of code with vector dependencies introduced by the alias. 

IOW one of your aliased dummy outputs is/was written before the other aliased output was read.

You will have to identify the offending loop(s) and make use of the following:

!DIR$ NOVECTOR

DO ... ! the offending loop

 

Jim Dempsey

 

netphilou31
New Contributor II
448 Views

Hi mece and Jim,

Thanks for your replies.

To give you the full information, arrays HGPI, SGPI and DELHVI are used to store intermediate results, but it is sometimes more convenient to pass them as array sections of work array VAR to make the code more readable and to add the full array in case we need to access other sections or to pass them in internal calls. The results of the calculations are stored in the array GGPI. However, since I use the default calling convention, arguments are passed by reference, so to take an example VAR(1,17) should have the same memory address as SGPI(1), consequently, updating one of them should update VAR as well. This is the way our code works, and I can tell you that we use this artefact in a lot of places without any problem until I tried to add the /Qauto and /Qopenmp compiler switches. Does it mean that because of /Qauto and/or /Qopenmp, VAR(1:N,17) represents a copy of a section of the array VAR and that they don't share the same memory locations?

@jimdempseyatthecove , I have no loop inside the subroutine (the array index is passed as argument, the loop is outside) and as shown in my code extract, the element SGPI(I) is calculated just before VARI(I,17) is updated (they should have the same memory address).

Best regards,

mecej4
Black Belt
436 Views

From your description, it appears that your code has been written in such a way that it violates the anti-aliasing rules. Please read Dr. Fortran's article on this topic. With such code, which used to work correctly with older compilers, perhaps because they may have done little or no optimization. if fixing the code to make it standard-conforming is not feasible, try using the compiler option /assume:dummy_aliases with those source files that are in violation.

I do find that several of your statements in this thread to be incorrect, such as "VAR(1,17) should have the same memory address as SGPI(1), consequently, updating one of them should update VAR as well. This is the way our code works". To put it bluntly, that code is broken, as it is based on an incorrect assumption.

Please see the short code example (Program ALIAS1)  in the article that I referenced. Note as well the excerpt of the rules that the code violates, which Steve has included as a quotation.

netphilou31
New Contributor II
423 Views

Mece,

After reading Dr. Fortran's post, I understand that depending on compiler optimizations, VARI(I,17) may have been updated before SGPI(I) was calculated, and I concede that I should not have written the code like this (in fact it was simply an oversight when updating an existing section of code). However, what I don't understand is your comment about memory addresses, and why SGPI(1) and VAR(1,17) should not have the same memory locations.

As an example, we have a lot of code (which I am currently updating), where:

VAR(1,3) is passed as actual argument and where the corresponding dummy argument is Z(), then, in the called routine Z() is updated with another array (let say X which can also be a section of VAR), with Z(1:N) = X(1:N) and then it is not Z() which is passed as internal call but VAR(1,3) and this works perfectly!

Even, if this way of coding looks very strange (which I fully agree), the code was written 30 years ago and has worked perfectly over the years with several different compilers (Watcom fortran, Digital visual fortran, COMPAQ and Intel) and OS (unix, dos/windows, vms).

I would understand that passing VAR(1:N,3) would create an internal copy of the section of the VAR array behind the scenes and that it may not share the same memory location as VAR, however I would have expected that at least on exit of the called subroutine, the corresponding section of VAR was updated in some way.

Best regards,

Steve_Lionel
Black Belt Retired Employee
390 Views

@netphilou31 wrote:

Even, if this way of coding looks very strange (which I fully agree), the code was written 30 years ago and has worked perfectly over the years with several different compilers (Watcom fortran, Digital visual fortran, COMPAQ and Intel) and OS (unix, dos/windows, vms).


This means nothing. As compilers get smarter over the years, or simply rearrange the way things are done, errors you previously didn't notice become visible.

netphilou31
New Contributor II
385 Views

I agree Steve, but when you don't know what the compiler does behind the scenes or does differently from one version to the next or even behave differently depending on some switches, it's sometimes difficult (not to say impossible) to understand why a source code you worked with for years suddenly and which contains thousands of lines of code stops working or behave differently. In my case it was the use of the /Qauto switch along with the optimization settings that made the problems visible whereas with the "Default local storage" option nothing appeared.

Best regards,

mecej4
Black Belt
415 Views

On  a modern computer memory is not a single entity, as it was in, say, the 8088. We have RAM, multiple levels of cache memory, and registers. Speed is promoted by keeping variables in the fastest level of memory as long as possible.

Consider a single scalar subroutine argument that gets updated in a subroutine. Copies of that variable may exist in many places, and those copies may have different values. The compiler will manage the updating and refreshing in such a way that the final results are the same as if all the copies were maintained coherent.

If, however, you do CALL SUB(x, x) to subroutine SUB(X, Y), and you did not tell the compiler that X and Y could be one and the same, and you allowed the compiler to optimize, the code in the subroutine may malfunction. Please work through Steve's PROGRAM ALIAS1 to convince yourself on this matter.

Things can get even worse if X has INTENT(IN), and Y has (IN OUT) in the subroutine specification, and you call with (X,X). What would you expect the subroutine to do?

Things get worse when some of these variables are accessed and updated by more than one thread.

I have come across a few instances where software library routines that have been used for decades by thousands of people suddenly started malfunctioning because of such aliasing errors.

An instructive series of posts from 1999 on this topic is worth reading.

netphilou31
New Contributor II
401 Views

I agree with your statement about memory management but I though the compiler was clever enough to avoid these situations. Luckily, I was starting to rewrite these strange parts of code when I faced this issue.

Anyway, it has been a tremendous luck that these codes worked for so long (even in Release configuration with default optimizations) and started to produce unexpected results after adding the /Qauto compiler switch.

jimdempseyatthecove
Black Belt
379 Views

Programs with errors (assumptions) are like walking on thin ice: sometimes you make it, sometimes you don't.

If the compiler is instructed to assume and or all dummy arguments are aliased, and it was attempting to make sense out of the aliasing, the code generated would be non-optimal (as well as possibly incorrect). So, for this file, use /Od and walk on thin ice.

 

An alternative is to

a) write defensive code in the affected procedure with branch a) handling non-aliased arguments and branch b) handling aliased arguments

.or.

b) on CALL to affected procedure, use an array constructor to construct a copy of the offending aliased argument, with the requirement that the called procedure typed it with intent(in). If it is intent(inout) or intent(out) the array constructor method will not work.

 

Jim Dempsey

mecej4
Black Belt
350 Views

Jim's admonishment (regarding continuing to use programs with errors) became highlighted for me yesterday and today.

I was working on an application that performs separable nonlinear regression using routines from the JPL and PORT collections, using about 50 subroutines from the first and about 70 from the second collection. If I compiled with optimization turned on, some test runs failed to give correct results. Fortunately, the incorrect results were so bad that they made me suspect aliasing. I also wondered if there were variables that were used before initialization, and variables that needed to be saved but were not, and that notion took some time to chase down and dismiss. It was also annoying that changing compiler options or using a different compiler sometimes yielded an EXE that produced the correct results.

It took quite a bit of effort to track down the error, since these old Fortran 77 codes are not easy to work with in the debugger. There were two instances of aliasing errors that I was able to pinpoint as the source of the errors, and I describe one of them here. Most of these codes are about forty years old.

The subprogram in question has the following argument list:

 

DOUBLE PRECISION FUNCTION DL7SVX(P, L, X, Y)
INTEGER P
DOUBLE PRECISION L(P*(P+1)/2), X(P), Y(P)

 

Comments in the source code state: "THE CALLER MAY PASS THE SAME VECTOR  FOR X AND Y (NONSTANDARD FORTRAN USAGE), IN WHICH CASE X OVERWRITES Y."

There are four instances in other library subroutines in which this function is invoked, and in every one of them, the last two arguments are identical to each other:

 

T:\port\ORIG>grep -in "=.*dl7svx" *.f
dg7lit.f:736:      T = T / DL7SVX(P, V(LMAT1), V(STEP1), V(STEP1))
drnsg.f:257:         T = DL7SVX(K, V(AR1), C, C)
drnsg.f:425:      T = T / DL7SVX(PP, V(K), V(TEMP1), V(TEMP1))
drnsgb.f:228:         T = DL7SVX(K, V(AR1), C, C)

 

Note that this is not just a function, since it returns the leading singular value of a matrix through the function name and the associated right and left singular vectors through the array arguments X and Y.

 

JohnNichols
Valued Contributor II
336 Views

Randomly pulling a coded routine from Port shows a lot of potential issues in using this code, even allowing for the fact that it has been downloaded by many users.  The count is high. 

1. No implicit none,  if you turn that one on with old code there is a high probability you will get errors even though the code works in spite of the errors. 

2.  Line 202 anything written for an NSF grant is likely to be have been written by graduate students.  Graduate students are trying to finish their degree not be perfectly correct.  I have had 124 grad students I am aware of their skill level.   Supervisors look for gross errors in logic. 

3. Line 270 -- ok so even God does not know where all these go and testing this will be a nightmare.  

4. There is a famous paper by a grad student. Absolutely brilliant Chinese engineer, he/she looked at the development of BEM from 1970 to about 1996 when the paper was written, his/her supervisor basically invented BEM.  He/she went through and demolished the research work of some very highly regarded academic BEM modelers.  It was ruthless.  I laughed, they would not have, my guess is the supervisor was issuing a course correction.  I would not want to have been that grad student trying to become published.  I should check on his/her career.  Peer review ultimately works, but it can take decades and sometimes mistakes slip through, I mean we did not lose a satellite on mars because of an error in g, actually .. 

 

If you do not have the skill set of a @mecej4 - you should avoid this code at all costs.  @mecej4  can handle this - the average user, no.  

 

This is a personal opinion, others may disagree,  I understand I have often stepped into the unknown, luckily the likes of @mecej4 , @jimdempseyatthecove  and @Steve_Lionel  have stopped my worst errors for that I am very grateful.  

 

SUBROUTINE A7SST(IV, LIV, LV, V)
C
C  ***  ASSESS CANDIDATE STEP (***SOL VERSION 2.3)  ***
C
      INTEGER LIV, LV
      INTEGER IV(LIV)
      REAL V(LV)
C
C  ***  PURPOSE  ***
C
C        THIS SUBROUTINE IS CALLED BY AN UNCONSTRAINED MINIMIZATION
C     ROUTINE TO ASSESS THE NEXT CANDIDATE STEP.  IT MAY RECOMMEND ONE
C     OF SEVERAL COURSES OF ACTION, SUCH AS ACCEPTING THE STEP, RECOM-
C     PUTING IT USING THE SAME OR A NEW QUADRATIC MODEL, OR HALTING DUE
C     TO CONVERGENCE OR FALSE CONVERGENCE.  SEE THE RETURN CODE LISTING
C     BELOW.
C
C--------------------------  PARAMETER USAGE  --------------------------
C
C  IV (I/O) INTEGER PARAMETER AND SCRATCH VECTOR -- SEE DESCRIPTION
C             BELOW OF IV VALUES REFERENCED.
C LIV (IN)  LENGTH OF IV ARRAY.
C  LV (IN)  LENGTH OF V ARRAY.
C   V (I/O) REAL PARAMETER AND SCRATCH VECTOR -- SEE DESCRIPTION
C             BELOW OF V VALUES REFERENCED.
C
C  ***  IV VALUES REFERENCED  ***
C
C    IV(IRC) (I/O) ON INPUT FOR THE FIRST STEP TRIED IN A NEW ITERATION,
C             IV(IRC) SHOULD BE SET TO 3 OR 4 (THE VALUE TO WHICH IT IS
C             SET WHEN STEP IS DEFINITELY TO BE ACCEPTED).  ON INPUT
C             AFTER STEP HAS BEEN RECOMPUTED, IV(IRC) SHOULD BE
C             UNCHANGED SINCE THE PREVIOUS RETURN OF A7SST.
C                ON OUTPUT, IV(IRC) IS A RETURN CODE HAVING ONE OF THE
C             FOLLOWING VALUES...
C                  1 = SWITCH MODELS OR TRY SMALLER STEP.
C                  2 = SWITCH MODELS OR ACCEPT STEP.
C                  3 = ACCEPT STEP AND DETERMINE V(RADFAC) BY GRADIENT
C                       TESTS.
C                  4 = ACCEPT STEP, V(RADFAC) HAS BEEN DETERMINED.
C                  5 = RECOMPUTE STEP (USING THE SAME MODEL).
C                  6 = RECOMPUTE STEP WITH RADIUS = V(LMAXS) BUT DO NOT
C                       EVALUATE THE OBJECTIVE FUNCTION.
C                  7 = X-CONVERGENCE (SEE V(XCTOL)).
C                  8 = RELATIVE FUNCTION CONVERGENCE (SEE V(RFCTOL)).
C                  9 = BOTH X- AND RELATIVE FUNCTION CONVERGENCE.
C                 10 = ABSOLUTE FUNCTION CONVERGENCE (SEE V(AFCTOL)).
C                 11 = SINGULAR CONVERGENCE (SEE V(LMAXS)).
C                 12 = FALSE CONVERGENCE (SEE V(XFTOL)).
C                 13 = IV(IRC) WAS OUT OF RANGE ON INPUT.
C             RETURN CODE I HAS PRECEDENCE OVER I+1 FOR I = 9, 10, 11.
C IV(MLSTGD) (I/O) SAVED VALUE OF IV(MODEL).
C  IV(MODEL) (I/O) ON INPUT, IV(MODEL) SHOULD BE AN INTEGER IDENTIFYING
C             THE CURRENT QUADRATIC MODEL OF THE OBJECTIVE FUNCTION.
C             IF A PREVIOUS STEP YIELDED A BETTER FUNCTION REDUCTION,
C             THEN IV(MODEL) WILL BE SET TO IV(MLSTGD) ON OUTPUT.
C IV(NFCALL) (IN)  INVOCATION COUNT FOR THE OBJECTIVE FUNCTION.
C IV(NFGCAL) (I/O) VALUE OF IV(NFCALL) AT STEP THAT GAVE THE BIGGEST
C             FUNCTION REDUCTION THIS ITERATION.  IV(NFGCAL) REMAINS
C             UNCHANGED UNTIL A FUNCTION REDUCTION IS OBTAINED.
C IV(RADINC) (I/O) THE NUMBER OF RADIUS INCREASES (OR MINUS THE NUMBER
C             OF DECREASES) SO FAR THIS ITERATION.
C IV(RESTOR) (OUT) SET TO 1 IF V(F) HAS BEEN RESTORED AND X SHOULD BE
C             RESTORED TO ITS INITIAL VALUE, TO 2 IF X SHOULD BE SAVED,
C             TO 3 IF X SHOULD BE RESTORED FROM THE SAVED VALUE, AND TO
C             0 OTHERWISE.
C  IV(STAGE) (I/O) COUNT OF THE NUMBER OF MODELS TRIED SO FAR IN THE
C             CURRENT ITERATION.
C IV(STGLIM) (IN)  MAXIMUM NUMBER OF MODELS TO CONSIDER.
C IV(SWITCH) (OUT) SET TO 0 UNLESS A NEW MODEL IS BEING TRIED AND IT
C             GIVES A SMALLER FUNCTION VALUE THAN THE PREVIOUS MODEL,
C             IN WHICH CASE A7SST SETS IV(SWITCH) = 1.
C IV(TOOBIG) (I/O)  IS NONZERO ON INPUT IF STEP WAS TOO BIG (E.G., IF
C             IT WOULD CAUSE OVERFLOW).  IT IS SET TO 0 ON RETURN.
C   IV(XIRC) (I/O) VALUE THAT IV(IRC) WOULD HAVE IN THE ABSENCE OF
C             CONVERGENCE, FALSE CONVERGENCE, AND OVERSIZED STEPS.
C
C  ***  V VALUES REFERENCED  ***
C
C V(AFCTOL) (IN)  ABSOLUTE FUNCTION CONVERGENCE TOLERANCE.  IF THE
C             ABSOLUTE VALUE OF THE CURRENT FUNCTION VALUE V(F) IS LESS
C             THAN V(AFCTOL) AND A7SST DOES NOT RETURN WITH
C             IV(IRC) = 11, THEN A7SST RETURNS WITH IV(IRC) = 10.
C V(DECFAC) (IN)  FACTOR BY WHICH TO DECREASE RADIUS WHEN IV(TOOBIG) IS
C             NONZERO.
C V(DSTNRM) (IN)  THE 2-NORM OF D*STEP.
C V(DSTSAV) (I/O) VALUE OF V(DSTNRM) ON SAVED STEP.
C   V(DST0) (IN)  THE 2-NORM OF D TIMES THE NEWTON STEP (WHEN DEFINED,
C             I.E., FOR V(NREDUC) .GE. 0).
C      V(F) (I/O) ON BOTH INPUT AND OUTPUT, V(F) IS THE OBJECTIVE FUNC-
C             TION VALUE AT X.  IF X IS RESTORED TO A PREVIOUS VALUE,
C             THEN V(F) IS RESTORED TO THE CORRESPONDING VALUE.
C   V(FDIF) (OUT) THE FUNCTION REDUCTION V(F0) - V(F) (FOR THE OUTPUT
C             VALUE OF V(F) IF AN EARLIER STEP GAVE A BIGGER FUNCTION
C             DECREASE, AND FOR THE INPUT VALUE OF V(F) OTHERWISE).
C V(FLSTGD) (I/O) SAVED VALUE OF V(F).
C     V(F0) (IN)  OBJECTIVE FUNCTION VALUE AT START OF ITERATION.
C V(GTSLST) (I/O) VALUE OF V(GTSTEP) ON SAVED STEP.
C V(GTSTEP) (IN)  INNER PRODUCT BETWEEN STEP AND GRADIENT.
C V(INCFAC) (IN)  MINIMUM FACTOR BY WHICH TO INCREASE RADIUS.
C  V(LMAXS) (IN)  MAXIMUM REASONABLE STEP SIZE (AND INITIAL STEP BOUND).
C             IF THE ACTUAL FUNCTION DECREASE IS NO MORE THAN TWICE
C             WHAT WAS PREDICTED, IF A RETURN WITH IV(IRC) = 7, 8, OR 9
C             DOES NOT OCCUR, IF V(DSTNRM) .GT. V(LMAXS) OR THE CURRENT
C             STEP IS A NEWTON STEP, AND IF
C             V(PREDUC) .LE. V(SCTOL) * ABS(V(F0)), THEN A7SST RETURNS
C             WITH IV(IRC) = 11.  IF SO DOING APPEARS WORTHWHILE, THEN
C             A7SST REPEATS THIS TEST (DISALLOWING A FULL NEWTON STEP)
C             WITH V(PREDUC) COMPUTED FOR A STEP OF LENGTH V(LMAXS)
C             (BY A RETURN WITH IV(IRC) = 6).
C V(NREDUC) (I/O)  FUNCTION REDUCTION PREDICTED BY QUADRATIC MODEL FOR
C             NEWTON STEP.  IF A7SST IS CALLED WITH IV(IRC) = 6, I.E.,
C             IF V(PREDUC) HAS BEEN COMPUTED WITH RADIUS = V(LMAXS) FOR
C             USE IN THE SINGULAR CONVERGENCE TEST, THEN V(NREDUC) IS
C             SET TO -V(PREDUC) BEFORE THE LATTER IS RESTORED.
C V(PLSTGD) (I/O) VALUE OF V(PREDUC) ON SAVED STEP.
C V(PREDUC) (I/O) FUNCTION REDUCTION PREDICTED BY QUADRATIC MODEL FOR
C             CURRENT STEP.
C V(RADFAC) (OUT) FACTOR TO BE USED IN DETERMINING THE NEW RADIUS,
C             WHICH SHOULD BE V(RADFAC)*DST, WHERE  DST  IS EITHER THE
C             OUTPUT VALUE OF V(DSTNRM) OR THE 2-NORM OF
C             DIAG(NEWD)*STEP  FOR THE OUTPUT VALUE OF STEP AND THE
C             UPDATED VERSION, NEWD, OF THE SCALE VECTOR D.  FOR
C             IV(IRC) = 3, V(RADFAC) = 1.0 IS RETURNED.
C V(RDFCMN) (IN)  MINIMUM VALUE FOR V(RADFAC) IN TERMS OF THE INPUT
C             VALUE OF V(DSTNRM) -- SUGGESTED VALUE = 0.1.
C V(RDFCMX) (IN)  MAXIMUM VALUE FOR V(RADFAC) -- SUGGESTED VALUE = 4.0.
C  V(RELDX) (IN) SCALED RELATIVE CHANGE IN X CAUSED BY STEP, COMPUTED
C             (E.G.) BY FUNCTION   RLDST  AS
C                 MAX (D(I)*ABS(X(I)-X0(I)), 1 .LE. I .LE. P) /
C                    MAX (D(I)*(ABS(X(I))+ABS(X0(I))), 1 .LE. I .LE. P).
C V(RFCTOL) (IN)  RELATIVE FUNCTION CONVERGENCE TOLERANCE.  IF THE
C             ACTUAL FUNCTION REDUCTION IS AT MOST TWICE WHAT WAS PRE-
C             DICTED AND  V(NREDUC) .LE. V(RFCTOL)*ABS(V(F0)),  THEN
C             A7SST RETURNS WITH IV(IRC) = 8 OR 9.
C  V(SCTOL) (IN)  SINGULAR CONVERGENCE TOLERANCE -- SEE V(LMAXS).
C V(STPPAR) (IN)  MARQUARDT PARAMETER -- 0 MEANS FULL NEWTON STEP.
C V(TUNER1) (IN)  TUNING CONSTANT USED TO DECIDE IF THE FUNCTION
C             REDUCTION WAS MUCH LESS THAN EXPECTED.  SUGGESTED
C             VALUE = 0.1.
C V(TUNER2) (IN)  TUNING CONSTANT USED TO DECIDE IF THE FUNCTION
C             REDUCTION WAS LARGE ENOUGH TO ACCEPT STEP.  SUGGESTED
C             VALUE = 10**-4.
C V(TUNER3) (IN)  TUNING CONSTANT USED TO DECIDE IF THE RADIUS
C             SHOULD BE INCREASED.  SUGGESTED VALUE = 0.75.
C  V(XCTOL) (IN)  X-CONVERGENCE CRITERION.  IF STEP IS A NEWTON STEP
C             (V(STPPAR) = 0) HAVING V(RELDX) .LE. V(XCTOL) AND GIVING
C             AT MOST TWICE THE PREDICTED FUNCTION DECREASE, THEN
C             A7SST RETURNS IV(IRC) = 7 OR 9.
C  V(XFTOL) (IN)  FALSE CONVERGENCE TOLERANCE.  IF STEP GAVE NO OR ONLY
C             A SMALL FUNCTION DECREASE AND V(RELDX) .LE. V(XFTOL),
C             THEN A7SST RETURNS WITH IV(IRC) = 12.
C
C-------------------------------  NOTES  -------------------------------
C
C  ***  APPLICATION AND USAGE RESTRICTIONS  ***
C
C        THIS ROUTINE IS CALLED AS PART OF THE NL2SOL (NONLINEAR
C     LEAST-SQUARES) PACKAGE.  IT MAY BE USED IN ANY UNCONSTRAINED
C     MINIMIZATION SOLVER THAT USES DOGLEG, GOLDFELD-QUANDT-TROTTER,
C     OR LEVENBERG-MARQUARDT STEPS.
C
C  ***  ALGORITHM NOTES  ***
C
C        SEE (1) FOR FURTHER DISCUSSION OF THE ASSESSING AND MODEL
C     SWITCHING STRATEGIES.  WHILE NL2SOL CONSIDERS ONLY TWO MODELS,
C     A7SST IS DESIGNED TO HANDLE ANY NUMBER OF MODELS.
C
C  ***  USAGE NOTES  ***
C
C        ON THE FIRST CALL OF AN ITERATION, ONLY THE I/O VARIABLES
C     STEP, X, IV(IRC), IV(MODEL), V(F), V(DSTNRM), V(GTSTEP), AND
C     V(PREDUC) NEED HAVE BEEN INITIALIZED.  BETWEEN CALLS, NO I/O
C     VALUES EXCEPT STEP, X, IV(MODEL), V(F) AND THE STOPPING TOLER-
C     ANCES SHOULD BE CHANGED.
C        AFTER A RETURN FOR CONVERGENCE OR FALSE CONVERGENCE, ONE CAN
C     CHANGE THE STOPPING TOLERANCES AND CALL A7SST AGAIN, IN WHICH
C     CASE THE STOPPING TESTS WILL BE REPEATED.
C
C  ***  REFERENCES  ***
C
C     (1) DENNIS, J.E., JR., GAY, D.M., AND WELSCH, R.E. (1981),
C        AN ADAPTIVE NONLINEAR LEAST-SQUARES ALGORITHM,
C        ACM TRANS. MATH. SOFTWARE, VOL. 7, NO. 3.
C
C     (2) POWELL, M.J.D. (1970)  A FORTRAN SUBROUTINE FOR SOLVING
C        SYSTEMS OF NONLINEAR ALGEBRAIC EQUATIONS, IN NUMERICAL
C        METHODS FOR NONLINEAR ALGEBRAIC EQUATIONS, EDITED BY
C        P. RABINOWITZ, GORDON AND BREACH, LONDON.
C
C  ***  HISTORY  ***
C
C        JOHN DENNIS DESIGNED MUCH OF THIS ROUTINE, STARTING WITH
C     IDEAS IN (2). ROY WELSCH SUGGESTED THE MODEL SWITCHING STRATEGY.
C        DAVID GAY AND STEPHEN PETERS CAST THIS SUBROUTINE INTO A MORE
C     PORTABLE FORM (WINTER 1977), AND DAVID GAY CAST IT INTO ITS
C     PRESENT FORM (FALL 1978), WITH MINOR CHANGES TO THE SINGULAR
C     CONVERGENCE TEST IN MAY, 1984 (TO DEAL WITH FULL NEWTON STEPS).
C
C  ***  GENERAL  ***
C
C     THIS SUBROUTINE WAS WRITTEN IN CONNECTION WITH RESEARCH
C     SUPPORTED BY THE NATIONAL SCIENCE FOUNDATION UNDER GRANTS
C     MCS-7600324, DCR75-10143, 76-14311DSS, MCS76-11989, AND
C     MCS-7906671.
C
C------------------------  EXTERNAL QUANTITIES  ------------------------
C
C  ***  NO EXTERNAL FUNCTIONS AND SUBROUTINES  ***
C
C--------------------------  LOCAL VARIABLES  --------------------------
C
      LOGICAL GOODX
      INTEGER I, NFC
      REAL EMAX, EMAXS, GTS, RFAC1, XMAX
      REAL HALF, ONE, ONEP2, TWO, ZERO
C
C  ***  SUBSCRIPTS FOR IV AND V  ***
C
      INTEGER AFCTOL, DECFAC, DSTNRM, DSTSAV, DST0, F, FDIF, FLSTGD, F0,
     1        GTSLST, GTSTEP, INCFAC, IRC, LMAXS, MLSTGD, MODEL, NFCALL,
     2        NFGCAL, NREDUC, PLSTGD, PREDUC, RADFAC, RADINC, RDFCMN,
     3        RDFCMX, RELDX, RESTOR, RFCTOL, SCTOL, STAGE, STGLIM,
     4        STPPAR, SWITCH, TOOBIG, TUNER1, TUNER2, TUNER3, XCTOL,
     5        XFTOL, XIRC
C
C  ***  DATA INITIALIZATIONS  ***
C
C/6
C     DATA HALF/0.5E+0/, ONE/1.E+0/, ONEP2/1.2E+0/, TWO/2.E+0/,
C    1     ZERO/0.E+0/
C/7
      PARAMETER (HALF=0.5E+0, ONE=1.E+0, ONEP2=1.2E+0, TWO=2.E+0,
     1           ZERO=0.E+0)
C/
C
C/6
C     DATA IRC/29/, MLSTGD/32/, MODEL/5/, NFCALL/6/, NFGCAL/7/,
C    1     RADINC/8/, RESTOR/9/, STAGE/10/, STGLIM/11/, SWITCH/12/,
C    2     TOOBIG/2/, XIRC/13/
C/7
      PARAMETER (IRC=29, MLSTGD=32, MODEL=5, NFCALL=6, NFGCAL=7,
     1           RADINC=8, RESTOR=9, STAGE=10, STGLIM=11, SWITCH=12,
     2           TOOBIG=2, XIRC=13)
C/
C/6
C     DATA AFCTOL/31/, DECFAC/22/, DSTNRM/2/, DST0/3/, DSTSAV/18/,
C    1     F/10/, FDIF/11/, FLSTGD/12/, F0/13/, GTSLST/14/, GTSTEP/4/,
C    2     INCFAC/23/, LMAXS/36/, NREDUC/6/, PLSTGD/15/, PREDUC/7/,
C    3     RADFAC/16/, RDFCMN/24/, RDFCMX/25/, RELDX/17/, RFCTOL/32/,
C    4     SCTOL/37/, STPPAR/5/, TUNER1/26/, TUNER2/27/, TUNER3/28/,
C    5     XCTOL/33/, XFTOL/34/
C/7
      PARAMETER (AFCTOL=31, DECFAC=22, DSTNRM=2, DST0=3, DSTSAV=18,
     1           F=10, FDIF=11, FLSTGD=12, F0=13, GTSLST=14, GTSTEP=4,
     2           INCFAC=23, LMAXS=36, NREDUC=6, PLSTGD=15, PREDUC=7,
     3           RADFAC=16, RDFCMN=24, RDFCMX=25, RELDX=17, RFCTOL=32,
     4           SCTOL=37, STPPAR=5, TUNER1=26, TUNER2=27, TUNER3=28,
     5           XCTOL=33, XFTOL=34)
C/
C
C+++++++++++++++++++++++++++++++  BODY  ++++++++++++++++++++++++++++++++
C
      NFC = IV(NFCALL)
      IV(SWITCH) = 0
      IV(RESTOR) = 0
      RFAC1 = ONE
      GOODX = .TRUE.
      I = IV(IRC)
      IF (I .GE. 1 .AND. I .LE. 12)
     1             GO TO (20,30,10,10,40,280,220,220,220,220,220,170), I
         IV(IRC) = 13
         GO TO 999
C
C  ***  INITIALIZE FOR NEW ITERATION  ***
C
 10   IV(STAGE) = 1
      IV(RADINC) = 0
      V(FLSTGD) = V(F0)
      IF (IV(TOOBIG) .EQ. 0) GO TO 110
         IV(STAGE) = -1
         IV(XIRC) = I
         GO TO 60
C
C  ***  STEP WAS RECOMPUTED WITH NEW MODEL OR SMALLER RADIUS  ***
C  ***  FIRST DECIDE WHICH  ***
C
 20   IF (IV(MODEL) .NE. IV(MLSTGD)) GO TO 30
C        ***  OLD MODEL RETAINED, SMALLER RADIUS TRIED  ***
C        ***  DO NOT CONSIDER ANY MORE NEW MODELS THIS ITERATION  ***
         IV(STAGE) = IV(STGLIM)
         IV(RADINC) = -1
         GO TO 110
C
C  ***  A NEW MODEL IS BEING TRIED.  DECIDE WHETHER TO KEEP IT.  ***
C
 30   IV(STAGE) = IV(STAGE) + 1
C
C     ***  NOW WE ADD THE POSSIBILITY THAT STEP WAS RECOMPUTED WITH  ***
C     ***  THE SAME MODEL, PERHAPS BECAUSE OF AN OVERSIZED STEP.     ***
C
 40   IF (IV(STAGE) .GT. 0) GO TO 50
C
C        ***  STEP WAS RECOMPUTED BECAUSE IT WAS TOO BIG.  ***
C
         IF (IV(TOOBIG) .NE. 0) GO TO 60
C
C        ***  RESTORE IV(STAGE) AND PICK UP WHERE WE LEFT OFF.  ***
C
         IV(STAGE) = -IV(STAGE)
         I = IV(XIRC)
         GO TO (20, 30, 110, 110, 70), I
C
 50   IF (IV(TOOBIG) .EQ. 0) GO TO 70
C
C  ***  HANDLE OVERSIZE STEP  ***
C
      IV(TOOBIG) = 0
      IF (IV(RADINC) .GT. 0) GO TO 80
         IV(STAGE) = -IV(STAGE)
         IV(XIRC) = IV(IRC)
C
 60      IV(TOOBIG) = 0
         V(RADFAC) = V(DECFAC)
         IV(RADINC) = IV(RADINC) - 1
         IV(IRC) = 5
         IV(RESTOR) = 1
         V(F) = V(FLSTGD)
         GO TO 999
C
 70   IF (V(F) .LT. V(FLSTGD)) GO TO 110
C
C     *** THE NEW STEP IS A LOSER.  RESTORE OLD MODEL.  ***
C
      IF (IV(MODEL) .EQ. IV(MLSTGD)) GO TO 80
         IV(MODEL) = IV(MLSTGD)
         IV(SWITCH) = 1
C
C     ***  RESTORE STEP, ETC. ONLY IF A PREVIOUS STEP DECREASED V(F).
C
 80   IF (V(FLSTGD) .GE. V(F0)) GO TO 110
         IF (IV(STAGE) .LT. IV(STGLIM)) THEN
            GOODX = .FALSE.
         ELSE IF (NFC .LT. IV(NFGCAL) + IV(STGLIM) + 2) THEN
            GOODX = .FALSE.
         ELSE IF (IV(SWITCH) .NE. 0) THEN
            GOODX = .FALSE.
            ENDIF
         IV(RESTOR) = 3
         V(F) = V(FLSTGD)
         V(PREDUC) = V(PLSTGD)
         V(GTSTEP) = V(GTSLST)
         IF (IV(SWITCH) .EQ. 0) RFAC1 = V(DSTNRM) / V(DSTSAV)
         V(DSTNRM) = V(DSTSAV)
         IF (GOODX) THEN
C
C     ***  ACCEPT PREVIOUS SLIGHTLY REDUCING STEP ***
C
            V(FDIF) = V(F0) - V(F)
            IV(IRC) = 4
            V(RADFAC) = RFAC1
            GO TO 999
            ENDIF
         NFC = IV(NFGCAL)
C
 110  V(FDIF) = V(F0) - V(F)
      IF (V(FDIF) .GT. V(TUNER2) * V(PREDUC)) GO TO 140
      IF (IV(RADINC) .GT. 0) GO TO 140
C
C        ***  NO (OR ONLY A TRIVIAL) FUNCTION DECREASE
C        ***  -- SO TRY NEW MODEL OR SMALLER RADIUS
C
         IF (V(F) .LT. V(F0)) GO TO 120
              IV(MLSTGD) = IV(MODEL)
              V(FLSTGD) = V(F)
              V(F) = V(F0)
              IV(RESTOR) = 1
              GO TO 130
 120     IV(NFGCAL) = NFC
 130     IV(IRC) = 1
         IF (IV(STAGE) .LT. IV(STGLIM)) GO TO 160
              IV(IRC) = 5
              IV(RADINC) = IV(RADINC) - 1
              GO TO 160
C
C  ***  NONTRIVIAL FUNCTION DECREASE ACHIEVED  ***
C
 140  IV(NFGCAL) = NFC
      RFAC1 = ONE
      V(DSTSAV) = V(DSTNRM)
      IF (V(FDIF) .GT. V(PREDUC)*V(TUNER1)) GO TO 190
C
C  ***  DECREASE WAS MUCH LESS THAN PREDICTED -- EITHER CHANGE MODELS
C  ***  OR ACCEPT STEP WITH DECREASED RADIUS.
C
      IF (IV(STAGE) .GE. IV(STGLIM)) GO TO 150
C        ***  CONSIDER SWITCHING MODELS  ***
         IV(IRC) = 2
         GO TO 160
C
C     ***  ACCEPT STEP WITH DECREASED RADIUS  ***
C
 150  IV(IRC) = 4
C
C  ***  SET V(RADFAC) TO FLETCHER*S DECREASE FACTOR  ***
C
 160  IV(XIRC) = IV(IRC)
      EMAX = V(GTSTEP) + V(FDIF)
      V(RADFAC) = HALF * RFAC1
      IF (EMAX .LT. V(GTSTEP)) V(RADFAC) = RFAC1 * AMAX1(V(RDFCMN),
     1                                           HALF * V(GTSTEP)/EMAX)
C
C  ***  DO FALSE CONVERGENCE TEST  ***
C
 170  IF (V(RELDX) .LE. V(XFTOL)) GO TO 180
         IV(IRC) = IV(XIRC)
         IF (V(F) .LT. V(F0)) GO TO 200
              GO TO 230
C
 180  IV(IRC) = 12
      GO TO 240
C
C  ***  HANDLE GOOD FUNCTION DECREASE  ***
C
 190  IF (V(FDIF) .LT. (-V(TUNER3) * V(GTSTEP))) GO TO 210
C
C     ***  INCREASING RADIUS LOOKS WORTHWHILE.  SEE IF WE JUST
C     ***  RECOMPUTED STEP WITH A DECREASED RADIUS OR RESTORED STEP
C     ***  AFTER RECOMPUTING IT WITH A LARGER RADIUS.
C
      IF (IV(RADINC) .LT. 0) GO TO 210
      IF (IV(RESTOR) .EQ. 1) GO TO 210
      IF (IV(RESTOR) .EQ. 3) GO TO 210
C
C        ***  WE DID NOT.  TRY A LONGER STEP UNLESS THIS WAS A NEWTON
C        ***  STEP.
C
         V(RADFAC) = V(RDFCMX)
         GTS = V(GTSTEP)
         IF (V(FDIF) .LT. (HALF/V(RADFAC) - ONE) * GTS)
     1            V(RADFAC) = AMAX1(V(INCFAC), HALF*GTS/(GTS + V(FDIF)))
         IV(IRC) = 4
         IF (V(STPPAR) .EQ. ZERO) GO TO 230
         IF (V(DST0) .GE. ZERO .AND. (V(DST0) .LT. TWO*V(DSTNRM)
     1             .OR. V(NREDUC) .LT. ONEP2*V(FDIF)))  GO TO 230
C             ***  STEP WAS NOT A NEWTON STEP.  RECOMPUTE IT WITH
C             ***  A LARGER RADIUS.
              IV(IRC) = 5
              IV(RADINC) = IV(RADINC) + 1
C
C  ***  SAVE VALUES CORRESPONDING TO GOOD STEP  ***
C
 200  V(FLSTGD) = V(F)
      IV(MLSTGD) = IV(MODEL)
      IF (IV(RESTOR) .EQ. 0) IV(RESTOR) = 2
      V(DSTSAV) = V(DSTNRM)
      IV(NFGCAL) = NFC
      V(PLSTGD) = V(PREDUC)
      V(GTSLST) = V(GTSTEP)
      GO TO 230
C
C  ***  ACCEPT STEP WITH RADIUS UNCHANGED  ***
C
 210  V(RADFAC) = ONE
      IV(IRC) = 3
      GO TO 230
C
C  ***  COME HERE FOR A RESTART AFTER CONVERGENCE  ***
C
 220  IV(IRC) = IV(XIRC)
      IF (V(DSTSAV) .GE. ZERO) GO TO 240
         IV(IRC) = 12
         GO TO 240
C
C  ***  PERFORM CONVERGENCE TESTS  ***
C
 230  IV(XIRC) = IV(IRC)
 240  IF (IV(RESTOR) .EQ. 1 .AND. V(FLSTGD) .LT. V(F0)) IV(RESTOR) = 3
      IF ( ABS(V(F)) .LT. V(AFCTOL)) IV(IRC) = 10
      IF (HALF * V(FDIF) .GT. V(PREDUC)) GO TO 999
      EMAX = V(RFCTOL) *  ABS(V(F0))
      EMAXS = V(SCTOL) *  ABS(V(F0))
      IF (V(PREDUC) .LE. EMAXS .AND. (V(DSTNRM) .GT. V(LMAXS) .OR.
     1     V(STPPAR) .EQ. ZERO)) IV(IRC) = 11
      IF (V(DST0) .LT. ZERO) GO TO 250
      I = 0
      IF ((V(NREDUC) .GT. ZERO .AND. V(NREDUC) .LE. EMAX) .OR.
     1    (V(NREDUC) .EQ. ZERO. AND. V(PREDUC) .EQ. ZERO))  I = 2
      IF (V(STPPAR) .EQ. ZERO .AND. V(RELDX) .LE. V(XCTOL)
     1                        .AND. GOODX)                  I = I + 1
      IF (I .GT. 0) IV(IRC) = I + 6
C
C  ***  CONSIDER RECOMPUTING STEP OF LENGTH V(LMAXS) FOR SINGULAR
C  ***  CONVERGENCE TEST.
C
 250  IF (IV(IRC) .GT. 5 .AND. IV(IRC) .NE. 12) GO TO 999
      IF (V(STPPAR) .EQ. ZERO) GO TO 999
      IF (V(DSTNRM) .GT. V(LMAXS)) GO TO 260
         IF (V(PREDUC) .GE. EMAXS) GO TO 999
              IF (V(DST0) .LE. ZERO) GO TO 270
                   IF (HALF * V(DST0) .LE. V(LMAXS)) GO TO 999
                        GO TO 270
 260  IF (HALF * V(DSTNRM) .LE. V(LMAXS)) GO TO 999
      XMAX = V(LMAXS) / V(DSTNRM)
      IF (XMAX * (TWO - XMAX) * V(PREDUC) .GE. EMAXS) GO TO 999
 270  IF (V(NREDUC) .LT. ZERO) GO TO 290
C
C  ***  RECOMPUTE V(PREDUC) FOR USE IN SINGULAR CONVERGENCE TEST  ***
C
      V(GTSLST) = V(GTSTEP)
      V(DSTSAV) = V(DSTNRM)
      IF (IV(IRC) .EQ. 12) V(DSTSAV) = -V(DSTSAV)
      V(PLSTGD) = V(PREDUC)
      I = IV(RESTOR)
      IV(RESTOR) = 2
      IF (I .EQ. 3) IV(RESTOR) = 0
      IV(IRC) = 6
      GO TO 999
C
C  ***  PERFORM SINGULAR CONVERGENCE TEST WITH RECOMPUTED V(PREDUC)  ***
C
 280  V(GTSTEP) = V(GTSLST)
      V(DSTNRM) =  ABS(V(DSTSAV))
      IV(IRC) = IV(XIRC)
      IF (V(DSTSAV) .LE. ZERO) IV(IRC) = 12
      V(NREDUC) = -V(PREDUC)
      V(PREDUC) = V(PLSTGD)
      IV(RESTOR) = 3
 290  IF (-V(NREDUC) .LE. V(SCTOL) *  ABS(V(F0))) IV(IRC) = 11
C
 999  RETURN
C
C  ***  LAST LINE OF A7SST FOLLOWS  ***
      END

 

 

netphilou31
New Contributor II
293 Views

I add the same thought when remembering a subroutine I used to solve a linear system of equations and also developped in the early days of Fortran (may be around 1975).

The routine had two matrixes as arguments (let say A and B) , the first (A) contains the coefficients for system to be solved, and the second (B) was used to get the inverse of the original matrix. There was a comment in the header of the subroutine telling that if you don't want to keep matrix A you can call the routine with (A,A,...) instead of (A,B,...). I don't know if at that time the standard already had anti-aliasing rules, but I know a lot of source code in which this "feature" is used and that worked perfectly for years (the code was written in such a way this worked). Of course it does not mean that this was correct but as the standard changes the way these things are handled, I think that a warning message or an error message should be generated at compile time.

JohnNichols
Valued Contributor II
280 Views

There are standards, which may have errors or omissions. 

There are compilers that may be missing features, have real error from mistakes in coding or misunderstanding the standard. 

There are programs where people never read the standards, what is a standard we are the Intel Fortran Forum experts,  and their last lesson in Fortran was in 1978 on a daemon computer with punch cards. 

There is the program which is a mixture of all of this and the program appears to run. The program may be based on a flawed or partial theory, beam theory cannot predict temperature dependence of beam vibration, a very real problem I assure you, beam theory does not predict well the effect of massive compression, spar theory does, but it is not used in 3D - well not completely and then only in one program I am aware of at the moment.  

So allowing for all of these potential errors is it a wonder that the forum gets daily reports of bugs.  

The interesting thing is that people make it work anyway.  

Of course I am not going to call routine(A,A)  it is a recipe for a disaster.  

But that is a personal opinion.  

 

 

 

 

mecej4
Black Belt
264 Views

Here is a test program (two source files zipped and attached) that demonstrates the aliasing issue that I  mentioned earlier in this thread. The bug occurred only with one compiler (not Intel) when it had optimization turned on.

The call with the aliased arguments can be seen in these lines:

 

 

   dl7 = dl7svx(p,L,x,y)
   print 20,'Two Array version', dl7, x, y

   dl7 = dl7svx(p,L,x,x)                ! Aliased arguments!
   print 10,'Aliased version  ', dl7, x

 

 

With most compilers, the output from the program will be:

 

 

 Two Array version  1.179982E+01 x   -9.8586E+00  5.6358E+00  3.0323E+00 -1.0429E+00
                                 y    6.0251E+00  4.0673E+00  7.4929E+00  2.8194E+00
 Aliased version    1.179982E+01 x   -9.8586E+00  5.6358E+00  3.0323E+00 -1.0429E+00

 

 

One compiler (the excellent Lahey-Fujitsu 7.1 for Windows), gives the surprising output:

 

 

 Two Array version  1.179982E+01 x   -9.8586E+00  5.6358E+00  3.0323E+00 -1.0429E+00
                                 y    6.0251E+00  4.0673E+00  7.4929E+00  2.8194E+00
 Aliased version    0.000000E+00 x    0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00

 

 

 The details of how the zeros entered on the last line are related to how the LF optimizer handles the following lines of code:

 

 

!
!  ***  NORMALIZE Y AND SET X = (L**T)*Y  ***
!
         t = one/dv2nrm(p,y)
         ji = 1
         do i = 1, p        ! Aliasing bug when x and y coincide
            yi = t*y(i)     ! optimizer interchanges order of this
            x(i) = zero     ! and this, causing yi, x() and y() to be zero
            call dv2axy (i, x, yi, l(ji), x)
            ji = ji + i
         end do
         dl7svx = dv2nrm(p,x)
      else
         dl7svx = zero
      endif

 

Please note that this bug is attributable to the programmer(s), not the compiler. That the results from a faulty program are "correct, as expected" is no reason to praise the compiler used, nor is the output of "incorrect, not as expected" a reason to blame the compiler. Nor is it reasonable to expect the compiler to guess that the user is going to call the subroutine with x and y identical and issue a warning. If it did, we would see thousands of warnings and ask for a way to make it shut up.

 

mecej4
Black Belt
259 Views

Forum coordinators, please note this slightly annoying feature of the forum software: each time that I edit a post to make corrections, extra blank lines get added between text and code sections, and those extra lines become visible only after pushing the "Post Reply" button.

Reply