I am currently working on an existing code (generating a dll) to make it thread safe. I have removed all existing common and global variables by moving them into an external structure which is systematically passed as an argument from external calls. During this work, I found a strange behavior when switching from the default local storage option to /Qauto for the Release configuration (with default optimizations settings). The code does not produce the expected results whereas /Qauto in Debug configuration works. In my tries to solve the problem, I found that switching from /fpsource to /fp:fast=1 or /fp:fast=2 in the floating-point settings seems to solve the issue even if there are small numerical differences in the results with the original code.
My compiler version is Intel® Fortran Compiler Classic 2021.4.0 [IA-32]
My original settings are (who don't produce a working dll):
/nologo /O2 /fpp /DWIN32 /DFRENCH /reentrancy:threaded /extend-source:132 /warn:interfaces /Qauto /assume:byterecl /fpe:1 /fp:source /fpconstant /Qfp-speculation=strict /module:"Win32\Release\\" /object:"Win32\Release\\" /Fd"Win32\Release\vc160.pdb" /libs:static /threads /Qmkl:sequential /assume:noold_maxminloc /c
and the new ones:
/nologo /O2 /fpp /DWIN32 /DFRENCH /reentrancy:threaded /extend-source:132 /warn:interfaces /Qauto /assume:byterecl /fpe:1 /fpconstant /Qfp-speculation=strict /module:"Win32\Release\\" /object:"Win32\Release\\" /Fd"Win32\Release\vc160.pdb" /libs:static /threads /Qmkl:sequential /assume:noold_maxminloc /c
Removing the optimization (i.e. switrching fro /O2 flag to /Od) seems also to work with the original floating point settings.
Even if I think that I probably need also to add the /assume:recursion and/or /Qopenmp flags to make it fully thread safe, any idea about this problem will be welcomed.
P.S: I think I used the /fp:source /fpconstant /Qfp-speculation=strict after reading a paper written by Steve Lionel about the accuracy and reproductivility of results in a floating point environment.
Just to give some complementary information, it seems that the compiler behaves differently in the following situation:
let's assume a calculation routine with the following signature:
subroutine BGGP(T,P,Z,I,GGPI,HGPI,SGPI,DELHVI,VAR,NMAX,IERR) integer(4), intent(in) :: I, NMAX real(8), intent(in) :: T, P, Z(*) real(8), intent(inout) :: GGPI(*), HGPI(*), SGPI(*), DELHVI(*) real(8), intent(inout) :: VAR(NMAX,*) integer(4), intent(out) :: IERR
and let's define GGPI(), HGPI(), SGPI() and DELHVI() as subarrays of array VAR() (respectively VAR(:,42), VAR(:,12), etc...)
Normally, calling the procedure with:
Should produce the same results in debug and release mode (I am using the default calling convention).
However, using for example both SGPI(I) and VAR(I,17) in the called subroutine to refer the same quantity, seems to generate different results between Debug and Release configurations with /Qauto and /Qopenmp along with /fp:fast=2 settings.
For example (even if I agree it is a strange way of coding)
SGPI(I) = IdealGasEntropy(T,P,Z,...) VARI(I,17) = VARI(I,17) + S0F(I)
However, calling the routine with:
seems to produce the same answer.
Unfortunately, until now I was obliged to use the first form of call, because the VAR() array is declared with the pointer attribute:
real(8), pointer :: VAR(:,:)
I will probably need to change the way I declare the VAR array in order to use an assumed-size array along with a fortran pointer declaration like:
real(8) :: VAR(NMAX,*) pointer(ptrVAR, VAR)
to be able to pass VAR directly as VAR(1,12).
Any idea or explanation of this issue?
Your call to the subroutine may be violating the anti-aliasing rules, since the argument arrays GGPI, HGPI, SGPI, DELHVI are sections of the array VAR, which is also an argument, and all these arrays have INTENT(IN OUT). Without seeing how these arrays are accessed and used in the subroutine, it is not possible to diagnose the problem, but here is what can go wrong.
Suppose that you change some elements of array GGPI in the subroutine. That will cause the corresponding elements of VAR to change, but "behind the compiler's back". If, within the subroutine, those elements of VAR are used in expressions, the code may use stale values instead. Or, if you intended the original values of VAR to be used, the code may use the new values, instead, against your intentions.
Similarly, changes to elements of VAR may cause corresponding portions of GGPI, etc., to change.
Such errors are more likely to bite when higher levels of optimization are used. They are also unpredictable as to their behavior patterns, which is why the language rules prohibit "changing the values in an argument variable except through its own name".
It appears that the problem you have is you have an alias between two dummy arguments (one output)...
.AND. the optimized code is (in an offending section of code) vectorizing a section of code with vector dependencies introduced by the alias.
IOW one of your aliased dummy outputs is/was written before the other aliased output was read.
You will have to identify the offending loop(s) and make use of the following:
DO ... ! the offending loop
Hi mece and Jim,
Thanks for your replies.
To give you the full information, arrays HGPI, SGPI and DELHVI are used to store intermediate results, but it is sometimes more convenient to pass them as array sections of work array VAR to make the code more readable and to add the full array in case we need to access other sections or to pass them in internal calls. The results of the calculations are stored in the array GGPI. However, since I use the default calling convention, arguments are passed by reference, so to take an example VAR(1,17) should have the same memory address as SGPI(1), consequently, updating one of them should update VAR as well. This is the way our code works, and I can tell you that we use this artefact in a lot of places without any problem until I tried to add the /Qauto and /Qopenmp compiler switches. Does it mean that because of /Qauto and/or /Qopenmp, VAR(1:N,17) represents a copy of a section of the array VAR and that they don't share the same memory locations?
@jimdempseyatthecove , I have no loop inside the subroutine (the array index is passed as argument, the loop is outside) and as shown in my code extract, the element SGPI(I) is calculated just before VARI(I,17) is updated (they should have the same memory address).
From your description, it appears that your code has been written in such a way that it violates the anti-aliasing rules. Please read Dr. Fortran's article on this topic. With such code, which used to work correctly with older compilers, perhaps because they may have done little or no optimization. if fixing the code to make it standard-conforming is not feasible, try using the compiler option /assume:dummy_aliases with those source files that are in violation.
I do find that several of your statements in this thread to be incorrect, such as "VAR(1,17) should have the same memory address as SGPI(1), consequently, updating one of them should update VAR as well. This is the way our code works". To put it bluntly, that code is broken, as it is based on an incorrect assumption.
Please see the short code example (Program ALIAS1) in the article that I referenced. Note as well the excerpt of the rules that the code violates, which Steve has included as a quotation.
After reading Dr. Fortran's post, I understand that depending on compiler optimizations, VARI(I,17) may have been updated before SGPI(I) was calculated, and I concede that I should not have written the code like this (in fact it was simply an oversight when updating an existing section of code). However, what I don't understand is your comment about memory addresses, and why SGPI(1) and VAR(1,17) should not have the same memory locations.
As an example, we have a lot of code (which I am currently updating), where:
VAR(1,3) is passed as actual argument and where the corresponding dummy argument is Z(), then, in the called routine Z() is updated with another array (let say X which can also be a section of VAR), with Z(1:N) = X(1:N) and then it is not Z() which is passed as internal call but VAR(1,3) and this works perfectly!
Even, if this way of coding looks very strange (which I fully agree), the code was written 30 years ago and has worked perfectly over the years with several different compilers (Watcom fortran, Digital visual fortran, COMPAQ and Intel) and OS (unix, dos/windows, vms).
I would understand that passing VAR(1:N,3) would create an internal copy of the section of the VAR array behind the scenes and that it may not share the same memory location as VAR, however I would have expected that at least on exit of the called subroutine, the corresponding section of VAR was updated in some way.
Even, if this way of coding looks very strange (which I fully agree), the code was written 30 years ago and has worked perfectly over the years with several different compilers (Watcom fortran, Digital visual fortran, COMPAQ and Intel) and OS (unix, dos/windows, vms).
This means nothing. As compilers get smarter over the years, or simply rearrange the way things are done, errors you previously didn't notice become visible.
I agree Steve, but when you don't know what the compiler does behind the scenes or does differently from one version to the next or even behave differently depending on some switches, it's sometimes difficult (not to say impossible) to understand why a source code you worked with for years suddenly and which contains thousands of lines of code stops working or behave differently. In my case it was the use of the /Qauto switch along with the optimization settings that made the problems visible whereas with the "Default local storage" option nothing appeared.
On a modern computer memory is not a single entity, as it was in, say, the 8088. We have RAM, multiple levels of cache memory, and registers. Speed is promoted by keeping variables in the fastest level of memory as long as possible.
Consider a single scalar subroutine argument that gets updated in a subroutine. Copies of that variable may exist in many places, and those copies may have different values. The compiler will manage the updating and refreshing in such a way that the final results are the same as if all the copies were maintained coherent.
If, however, you do CALL SUB(x, x) to subroutine SUB(X, Y), and you did not tell the compiler that X and Y could be one and the same, and you allowed the compiler to optimize, the code in the subroutine may malfunction. Please work through Steve's PROGRAM ALIAS1 to convince yourself on this matter.
Things can get even worse if X has INTENT(IN), and Y has (IN OUT) in the subroutine specification, and you call with (X,X). What would you expect the subroutine to do?
Things get worse when some of these variables are accessed and updated by more than one thread.
I have come across a few instances where software library routines that have been used for decades by thousands of people suddenly started malfunctioning because of such aliasing errors.
An instructive series of posts from 1999 on this topic is worth reading.
I agree with your statement about memory management but I though the compiler was clever enough to avoid these situations. Luckily, I was starting to rewrite these strange parts of code when I faced this issue.
Anyway, it has been a tremendous luck that these codes worked for so long (even in Release configuration with default optimizations) and started to produce unexpected results after adding the /Qauto compiler switch.
Programs with errors (assumptions) are like walking on thin ice: sometimes you make it, sometimes you don't.
If the compiler is instructed to assume and or all dummy arguments are aliased, and it was attempting to make sense out of the aliasing, the code generated would be non-optimal (as well as possibly incorrect). So, for this file, use /Od and walk on thin ice.
An alternative is to
a) write defensive code in the affected procedure with branch a) handling non-aliased arguments and branch b) handling aliased arguments
b) on CALL to affected procedure, use an array constructor to construct a copy of the offending aliased argument, with the requirement that the called procedure typed it with intent(in). If it is intent(inout) or intent(out) the array constructor method will not work.
Jim's admonishment (regarding continuing to use programs with errors) became highlighted for me yesterday and today.
I was working on an application that performs separable nonlinear regression using routines from the JPL and PORT collections, using about 50 subroutines from the first and about 70 from the second collection. If I compiled with optimization turned on, some test runs failed to give correct results. Fortunately, the incorrect results were so bad that they made me suspect aliasing. I also wondered if there were variables that were used before initialization, and variables that needed to be saved but were not, and that notion took some time to chase down and dismiss. It was also annoying that changing compiler options or using a different compiler sometimes yielded an EXE that produced the correct results.
It took quite a bit of effort to track down the error, since these old Fortran 77 codes are not easy to work with in the debugger. There were two instances of aliasing errors that I was able to pinpoint as the source of the errors, and I describe one of them here. Most of these codes are about forty years old.
The subprogram in question has the following argument list:
DOUBLE PRECISION FUNCTION DL7SVX(P, L, X, Y) INTEGER P DOUBLE PRECISION L(P*(P+1)/2), X(P), Y(P)
Comments in the source code state: "THE CALLER MAY PASS THE SAME VECTOR FOR X AND Y (NONSTANDARD FORTRAN USAGE), IN WHICH CASE X OVERWRITES Y."
There are four instances in other library subroutines in which this function is invoked, and in every one of them, the last two arguments are identical to each other:
T:\port\ORIG>grep -in "=.*dl7svx" *.f dg7lit.f:736: T = T / DL7SVX(P, V(LMAT1), V(STEP1), V(STEP1)) drnsg.f:257: T = DL7SVX(K, V(AR1), C, C) drnsg.f:425: T = T / DL7SVX(PP, V(K), V(TEMP1), V(TEMP1)) drnsgb.f:228: T = DL7SVX(K, V(AR1), C, C)
Note that this is not just a function, since it returns the leading singular value of a matrix through the function name and the associated right and left singular vectors through the array arguments X and Y.
Randomly pulling a coded routine from Port shows a lot of potential issues in using this code, even allowing for the fact that it has been downloaded by many users. The count is high.
1. No implicit none, if you turn that one on with old code there is a high probability you will get errors even though the code works in spite of the errors.
2. Line 202 anything written for an NSF grant is likely to be have been written by graduate students. Graduate students are trying to finish their degree not be perfectly correct. I have had 124 grad students I am aware of their skill level. Supervisors look for gross errors in logic.
3. Line 270 -- ok so even God does not know where all these go and testing this will be a nightmare.
4. There is a famous paper by a grad student. Absolutely brilliant Chinese engineer, he/she looked at the development of BEM from 1970 to about 1996 when the paper was written, his/her supervisor basically invented BEM. He/she went through and demolished the research work of some very highly regarded academic BEM modelers. It was ruthless. I laughed, they would not have, my guess is the supervisor was issuing a course correction. I would not want to have been that grad student trying to become published. I should check on his/her career. Peer review ultimately works, but it can take decades and sometimes mistakes slip through, I mean we did not lose a satellite on mars because of an error in g, actually ..
This is a personal opinion, others may disagree, I understand I have often stepped into the unknown, luckily the likes of @mecej4 , @jimdempseyatthecove and @Steve_Lionel have stopped my worst errors for that I am very grateful.
SUBROUTINE A7SST(IV, LIV, LV, V) C C *** ASSESS CANDIDATE STEP (***SOL VERSION 2.3) *** C INTEGER LIV, LV INTEGER IV(LIV) REAL V(LV) C C *** PURPOSE *** C C THIS SUBROUTINE IS CALLED BY AN UNCONSTRAINED MINIMIZATION C ROUTINE TO ASSESS THE NEXT CANDIDATE STEP. IT MAY RECOMMEND ONE C OF SEVERAL COURSES OF ACTION, SUCH AS ACCEPTING THE STEP, RECOM- C PUTING IT USING THE SAME OR A NEW QUADRATIC MODEL, OR HALTING DUE C TO CONVERGENCE OR FALSE CONVERGENCE. SEE THE RETURN CODE LISTING C BELOW. C C-------------------------- PARAMETER USAGE -------------------------- C C IV (I/O) INTEGER PARAMETER AND SCRATCH VECTOR -- SEE DESCRIPTION C BELOW OF IV VALUES REFERENCED. C LIV (IN) LENGTH OF IV ARRAY. C LV (IN) LENGTH OF V ARRAY. C V (I/O) REAL PARAMETER AND SCRATCH VECTOR -- SEE DESCRIPTION C BELOW OF V VALUES REFERENCED. C C *** IV VALUES REFERENCED *** C C IV(IRC) (I/O) ON INPUT FOR THE FIRST STEP TRIED IN A NEW ITERATION, C IV(IRC) SHOULD BE SET TO 3 OR 4 (THE VALUE TO WHICH IT IS C SET WHEN STEP IS DEFINITELY TO BE ACCEPTED). ON INPUT C AFTER STEP HAS BEEN RECOMPUTED, IV(IRC) SHOULD BE C UNCHANGED SINCE THE PREVIOUS RETURN OF A7SST. C ON OUTPUT, IV(IRC) IS A RETURN CODE HAVING ONE OF THE C FOLLOWING VALUES... C 1 = SWITCH MODELS OR TRY SMALLER STEP. C 2 = SWITCH MODELS OR ACCEPT STEP. C 3 = ACCEPT STEP AND DETERMINE V(RADFAC) BY GRADIENT C TESTS. C 4 = ACCEPT STEP, V(RADFAC) HAS BEEN DETERMINED. C 5 = RECOMPUTE STEP (USING THE SAME MODEL). C 6 = RECOMPUTE STEP WITH RADIUS = V(LMAXS) BUT DO NOT C EVALUATE THE OBJECTIVE FUNCTION. C 7 = X-CONVERGENCE (SEE V(XCTOL)). C 8 = RELATIVE FUNCTION CONVERGENCE (SEE V(RFCTOL)). C 9 = BOTH X- AND RELATIVE FUNCTION CONVERGENCE. C 10 = ABSOLUTE FUNCTION CONVERGENCE (SEE V(AFCTOL)). C 11 = SINGULAR CONVERGENCE (SEE V(LMAXS)). C 12 = FALSE CONVERGENCE (SEE V(XFTOL)). C 13 = IV(IRC) WAS OUT OF RANGE ON INPUT. C RETURN CODE I HAS PRECEDENCE OVER I+1 FOR I = 9, 10, 11. C IV(MLSTGD) (I/O) SAVED VALUE OF IV(MODEL). C IV(MODEL) (I/O) ON INPUT, IV(MODEL) SHOULD BE AN INTEGER IDENTIFYING C THE CURRENT QUADRATIC MODEL OF THE OBJECTIVE FUNCTION. C IF A PREVIOUS STEP YIELDED A BETTER FUNCTION REDUCTION, C THEN IV(MODEL) WILL BE SET TO IV(MLSTGD) ON OUTPUT. C IV(NFCALL) (IN) INVOCATION COUNT FOR THE OBJECTIVE FUNCTION. C IV(NFGCAL) (I/O) VALUE OF IV(NFCALL) AT STEP THAT GAVE THE BIGGEST C FUNCTION REDUCTION THIS ITERATION. IV(NFGCAL) REMAINS C UNCHANGED UNTIL A FUNCTION REDUCTION IS OBTAINED. C IV(RADINC) (I/O) THE NUMBER OF RADIUS INCREASES (OR MINUS THE NUMBER C OF DECREASES) SO FAR THIS ITERATION. C IV(RESTOR) (OUT) SET TO 1 IF V(F) HAS BEEN RESTORED AND X SHOULD BE C RESTORED TO ITS INITIAL VALUE, TO 2 IF X SHOULD BE SAVED, C TO 3 IF X SHOULD BE RESTORED FROM THE SAVED VALUE, AND TO C 0 OTHERWISE. C IV(STAGE) (I/O) COUNT OF THE NUMBER OF MODELS TRIED SO FAR IN THE C CURRENT ITERATION. C IV(STGLIM) (IN) MAXIMUM NUMBER OF MODELS TO CONSIDER. C IV(SWITCH) (OUT) SET TO 0 UNLESS A NEW MODEL IS BEING TRIED AND IT C GIVES A SMALLER FUNCTION VALUE THAN THE PREVIOUS MODEL, C IN WHICH CASE A7SST SETS IV(SWITCH) = 1. C IV(TOOBIG) (I/O) IS NONZERO ON INPUT IF STEP WAS TOO BIG (E.G., IF C IT WOULD CAUSE OVERFLOW). IT IS SET TO 0 ON RETURN. C IV(XIRC) (I/O) VALUE THAT IV(IRC) WOULD HAVE IN THE ABSENCE OF C CONVERGENCE, FALSE CONVERGENCE, AND OVERSIZED STEPS. C C *** V VALUES REFERENCED *** C C V(AFCTOL) (IN) ABSOLUTE FUNCTION CONVERGENCE TOLERANCE. IF THE C ABSOLUTE VALUE OF THE CURRENT FUNCTION VALUE V(F) IS LESS C THAN V(AFCTOL) AND A7SST DOES NOT RETURN WITH C IV(IRC) = 11, THEN A7SST RETURNS WITH IV(IRC) = 10. C V(DECFAC) (IN) FACTOR BY WHICH TO DECREASE RADIUS WHEN IV(TOOBIG) IS C NONZERO. C V(DSTNRM) (IN) THE 2-NORM OF D*STEP. C V(DSTSAV) (I/O) VALUE OF V(DSTNRM) ON SAVED STEP. C V(DST0) (IN) THE 2-NORM OF D TIMES THE NEWTON STEP (WHEN DEFINED, C I.E., FOR V(NREDUC) .GE. 0). C V(F) (I/O) ON BOTH INPUT AND OUTPUT, V(F) IS THE OBJECTIVE FUNC- C TION VALUE AT X. IF X IS RESTORED TO A PREVIOUS VALUE, C THEN V(F) IS RESTORED TO THE CORRESPONDING VALUE. C V(FDIF) (OUT) THE FUNCTION REDUCTION V(F0) - V(F) (FOR THE OUTPUT C VALUE OF V(F) IF AN EARLIER STEP GAVE A BIGGER FUNCTION C DECREASE, AND FOR THE INPUT VALUE OF V(F) OTHERWISE). C V(FLSTGD) (I/O) SAVED VALUE OF V(F). C V(F0) (IN) OBJECTIVE FUNCTION VALUE AT START OF ITERATION. C V(GTSLST) (I/O) VALUE OF V(GTSTEP) ON SAVED STEP. C V(GTSTEP) (IN) INNER PRODUCT BETWEEN STEP AND GRADIENT. C V(INCFAC) (IN) MINIMUM FACTOR BY WHICH TO INCREASE RADIUS. C V(LMAXS) (IN) MAXIMUM REASONABLE STEP SIZE (AND INITIAL STEP BOUND). C IF THE ACTUAL FUNCTION DECREASE IS NO MORE THAN TWICE C WHAT WAS PREDICTED, IF A RETURN WITH IV(IRC) = 7, 8, OR 9 C DOES NOT OCCUR, IF V(DSTNRM) .GT. V(LMAXS) OR THE CURRENT C STEP IS A NEWTON STEP, AND IF C V(PREDUC) .LE. V(SCTOL) * ABS(V(F0)), THEN A7SST RETURNS C WITH IV(IRC) = 11. IF SO DOING APPEARS WORTHWHILE, THEN C A7SST REPEATS THIS TEST (DISALLOWING A FULL NEWTON STEP) C WITH V(PREDUC) COMPUTED FOR A STEP OF LENGTH V(LMAXS) C (BY A RETURN WITH IV(IRC) = 6). C V(NREDUC) (I/O) FUNCTION REDUCTION PREDICTED BY QUADRATIC MODEL FOR C NEWTON STEP. IF A7SST IS CALLED WITH IV(IRC) = 6, I.E., C IF V(PREDUC) HAS BEEN COMPUTED WITH RADIUS = V(LMAXS) FOR C USE IN THE SINGULAR CONVERGENCE TEST, THEN V(NREDUC) IS C SET TO -V(PREDUC) BEFORE THE LATTER IS RESTORED. C V(PLSTGD) (I/O) VALUE OF V(PREDUC) ON SAVED STEP. C V(PREDUC) (I/O) FUNCTION REDUCTION PREDICTED BY QUADRATIC MODEL FOR C CURRENT STEP. C V(RADFAC) (OUT) FACTOR TO BE USED IN DETERMINING THE NEW RADIUS, C WHICH SHOULD BE V(RADFAC)*DST, WHERE DST IS EITHER THE C OUTPUT VALUE OF V(DSTNRM) OR THE 2-NORM OF C DIAG(NEWD)*STEP FOR THE OUTPUT VALUE OF STEP AND THE C UPDATED VERSION, NEWD, OF THE SCALE VECTOR D. FOR C IV(IRC) = 3, V(RADFAC) = 1.0 IS RETURNED. C V(RDFCMN) (IN) MINIMUM VALUE FOR V(RADFAC) IN TERMS OF THE INPUT C VALUE OF V(DSTNRM) -- SUGGESTED VALUE = 0.1. C V(RDFCMX) (IN) MAXIMUM VALUE FOR V(RADFAC) -- SUGGESTED VALUE = 4.0. C V(RELDX) (IN) SCALED RELATIVE CHANGE IN X CAUSED BY STEP, COMPUTED C (E.G.) BY FUNCTION RLDST AS C MAX (D(I)*ABS(X(I)-X0(I)), 1 .LE. I .LE. P) / C MAX (D(I)*(ABS(X(I))+ABS(X0(I))), 1 .LE. I .LE. P). C V(RFCTOL) (IN) RELATIVE FUNCTION CONVERGENCE TOLERANCE. IF THE C ACTUAL FUNCTION REDUCTION IS AT MOST TWICE WHAT WAS PRE- C DICTED AND V(NREDUC) .LE. V(RFCTOL)*ABS(V(F0)), THEN C A7SST RETURNS WITH IV(IRC) = 8 OR 9. C V(SCTOL) (IN) SINGULAR CONVERGENCE TOLERANCE -- SEE V(LMAXS). C V(STPPAR) (IN) MARQUARDT PARAMETER -- 0 MEANS FULL NEWTON STEP. C V(TUNER1) (IN) TUNING CONSTANT USED TO DECIDE IF THE FUNCTION C REDUCTION WAS MUCH LESS THAN EXPECTED. SUGGESTED C VALUE = 0.1. C V(TUNER2) (IN) TUNING CONSTANT USED TO DECIDE IF THE FUNCTION C REDUCTION WAS LARGE ENOUGH TO ACCEPT STEP. SUGGESTED C VALUE = 10**-4. C V(TUNER3) (IN) TUNING CONSTANT USED TO DECIDE IF THE RADIUS C SHOULD BE INCREASED. SUGGESTED VALUE = 0.75. C V(XCTOL) (IN) X-CONVERGENCE CRITERION. IF STEP IS A NEWTON STEP C (V(STPPAR) = 0) HAVING V(RELDX) .LE. V(XCTOL) AND GIVING C AT MOST TWICE THE PREDICTED FUNCTION DECREASE, THEN C A7SST RETURNS IV(IRC) = 7 OR 9. C V(XFTOL) (IN) FALSE CONVERGENCE TOLERANCE. IF STEP GAVE NO OR ONLY C A SMALL FUNCTION DECREASE AND V(RELDX) .LE. V(XFTOL), C THEN A7SST RETURNS WITH IV(IRC) = 12. C C------------------------------- NOTES ------------------------------- C C *** APPLICATION AND USAGE RESTRICTIONS *** C C THIS ROUTINE IS CALLED AS PART OF THE NL2SOL (NONLINEAR C LEAST-SQUARES) PACKAGE. IT MAY BE USED IN ANY UNCONSTRAINED C MINIMIZATION SOLVER THAT USES DOGLEG, GOLDFELD-QUANDT-TROTTER, C OR LEVENBERG-MARQUARDT STEPS. C C *** ALGORITHM NOTES *** C C SEE (1) FOR FURTHER DISCUSSION OF THE ASSESSING AND MODEL C SWITCHING STRATEGIES. WHILE NL2SOL CONSIDERS ONLY TWO MODELS, C A7SST IS DESIGNED TO HANDLE ANY NUMBER OF MODELS. C C *** USAGE NOTES *** C C ON THE FIRST CALL OF AN ITERATION, ONLY THE I/O VARIABLES C STEP, X, IV(IRC), IV(MODEL), V(F), V(DSTNRM), V(GTSTEP), AND C V(PREDUC) NEED HAVE BEEN INITIALIZED. BETWEEN CALLS, NO I/O C VALUES EXCEPT STEP, X, IV(MODEL), V(F) AND THE STOPPING TOLER- C ANCES SHOULD BE CHANGED. C AFTER A RETURN FOR CONVERGENCE OR FALSE CONVERGENCE, ONE CAN C CHANGE THE STOPPING TOLERANCES AND CALL A7SST AGAIN, IN WHICH C CASE THE STOPPING TESTS WILL BE REPEATED. C C *** REFERENCES *** C C (1) DENNIS, J.E., JR., GAY, D.M., AND WELSCH, R.E. (1981), C AN ADAPTIVE NONLINEAR LEAST-SQUARES ALGORITHM, C ACM TRANS. MATH. SOFTWARE, VOL. 7, NO. 3. C C (2) POWELL, M.J.D. (1970) A FORTRAN SUBROUTINE FOR SOLVING C SYSTEMS OF NONLINEAR ALGEBRAIC EQUATIONS, IN NUMERICAL C METHODS FOR NONLINEAR ALGEBRAIC EQUATIONS, EDITED BY C P. RABINOWITZ, GORDON AND BREACH, LONDON. C C *** HISTORY *** C C JOHN DENNIS DESIGNED MUCH OF THIS ROUTINE, STARTING WITH C IDEAS IN (2). ROY WELSCH SUGGESTED THE MODEL SWITCHING STRATEGY. C DAVID GAY AND STEPHEN PETERS CAST THIS SUBROUTINE INTO A MORE C PORTABLE FORM (WINTER 1977), AND DAVID GAY CAST IT INTO ITS C PRESENT FORM (FALL 1978), WITH MINOR CHANGES TO THE SINGULAR C CONVERGENCE TEST IN MAY, 1984 (TO DEAL WITH FULL NEWTON STEPS). C C *** GENERAL *** C C THIS SUBROUTINE WAS WRITTEN IN CONNECTION WITH RESEARCH C SUPPORTED BY THE NATIONAL SCIENCE FOUNDATION UNDER GRANTS C MCS-7600324, DCR75-10143, 76-14311DSS, MCS76-11989, AND C MCS-7906671. C C------------------------ EXTERNAL QUANTITIES ------------------------ C C *** NO EXTERNAL FUNCTIONS AND SUBROUTINES *** C C-------------------------- LOCAL VARIABLES -------------------------- C LOGICAL GOODX INTEGER I, NFC REAL EMAX, EMAXS, GTS, RFAC1, XMAX REAL HALF, ONE, ONEP2, TWO, ZERO C C *** SUBSCRIPTS FOR IV AND V *** C INTEGER AFCTOL, DECFAC, DSTNRM, DSTSAV, DST0, F, FDIF, FLSTGD, F0, 1 GTSLST, GTSTEP, INCFAC, IRC, LMAXS, MLSTGD, MODEL, NFCALL, 2 NFGCAL, NREDUC, PLSTGD, PREDUC, RADFAC, RADINC, RDFCMN, 3 RDFCMX, RELDX, RESTOR, RFCTOL, SCTOL, STAGE, STGLIM, 4 STPPAR, SWITCH, TOOBIG, TUNER1, TUNER2, TUNER3, XCTOL, 5 XFTOL, XIRC C C *** DATA INITIALIZATIONS *** C C/6 C DATA HALF/0.5E+0/, ONE/1.E+0/, ONEP2/1.2E+0/, TWO/2.E+0/, C 1 ZERO/0.E+0/ C/7 PARAMETER (HALF=0.5E+0, ONE=1.E+0, ONEP2=1.2E+0, TWO=2.E+0, 1 ZERO=0.E+0) C/ C C/6 C DATA IRC/29/, MLSTGD/32/, MODEL/5/, NFCALL/6/, NFGCAL/7/, C 1 RADINC/8/, RESTOR/9/, STAGE/10/, STGLIM/11/, SWITCH/12/, C 2 TOOBIG/2/, XIRC/13/ C/7 PARAMETER (IRC=29, MLSTGD=32, MODEL=5, NFCALL=6, NFGCAL=7, 1 RADINC=8, RESTOR=9, STAGE=10, STGLIM=11, SWITCH=12, 2 TOOBIG=2, XIRC=13) C/ C/6 C DATA AFCTOL/31/, DECFAC/22/, DSTNRM/2/, DST0/3/, DSTSAV/18/, C 1 F/10/, FDIF/11/, FLSTGD/12/, F0/13/, GTSLST/14/, GTSTEP/4/, C 2 INCFAC/23/, LMAXS/36/, NREDUC/6/, PLSTGD/15/, PREDUC/7/, C 3 RADFAC/16/, RDFCMN/24/, RDFCMX/25/, RELDX/17/, RFCTOL/32/, C 4 SCTOL/37/, STPPAR/5/, TUNER1/26/, TUNER2/27/, TUNER3/28/, C 5 XCTOL/33/, XFTOL/34/ C/7 PARAMETER (AFCTOL=31, DECFAC=22, DSTNRM=2, DST0=3, DSTSAV=18, 1 F=10, FDIF=11, FLSTGD=12, F0=13, GTSLST=14, GTSTEP=4, 2 INCFAC=23, LMAXS=36, NREDUC=6, PLSTGD=15, PREDUC=7, 3 RADFAC=16, RDFCMN=24, RDFCMX=25, RELDX=17, RFCTOL=32, 4 SCTOL=37, STPPAR=5, TUNER1=26, TUNER2=27, TUNER3=28, 5 XCTOL=33, XFTOL=34) C/ C C+++++++++++++++++++++++++++++++ BODY ++++++++++++++++++++++++++++++++ C NFC = IV(NFCALL) IV(SWITCH) = 0 IV(RESTOR) = 0 RFAC1 = ONE GOODX = .TRUE. I = IV(IRC) IF (I .GE. 1 .AND. I .LE. 12) 1 GO TO (20,30,10,10,40,280,220,220,220,220,220,170), I IV(IRC) = 13 GO TO 999 C C *** INITIALIZE FOR NEW ITERATION *** C 10 IV(STAGE) = 1 IV(RADINC) = 0 V(FLSTGD) = V(F0) IF (IV(TOOBIG) .EQ. 0) GO TO 110 IV(STAGE) = -1 IV(XIRC) = I GO TO 60 C C *** STEP WAS RECOMPUTED WITH NEW MODEL OR SMALLER RADIUS *** C *** FIRST DECIDE WHICH *** C 20 IF (IV(MODEL) .NE. IV(MLSTGD)) GO TO 30 C *** OLD MODEL RETAINED, SMALLER RADIUS TRIED *** C *** DO NOT CONSIDER ANY MORE NEW MODELS THIS ITERATION *** IV(STAGE) = IV(STGLIM) IV(RADINC) = -1 GO TO 110 C C *** A NEW MODEL IS BEING TRIED. DECIDE WHETHER TO KEEP IT. *** C 30 IV(STAGE) = IV(STAGE) + 1 C C *** NOW WE ADD THE POSSIBILITY THAT STEP WAS RECOMPUTED WITH *** C *** THE SAME MODEL, PERHAPS BECAUSE OF AN OVERSIZED STEP. *** C 40 IF (IV(STAGE) .GT. 0) GO TO 50 C C *** STEP WAS RECOMPUTED BECAUSE IT WAS TOO BIG. *** C IF (IV(TOOBIG) .NE. 0) GO TO 60 C C *** RESTORE IV(STAGE) AND PICK UP WHERE WE LEFT OFF. *** C IV(STAGE) = -IV(STAGE) I = IV(XIRC) GO TO (20, 30, 110, 110, 70), I C 50 IF (IV(TOOBIG) .EQ. 0) GO TO 70 C C *** HANDLE OVERSIZE STEP *** C IV(TOOBIG) = 0 IF (IV(RADINC) .GT. 0) GO TO 80 IV(STAGE) = -IV(STAGE) IV(XIRC) = IV(IRC) C 60 IV(TOOBIG) = 0 V(RADFAC) = V(DECFAC) IV(RADINC) = IV(RADINC) - 1 IV(IRC) = 5 IV(RESTOR) = 1 V(F) = V(FLSTGD) GO TO 999 C 70 IF (V(F) .LT. V(FLSTGD)) GO TO 110 C C *** THE NEW STEP IS A LOSER. RESTORE OLD MODEL. *** C IF (IV(MODEL) .EQ. IV(MLSTGD)) GO TO 80 IV(MODEL) = IV(MLSTGD) IV(SWITCH) = 1 C C *** RESTORE STEP, ETC. ONLY IF A PREVIOUS STEP DECREASED V(F). C 80 IF (V(FLSTGD) .GE. V(F0)) GO TO 110 IF (IV(STAGE) .LT. IV(STGLIM)) THEN GOODX = .FALSE. ELSE IF (NFC .LT. IV(NFGCAL) + IV(STGLIM) + 2) THEN GOODX = .FALSE. ELSE IF (IV(SWITCH) .NE. 0) THEN GOODX = .FALSE. ENDIF IV(RESTOR) = 3 V(F) = V(FLSTGD) V(PREDUC) = V(PLSTGD) V(GTSTEP) = V(GTSLST) IF (IV(SWITCH) .EQ. 0) RFAC1 = V(DSTNRM) / V(DSTSAV) V(DSTNRM) = V(DSTSAV) IF (GOODX) THEN C C *** ACCEPT PREVIOUS SLIGHTLY REDUCING STEP *** C V(FDIF) = V(F0) - V(F) IV(IRC) = 4 V(RADFAC) = RFAC1 GO TO 999 ENDIF NFC = IV(NFGCAL) C 110 V(FDIF) = V(F0) - V(F) IF (V(FDIF) .GT. V(TUNER2) * V(PREDUC)) GO TO 140 IF (IV(RADINC) .GT. 0) GO TO 140 C C *** NO (OR ONLY A TRIVIAL) FUNCTION DECREASE C *** -- SO TRY NEW MODEL OR SMALLER RADIUS C IF (V(F) .LT. V(F0)) GO TO 120 IV(MLSTGD) = IV(MODEL) V(FLSTGD) = V(F) V(F) = V(F0) IV(RESTOR) = 1 GO TO 130 120 IV(NFGCAL) = NFC 130 IV(IRC) = 1 IF (IV(STAGE) .LT. IV(STGLIM)) GO TO 160 IV(IRC) = 5 IV(RADINC) = IV(RADINC) - 1 GO TO 160 C C *** NONTRIVIAL FUNCTION DECREASE ACHIEVED *** C 140 IV(NFGCAL) = NFC RFAC1 = ONE V(DSTSAV) = V(DSTNRM) IF (V(FDIF) .GT. V(PREDUC)*V(TUNER1)) GO TO 190 C C *** DECREASE WAS MUCH LESS THAN PREDICTED -- EITHER CHANGE MODELS C *** OR ACCEPT STEP WITH DECREASED RADIUS. C IF (IV(STAGE) .GE. IV(STGLIM)) GO TO 150 C *** CONSIDER SWITCHING MODELS *** IV(IRC) = 2 GO TO 160 C C *** ACCEPT STEP WITH DECREASED RADIUS *** C 150 IV(IRC) = 4 C C *** SET V(RADFAC) TO FLETCHER*S DECREASE FACTOR *** C 160 IV(XIRC) = IV(IRC) EMAX = V(GTSTEP) + V(FDIF) V(RADFAC) = HALF * RFAC1 IF (EMAX .LT. V(GTSTEP)) V(RADFAC) = RFAC1 * AMAX1(V(RDFCMN), 1 HALF * V(GTSTEP)/EMAX) C C *** DO FALSE CONVERGENCE TEST *** C 170 IF (V(RELDX) .LE. V(XFTOL)) GO TO 180 IV(IRC) = IV(XIRC) IF (V(F) .LT. V(F0)) GO TO 200 GO TO 230 C 180 IV(IRC) = 12 GO TO 240 C C *** HANDLE GOOD FUNCTION DECREASE *** C 190 IF (V(FDIF) .LT. (-V(TUNER3) * V(GTSTEP))) GO TO 210 C C *** INCREASING RADIUS LOOKS WORTHWHILE. SEE IF WE JUST C *** RECOMPUTED STEP WITH A DECREASED RADIUS OR RESTORED STEP C *** AFTER RECOMPUTING IT WITH A LARGER RADIUS. C IF (IV(RADINC) .LT. 0) GO TO 210 IF (IV(RESTOR) .EQ. 1) GO TO 210 IF (IV(RESTOR) .EQ. 3) GO TO 210 C C *** WE DID NOT. TRY A LONGER STEP UNLESS THIS WAS A NEWTON C *** STEP. C V(RADFAC) = V(RDFCMX) GTS = V(GTSTEP) IF (V(FDIF) .LT. (HALF/V(RADFAC) - ONE) * GTS) 1 V(RADFAC) = AMAX1(V(INCFAC), HALF*GTS/(GTS + V(FDIF))) IV(IRC) = 4 IF (V(STPPAR) .EQ. ZERO) GO TO 230 IF (V(DST0) .GE. ZERO .AND. (V(DST0) .LT. TWO*V(DSTNRM) 1 .OR. V(NREDUC) .LT. ONEP2*V(FDIF))) GO TO 230 C *** STEP WAS NOT A NEWTON STEP. RECOMPUTE IT WITH C *** A LARGER RADIUS. IV(IRC) = 5 IV(RADINC) = IV(RADINC) + 1 C C *** SAVE VALUES CORRESPONDING TO GOOD STEP *** C 200 V(FLSTGD) = V(F) IV(MLSTGD) = IV(MODEL) IF (IV(RESTOR) .EQ. 0) IV(RESTOR) = 2 V(DSTSAV) = V(DSTNRM) IV(NFGCAL) = NFC V(PLSTGD) = V(PREDUC) V(GTSLST) = V(GTSTEP) GO TO 230 C C *** ACCEPT STEP WITH RADIUS UNCHANGED *** C 210 V(RADFAC) = ONE IV(IRC) = 3 GO TO 230 C C *** COME HERE FOR A RESTART AFTER CONVERGENCE *** C 220 IV(IRC) = IV(XIRC) IF (V(DSTSAV) .GE. ZERO) GO TO 240 IV(IRC) = 12 GO TO 240 C C *** PERFORM CONVERGENCE TESTS *** C 230 IV(XIRC) = IV(IRC) 240 IF (IV(RESTOR) .EQ. 1 .AND. V(FLSTGD) .LT. V(F0)) IV(RESTOR) = 3 IF ( ABS(V(F)) .LT. V(AFCTOL)) IV(IRC) = 10 IF (HALF * V(FDIF) .GT. V(PREDUC)) GO TO 999 EMAX = V(RFCTOL) * ABS(V(F0)) EMAXS = V(SCTOL) * ABS(V(F0)) IF (V(PREDUC) .LE. EMAXS .AND. (V(DSTNRM) .GT. V(LMAXS) .OR. 1 V(STPPAR) .EQ. ZERO)) IV(IRC) = 11 IF (V(DST0) .LT. ZERO) GO TO 250 I = 0 IF ((V(NREDUC) .GT. ZERO .AND. V(NREDUC) .LE. EMAX) .OR. 1 (V(NREDUC) .EQ. ZERO. AND. V(PREDUC) .EQ. ZERO)) I = 2 IF (V(STPPAR) .EQ. ZERO .AND. V(RELDX) .LE. V(XCTOL) 1 .AND. GOODX) I = I + 1 IF (I .GT. 0) IV(IRC) = I + 6 C C *** CONSIDER RECOMPUTING STEP OF LENGTH V(LMAXS) FOR SINGULAR C *** CONVERGENCE TEST. C 250 IF (IV(IRC) .GT. 5 .AND. IV(IRC) .NE. 12) GO TO 999 IF (V(STPPAR) .EQ. ZERO) GO TO 999 IF (V(DSTNRM) .GT. V(LMAXS)) GO TO 260 IF (V(PREDUC) .GE. EMAXS) GO TO 999 IF (V(DST0) .LE. ZERO) GO TO 270 IF (HALF * V(DST0) .LE. V(LMAXS)) GO TO 999 GO TO 270 260 IF (HALF * V(DSTNRM) .LE. V(LMAXS)) GO TO 999 XMAX = V(LMAXS) / V(DSTNRM) IF (XMAX * (TWO - XMAX) * V(PREDUC) .GE. EMAXS) GO TO 999 270 IF (V(NREDUC) .LT. ZERO) GO TO 290 C C *** RECOMPUTE V(PREDUC) FOR USE IN SINGULAR CONVERGENCE TEST *** C V(GTSLST) = V(GTSTEP) V(DSTSAV) = V(DSTNRM) IF (IV(IRC) .EQ. 12) V(DSTSAV) = -V(DSTSAV) V(PLSTGD) = V(PREDUC) I = IV(RESTOR) IV(RESTOR) = 2 IF (I .EQ. 3) IV(RESTOR) = 0 IV(IRC) = 6 GO TO 999 C C *** PERFORM SINGULAR CONVERGENCE TEST WITH RECOMPUTED V(PREDUC) *** C 280 V(GTSTEP) = V(GTSLST) V(DSTNRM) = ABS(V(DSTSAV)) IV(IRC) = IV(XIRC) IF (V(DSTSAV) .LE. ZERO) IV(IRC) = 12 V(NREDUC) = -V(PREDUC) V(PREDUC) = V(PLSTGD) IV(RESTOR) = 3 290 IF (-V(NREDUC) .LE. V(SCTOL) * ABS(V(F0))) IV(IRC) = 11 C 999 RETURN C C *** LAST LINE OF A7SST FOLLOWS *** END
I add the same thought when remembering a subroutine I used to solve a linear system of equations and also developped in the early days of Fortran (may be around 1975).
The routine had two matrixes as arguments (let say A and B) , the first (A) contains the coefficients for system to be solved, and the second (B) was used to get the inverse of the original matrix. There was a comment in the header of the subroutine telling that if you don't want to keep matrix A you can call the routine with (A,A,...) instead of (A,B,...). I don't know if at that time the standard already had anti-aliasing rules, but I know a lot of source code in which this "feature" is used and that worked perfectly for years (the code was written in such a way this worked). Of course it does not mean that this was correct but as the standard changes the way these things are handled, I think that a warning message or an error message should be generated at compile time.
There are standards, which may have errors or omissions.
There are compilers that may be missing features, have real error from mistakes in coding or misunderstanding the standard.
There are programs where people never read the standards, what is a standard we are the Intel Fortran Forum experts, and their last lesson in Fortran was in 1978 on a daemon computer with punch cards.
There is the program which is a mixture of all of this and the program appears to run. The program may be based on a flawed or partial theory, beam theory cannot predict temperature dependence of beam vibration, a very real problem I assure you, beam theory does not predict well the effect of massive compression, spar theory does, but it is not used in 3D - well not completely and then only in one program I am aware of at the moment.
So allowing for all of these potential errors is it a wonder that the forum gets daily reports of bugs.
The interesting thing is that people make it work anyway.
Of course I am not going to call routine(A,A) it is a recipe for a disaster.
But that is a personal opinion.
Here is a test program (two source files zipped and attached) that demonstrates the aliasing issue that I mentioned earlier in this thread. The bug occurred only with one compiler (not Intel) when it had optimization turned on.
The call with the aliased arguments can be seen in these lines:
dl7 = dl7svx(p,L,x,y) print 20,'Two Array version', dl7, x, y dl7 = dl7svx(p,L,x,x) ! Aliased arguments! print 10,'Aliased version ', dl7, x
With most compilers, the output from the program will be:
Two Array version 1.179982E+01 x -9.8586E+00 5.6358E+00 3.0323E+00 -1.0429E+00 y 6.0251E+00 4.0673E+00 7.4929E+00 2.8194E+00 Aliased version 1.179982E+01 x -9.8586E+00 5.6358E+00 3.0323E+00 -1.0429E+00
One compiler (the excellent Lahey-Fujitsu 7.1 for Windows), gives the surprising output:
Two Array version 1.179982E+01 x -9.8586E+00 5.6358E+00 3.0323E+00 -1.0429E+00 y 6.0251E+00 4.0673E+00 7.4929E+00 2.8194E+00 Aliased version 0.000000E+00 x 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
The details of how the zeros entered on the last line are related to how the LF optimizer handles the following lines of code:
! ! *** NORMALIZE Y AND SET X = (L**T)*Y *** ! t = one/dv2nrm(p,y) ji = 1 do i = 1, p ! Aliasing bug when x and y coincide yi = t*y(i) ! optimizer interchanges order of this x(i) = zero ! and this, causing yi, x() and y() to be zero call dv2axy (i, x, yi, l(ji), x) ji = ji + i end do dl7svx = dv2nrm(p,x) else dl7svx = zero endif
Please note that this bug is attributable to the programmer(s), not the compiler. That the results from a faulty program are "correct, as expected" is no reason to praise the compiler used, nor is the output of "incorrect, not as expected" a reason to blame the compiler. Nor is it reasonable to expect the compiler to guess that the user is going to call the subroutine with x and y identical and issue a warning. If it did, we would see thousands of warnings and ask for a way to make it shut up.
Forum coordinators, please note this slightly annoying feature of the forum software: each time that I edit a post to make corrections, extra blank lines get added between text and code sections, and those extra lines become visible only after pushing the "Post Reply" button.