Confusing compiler bug with 2021.7.0

OP1 · ‎10-05-2022

I have been scratching my head lately with behaviors I observe with ifort 2021.7.0 when using combinations of Enable F2018 Semantics (/standard-semantics), the SAVE and RECURSIVE attributes, and whether a procedure is contained in a module or a program. The behavior of the small reproducer attached does not make any sense to me.

First: the code is built with /standard-semantics, x64, Release options.

In this code, S1 (contained in the module M) and S2 (contained in the program P) are identical. They are recursive subroutines. Because /standard-semantics is in effect, the compiler does not complain about the absence of the RECURSIVE attribute for the subroutines (it is no longer necessary as of Fortran 2018).

However, the behavior of S1 and S2 is different!

Calling S1 only (by commenting the call to S2) leads to a buggy behavior.
Calling S1 and S2 leads to a buggy behavior.
Calling S2 only (by commenting the call to S1) works as intended.
Adding the RECURSIVE attribute to S1 does not change any of the above.
Adding the SAVE attribute for the declaration of J in S1 leads to correct behavior of S1.

I thought that the SAVE attribute was implied for recursive procedures? And even so, why is it that it is necessary to have it for S1 and not for S2?

It's very possible that there is a subtle (or arcane) section of the standard that explains the difference in behavior between S1 and S2... still, it looks a bit fishy. I would certainly appreciate it if you could enlighten me (or confirm the bug)!

MODULE M
IMPLICIT NONE (TYPE, EXTERNAL)
CONTAINS

    ! Adding RECURSIVE to the SUBROUTINE statement does not change
    ! the buggy behavior of the program.
    SUBROUTINE S1(I)
    IMPLICIT NONE (TYPE, EXTERNAL)
    INTEGER :: I

    ! Adding SAVE to the following declaration solves all the issues.
    INTEGER :: J
    WRITE(*, *) I
    IF (I == 1) THEN
        J = 2
    ELSE
        J = J + I
    END IF
    IF (J > 5) STOP
    CALL S1(J)
    END SUBROUTINE S1

END MODULE M


PROGRAM P
USE M
IMPLICIT NONE (TYPE, EXTERNAL)

! For the next two lines...
!     - Commenting the call to S1 leads to proper execution of S2.
!     - Commenting the call to S2 leads to buggy behavior of S1.
!     - Having both calls uncommented leads to buggy behavior of S1 and S2.
CALL S1(1)
CALL S2(1)

CONTAINS

    SUBROUTINE S2(I)
    IMPLICIT NONE (TYPE, EXTERNAL)
    INTEGER, INTENT(IN) :: I
    INTEGER :: J
    WRITE(*, *) I
    IF (I == 1) THEN
        J = 2
    ELSE
        J = J + I
    END IF
    IF (J > 5) STOP
    CALL S2(J)
    END SUBROUTINE S2

END PROGRAM P

mecej4 · ‎10-05-2022

You wrote "I thought that the SAVE attribute was implied for recursive procedures?" Where did you get that impression from?

When the subroutine argument I has a value other than 1, the local variable J is undefined in S1 and S2. Even if you make J a saved local variable, it would still be undefined if, in the first call to the subroutine, the argument I has a value other than 1. Thus, the subroutine is written in such a way that a particular sequence of arguments has to be passed to it in order for the subroutine to work as intended. This feature makes bugs more likely.

Steve_Lionel · ‎10-05-2022

SAVE is not implied unless the local variable is initialized in its declaration. This didn't change with F2018, but a lot of long-time Fortran programmers believed that all variables were SAVEd by default because that's how older compilers implemented things.

The different behavior you see is solely due to references to uninitialized memory. Depending on whether the stack location of J got reused between calls, it may work or may not.
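
To illustrate the one case where SAVE is implied, here is a minimal sketch. The names IMPLIED_SAVE_DEMO, COUNT_CALLS and N are made up for this example and do not come from the reproducer. Because N is initialized in its declaration, it implicitly has the SAVE attribute: the initialization happens once, not on every entry, so the three calls print 1, 2 and 3.

```fortran
PROGRAM IMPLIED_SAVE_DEMO
IMPLICIT NONE
INTEGER :: K

DO K = 1, 3
    CALL COUNT_CALLS()
END DO

CONTAINS

    SUBROUTINE COUNT_CALLS()
    ! Initialization in the declaration implies SAVE: N is set to 0 once,
    ! before execution, and keeps its value between calls.
    INTEGER :: N = 0
    N = N + 1
    WRITE(*, *) 'call number', N
    END SUBROUTINE COUNT_CALLS

END PROGRAM IMPLIED_SAVE_DEMO
```

Dropping the `= 0` removes the implied SAVE, and N would then be undefined on entry, which is the situation J is in within S1 and S2.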

OP1 · ‎10-05-2022

Thanks for the answers.

@mecej4 - The Intel documentation is a bit confusing (and I was probably not reading it correctly). For SAVE, it says "Variables are implicitly given the SAVE attribute depending on whether a program unit is compiled for recursion." and then proceeds with a section where "[...] The following variables are not saved by default: Scalar variables that are local to a recursive procedure and are not initialized".

@Steve_Lionel - So, if I understand correctly, you are saying that the different behavior of S1 and S2 is just due to pure luck (the code is buggy in itself, since S1 and S2 do not give J the SAVE attribute when they should).

The bottom line... it's probably best to avoid SAVE altogether (and to avoid relying on any of the implied save mechanisms from the standard) and use module variables instead, ha ha.

Steve_Lionel · ‎10-05-2022

You shouldn't use SAVE unless you understand what it does and its effect on procedures called recursively.
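
As a hedged illustration of that effect, the sketch below (all names, DESCEND, SHARED and OWN, are hypothetical) contrasts a SAVEd local with an ordinary local in a recursive procedure. The SAVEd variable exists once and is shared by every active invocation, while the unsaved one is private to each invocation. RECURSIVE is written out explicitly so that the example does not depend on F2018 semantics being enabled.

```fortran
PROGRAM SAVE_IN_RECURSION
IMPLICIT NONE

CALL DESCEND(1)

CONTAINS

    RECURSIVE SUBROUTINE DESCEND(LEVEL)
    INTEGER, INTENT(IN) :: LEVEL
    INTEGER, SAVE :: SHARED = 0   ! a single copy, seen by all invocations
    INTEGER :: OWN                ! a fresh copy for each invocation

    OWN = LEVEL
    SHARED = LEVEL
    IF (LEVEL < 3) CALL DESCEND(LEVEL + 1)

    ! On the way back up, OWN still equals LEVEL, but SHARED holds 3,
    ! the value written by the deepest invocation.
    WRITE(*, *) 'level', LEVEL, 'own', OWN, 'shared', SHARED
    END SUBROUTINE DESCEND

END PROGRAM SAVE_IN_RECURSION
```

The three output lines report own = 3, 2, 1 and shared = 3 each time: a deeper call silently overwrites what the shallower call stored in the SAVEd variable.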

Arjen_Markus · ‎10-05-2022

I usually add the SAVE attribute to a variable declaration to document that the variable is expected to retain its value between calls, even if it is given an initial value. Not the SAVE statement. (Related: I have had many a discussion with colleagues used to the C-type initialisation, where it is simply shorthand for combining the declaration and the initialisation upon entry. And with old-hand FORTRAN programmers who insisted on the SAVEd behaviour, like Steve refers to.)

mecej4 · ‎10-06-2022

Even after you add the SAVE attribute to the declaration for the local variable J, J remains undefined until the containing subroutine (S1 or S2, as appropriate) has been called with I = 1.

It may help if you specify what you want the program to do, and then ask for help regarding how to make that happen using Fortran, rather than show an incorrect program and ask what it does.

OP1 · ‎10-06-2022

First, my statement "I thought that the SAVE attribute was implied for recursive procedures?" was nonsensical indeed and I'll blame it on an inadvertent brain slip, ha ha.

Now, the small code here is really not meant to be realistic in any way. The first call to S1 and S2 must indeed be with I = 1. I was more interested in figuring out the difference in behavior between the two identical routines (one contained in a module, one contained in the main program). I thought that one of the two was correct and the other one wrongly implemented, but @Steve_Lionel clarified this by saying that both are wrong (there is no implicit save behavior to be expected for any of them), and it was just a matter of luck that S2 worked as intended.
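
For comparison, here is a minimal sketch of the same recursion with no reliance on SAVE at all: the running value is handed down as an argument. The subroutine name S3 and the dummy argument J_IN are hypothetical and not part of the reproducer. Every variable is defined before it is read, so the result no longer depends on what happens to be on the stack.

```fortran
PROGRAM EXPLICIT_STATE
IMPLICIT NONE

! The second argument is ignored when I == 1, so any value will do.
CALL S3(1, 0)

CONTAINS

    RECURSIVE SUBROUTINE S3(I, J_IN)
    INTEGER, INTENT(IN) :: I
    INTEGER, INTENT(IN) :: J_IN   ! running value supplied by the caller
    INTEGER :: J                  ! plain local, always defined below

    WRITE(*, *) I
    IF (I == 1) THEN
        J = 2
    ELSE
        J = J_IN + I
    END IF
    IF (J > 5) RETURN
    CALL S3(J, J)
    END SUBROUTINE S3

END PROGRAM EXPLICIT_STATE
```

This prints 1, 2 and 4, the sequence the SAVEd variant of S1 produces, and then returns to the main program instead of executing STOP.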

FortranFan · ‎10-06-2022

@OP1 ,

Building on the other guiding comments in this thread that urge caution with the use of recursive algorithms in conjunction with "static" data (those with the `SAVE` attribute in Fortran parlance), you may want to give thought to the following (perhaps you're already well aware of it but didn't include it in your original post for some reason):

Should not your static data (e.g., object `J` in your example) get defined along with the type declaration? As you will know, that's what the Fortran language standard permits you to do.
Should your recursive algorithms have suitable guard(s) against unsupported or invalid values of the input argument? Say in the example you show, how should your `S1` / `S2` procedures behave if invoked as `call S2(-99)` or `call S1(42)`?
Should your recursive algorithm initiate program termination with the `STOP` statement upon end of recursion, or simply end the procedure invocation?

That is, did you intend for something along the following lines?

    SUBROUTINE S2(I)
    IMPLICIT NONE (TYPE, EXTERNAL)
    INTEGER, INTENT(IN) :: I

    INTEGER, SAVE :: J = nn !<-- Is this what you intended? where nn is some default value of J e.g., nn=2?

    IF (I < 1) THEN
       ..  ! RETURN?
    END IF 
    WRITE(*, *) I
    IF (I == 1) THEN
        J = 2
    ELSE
        J = J + I
    END IF
    IF (J > 5) RETURN
    CALL S2(J)
    END SUBROUTINE S2
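
One way to fill in the placeholders above so that the sketch compiles and runs is shown below. The choices are assumptions, not something stated in the thread: `nn` is taken to be 2, an argument below 1 simply returns, and the program name P_GUARDED is made up. RECURSIVE is spelled out so the code does not depend on F2018 semantics being enabled.

```fortran
PROGRAM P_GUARDED
IMPLICIT NONE

CALL S2(1)
CALL S2(-99)   ! rejected by the guard, prints nothing

CONTAINS

    RECURSIVE SUBROUTINE S2(I)
    INTEGER, INTENT(IN) :: I

    ! Assumed default of 2; the explicit initialization means J is
    ! never undefined, whatever the first argument is.
    INTEGER, SAVE :: J = 2

    ! Assumed policy: silently ignore unsupported arguments.
    IF (I < 1) RETURN

    WRITE(*, *) I
    IF (I == 1) THEN
        J = 2
    ELSE
        J = J + I
    END IF
    IF (J > 5) RETURN
    CALL S2(J)
    END SUBROUTINE S2

END PROGRAM P_GUARDED
```

The first call prints 1, 2 and 4 and then unwinds normally, so the main program continues after the recursion ends.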

JohnNichols · ‎10-06-2022

You have to be really careful with recursive functions. If you want to learn how to use them, learn Lisp; it is a great teacher.

Your J has a standard value of -858993660 on creation, it is not unassigned it has a weird value - which is #CCCCCCC I think from memory. Could be wrong.

The program runs, but I is incremented from the weird value of J, I is now on the second run two higher than J.

If you run AutoLISP on Autocad data files using recursion, you can get a million runs correctly and then there is a simple obvious bug.

Your algorithm is flawed, I fear. If it is any consolation, the marvelous book - Proving Ground shows the first known bug using ENIAC; your bug is as subtle.

Great read about the first six female programmers as in they were the first ever programmers.

It takes 40 hours to solve one ballistic equation by hand using a mechanical calculator.

Fortran is not as naturally recursive as Lisp. Lisp has the side effect which allows for efficient recursion. See example

jimdempseyatthecove · ‎10-06-2022

>>Your J has a standard value of ...-858993660 on creation, it is not unassigned it has a weird value - which is #CCCCCCC I think from memory.

MS C/C++ Debug Build initializes "uninitialized variables" with this value. gfortran (I think) uses BAADBEEF or something similar in Debug build. Release Build has no initialization, and may result in leftover data ***

*** Note, some O/S's pre-initialize process addresses, during process creation, that are not explicitly initialized (or contain code).

Jim Dempsey
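
As a quick check of the fill pattern mentioned above, the sketch below computes what the bit pattern CCCCCCCC (hexadecimal) means as a 32-bit two's-complement integer. The program and variable names are made up for illustration. The arithmetic is done in 64 bits to avoid overflow: 3435973836 - 2**32 = -858993460, which is close to, but not exactly, the figure quoted from memory earlier in the thread.

```fortran
PROGRAM FILL_PATTERN
USE, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: INT32, INT64
IMPLICIT NONE
INTEGER(INT64) :: UNSIGNED_VALUE
INTEGER(INT32) :: SIGNED_VALUE

! Read the pattern as an unsigned quantity in a wider integer.
UNSIGNED_VALUE = INT(Z'CCCCCCCC', INT64)

! The top bit is set, so the 32-bit signed reading is the unsigned
! value minus 2**32.
SIGNED_VALUE = INT(UNSIGNED_VALUE - 2_INT64**32, INT32)

WRITE(*, *) UNSIGNED_VALUE   ! 3435973836
WRITE(*, *) SIGNED_VALUE     ! -858993460
END PROGRAM FILL_PATTERN
```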

JohnNichols · ‎10-17-2022

This is the problem, someone made a decision to make these the default values, even if it is null or some other element of a number. It is frustrating to see such stuff, pick a reasonable default value, zero for instance, which is what I expect most people would expect. I absolutely never use release build, no need.

Yes, I enjoyed reading Proving Ground. I fear the misogyny is still commonplace, although more subtle.

David_Billinghurst · ‎10-17-2022

> the marvelous book - Proving Ground

This is Proving Ground: The Untold Story of the Six Women Who Programmed the World’s First Modern Computer by Kathy Kleiman. https://www.amazon.com/Proving-Ground-Untold-Programmed-Computer/dp/1538718286/

An interesting read.