I encountered a stack overflow. Investigating, I found the reason to be the handling of large strings in a module compiled using the /assume:dummy_aliases compiler option.
I removed that compiler option from that file's build, as it was unnecessary there. However, we still use it for a very large number of other files, and I dare not disable it without examining them.
So I would like to ask: is this a compiler bug, or is it by design?
Please find below a minimal sample program to reproduce the problem.
Compile this source on Windows with the x64 ifort 2020.4 compiler:
! using 64 bit ifort 2020.4
! ifort /nologo /assume:dummy_aliases stack_overflow_slicing.f90
module somemodule
    implicit none
contains
    pure function copy_str( original ) result( copy )
        implicit none
        character(len=*), intent(in) :: original
        character(len=:), allocatable :: copy
        ! stack overflow on this line when compiler option /assume:dummy_aliases is used
        copy = original
    end function
end module

program StackOverflowing
    use somemodule
    implicit none
    integer :: i
    character(len=:), allocatable :: s, s2
    integer, parameter :: BIG_STRING_SIZE = 8388609 ! 2 ** 23 + 1 = a hair over 8MB, although anything that does not fit on stack would suffice

    ! this produces stack overflow regardless of used compiler options
    ! s = repeat( "All work no play makes Jack a dull boy.", 220000 )
    allocate( character( len = BIG_STRING_SIZE ) :: s )
    do i = 1, BIG_STRING_SIZE
        s(i:i) = 'x'
    end do

    ! stack overflow in this call when compiler option /assume:dummy_aliases is used
    s2 = copy_str(s)
    write (*,*) s2(1:25)
    write (*,*) len(s)
    write (*,*) "bye"
end program
Compiling without the /assume:dummy_aliases compiler option results in an executable that does not crash.
So it seems that with /assume:dummy_aliases the compiler reserves a temporary buffer on the stack. In this simple case that is not necessary, but I assume it comes from the compiler being extra cautious with arguments when that option is used.
I think what is happening is that the compiler is being a bit too pessimistic. Evidently it is worried that the dummy argument "original" might overlap with another dummy argument or a COMMON/module variable, so it creates a temporary before assigning to the function result (which is itself a temporary). That's the correct thing to do in most cases, though it is unnecessary in this contrived example.
You can check whether enabling /heap-arrays changes the behavior, or simply increase the Stack Reserve size enough to avoid the error. (Don't go overboard with this, as it eats into the 2GB limit on static code and data address space, even on x64.)
Thanks for the insights!
You are absolutely correct that enabling /heap-arrays resolves this situation. In fact, this crash was caused by my removing the /heap-arrays option from our code base's build, except for a select few files where it seemed necessary. That was related to some rather big performance issues I asked about previously in this thread: https://community.intel.com/t5/Intel-Fortran-Compiler/Dynamic-array-reservation-performance-question....
I had already resolved the issue by no longer using /assume:dummy_aliases to compile the source file where the problem was observed, as it was unnecessary there. I asked about it mainly to better understand what is happening and why, and whether my reasoning is correct.
Add an additional allocatable local variable (e.g. localCopy) and do the initial copy into localCopy. Then use MOVE_ALLOC, which deallocates copy if necessary, copies only the descriptor (pointer and length), and leaves localCopy deallocated.
This eliminates the need for a stack-local temporary when aliasing is assumed to be possible.
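Here is a minimal sketch of that suggestion applied to copy_str from the original example. The module name copy_str_movealloc is made up for this sketch. It assumes a compiler that accepts MOVE_ALLOC inside a PURE procedure, which Fortran 2008 permits. Whether ifort with /assume:dummy_aliases really avoids the stack temporary with this version still needs to be verified with the reproducer above.

```fortran
module copy_str_movealloc
    implicit none
contains
    ! Variant of copy_str: build the copy in a local allocatable,
    ! then transfer the allocation to the function result.
    pure function copy_str( original ) result( copy )
        character(len=*), intent(in) :: original
        character(len=:), allocatable :: copy
        character(len=:), allocatable :: localCopy

        ! The only character data copy: into the local allocatable.
        localCopy = original

        ! MOVE_ALLOC deallocates copy if it is allocated, makes copy take
        ! over localCopy's storage and length, and leaves localCopy
        ! deallocated. No character data is copied by this call.
        call move_alloc( localCopy, copy )
    end function copy_str
end module copy_str_movealloc
```

It is used exactly like the original, e.g. s2 = copy_str(s).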
Thanks for the idea.
Interestingly, even this change seems to resolve the issue (i.e. without a local copy and MOVE_ALLOC). This is a bit baffling, as I would have expected this code to behave identically to the code in my previous post.
pure function copy_str( original ) result( copy )
    implicit none
    character(len=*), intent(in) :: original
    character(len=:), allocatable :: copy
    ! stack overflow on this line when compiler option /assume:dummy_aliases is used
    !copy = original
    allocate( character( len = len( original ) ) :: copy )
    copy(1:len(original)) = original(1:len(original))
end function
I wonder whether ifort could someday get something like a /check:pointer-aliasing option, similar to what AddressSanitizer for C offers in a very limited scope: https://docs.microsoft.com/en-us/cpp/sanitizers/error-memcpy-param-overlap?view=msvc-160
The idea is that if /assume:dummy_aliases is not enabled and /check:pointer-aliasing is, the compiler would generate code similar to what AddressSanitizer does for memcpy (and also for e.g. strcpy; I found some bugs that way). I think that might help programmers find some very hard-to-detect bugs.
Disclaimer: not being a compiler developer myself, I have no idea how hard or easy this would be.
This relates to my case because /assume:dummy_aliases is enabled for the code base I am working on. The reasons have been lost to history, and I have only dared to disable the option for selected files. It would be nice to disable it for most files and enable it only for the few where it is actually necessary (which I assume is a very small percentage of all the source files).
Just off the top of my head, I'd say there would likely not be a lot of enthusiasm for trying to optimize a use that violates the Fortran standard.
I may have expressed myself badly. What I tried to propose is a feature that checks there is no aliasing, which, if I have understood correctly, is what the Fortran standard requires? I.e., the Fortran standard says pointer aliasing is not allowed. Or have I misunderstood this?
So the option I proposed, /check:pointer-aliasing, would check exactly that: that pointers are not aliased, at least to a limited extent, just as in the C AddressSanitizer example I linked to.
I would think there would be a lot of interest in a feature like this, as it would help find hard-to-detect bugs caused by code that does not adhere to the Fortran standard.
What the standard says is that changes to a dummy argument may be made only through that dummy argument (with some exceptions; the details are spelled out in the section "Restrictions on entities associated with dummy arguments"). The compiler may assume that, unless it sees a possible definition/undefinition of a dummy argument that it remains as it was. This allows a compiler, for example, to keep a local copy of a dummy in a register.
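The register remark can be made concrete with a small illustration that uses a made-up subroutine, add_twice. The aliased call appears only as a comment because the standard does not define its result. With /assume:dummy_aliases you would expect the compiler to re-read b from memory, but that is an assumption about the generated code, not something the standard guarantees.

```fortran
module alias_demo
    implicit none
contains
    ! Hypothetical example routine. Because b is INTENT(IN) and may only be
    ! changed through its own name, the compiler is allowed to load b once
    ! (e.g. into a register) and reuse it for both additions.
    subroutine add_twice( a, b )
        integer, intent(inout) :: a
        integer, intent(in)    :: b
        a = a + b
        a = a + b
    end subroutine add_twice
end module alias_demo

! Conforming:    x = 1; y = 1; call add_twice(x, y)   ! x becomes 3
! Nonconforming: x = 1; call add_twice(x, x)          ! a and b both refer to x,
!   so assigning to a also changes b; x may end up 3 (b held in a register)
!   or 4 (b re-read from memory), depending on compiler and options.
```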
The standard DOES allow aliasing of anything with the TARGET attribute, including POINTERs, so there's no point (!) in checking for that.
How exactly would a compiler be able to check that a dummy argument isn't aliased (which can be an overlap, not necessarily same address) to a COMMON or MODULE variable?
As a side question - is the Fortran standard available online? I could not locate it w/ quick googling.
The official standard is copyrighted by ISO and not available freely. However, there is a document that the committee uses as a reference for interpretations, as none of us, I think, have an official ISO copy.
I misused the term "pointer" in my previous post, being a C programmer as well; I now see why my post was confusing. A more suitable name for the kind of check I am proposing might be /check:parameter-aliasing. It has nothing to do with Fortran pointers.
How the checking could be done is a little outside my expertise. I assume that conclusive checking may be too difficult. However, the C AddressSanitizer does this to a certain extent, e.g. detecting (at runtime, of course) that a call to strcpy or memcpy has been given pointers to overlapping memory regions (please see the link in one of my previous posts in this thread). I suppose this is simpler when checking calls to specific library functions, whose semantics are known, and may be too difficult in the general case.
As Fortran arguments are by default passed by reference, one simple check would be that these references differ from one another. Naturally, that is not a very conclusive check.
Adding checks for overlapping addresses between every pair of dummy arguments (on every call) would add too much overhead, given that the Fortran "commandments" state that the programmer SHALL NOT pass aliased arguments to a subroutine or function.
Now this does not prohibit you from sinning and aliasing arguments, but you do so at your own risk. IOW what works today may not work tomorrow.
Having now debugged a few cases where this turned out to be the problem, I started wondering whether the compiler could issue a compile-time warning when the same variable or array expression is passed as an argument more than once, at least when standards checking is enabled (e.g. with /stand:f18).
Examples that would produce the warning (from real code):
CALL MA15(CC,CC,4,4) ! warning for passing CC twice (but of course not for passing 4 twice)
CALL GB8910(TRANSF,P,P) ! warning for passing P twice
CALL MA15(LREC(8:),LREC(8:),4,4) ! warning for passing the same array slice twice
I know the check would not be comprehensive, i.e. it would not catch all possible cases. It might be very useful, though. And I imagine it would not be too hard to implement.
A warning (rather than an error) seems appropriate because, as you said, overlapping arguments are not necessarily an error.
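For illustration only, here is a rough sketch of the purely textual check proposed above. first_duplicate_arg is a hypothetical helper, not an ifort feature. It handles a single-line CALL, skips integer literals such as the 4,4, and compares argument text verbatim (case-sensitive). It therefore misses overlaps written differently, such as CALL MA15(LREC(8:),LREC(9:),4,4); a compiler working on the parsed program could do better.

```fortran
module dup_arg_check
    implicit none
contains
    ! Returns the first actual argument of a single-line CALL statement
    ! that occurs more than once, ignoring integer literals.
    ! Returns an empty string when no duplicate is found.
    function first_duplicate_arg( stmt ) result( dup )
        character(len=*), intent(in) :: stmt
        character(len=:), allocatable :: dup
        integer :: lo(len(stmt)), hi(len(stmt))   ! start/end of each argument
        integer :: open_pos, close_pos, depth, n, k, i, j

        dup = ''
        open_pos = index(stmt, '(')
        close_pos = index(stmt, ')', back=.true.)
        if (open_pos == 0 .or. close_pos <= open_pos) return

        ! Split the argument list at commas not nested in parentheses.
        n = 1
        lo(1) = open_pos + 1
        depth = 0
        do k = open_pos + 1, close_pos - 1
            select case (stmt(k:k))
            case ('(')
                depth = depth + 1
            case (')')
                depth = depth - 1
            case (',')
                if (depth == 0) then
                    hi(n) = k - 1
                    n = n + 1
                    lo(n) = k + 1
                end if
            end select
        end do
        hi(n) = close_pos - 1

        ! Compare every pair of arguments whose first member is not a literal.
        do i = 1, n - 1
            if (is_int_literal(stmt(lo(i):hi(i)))) cycle
            do j = i + 1, n
                if (trim(adjustl(stmt(lo(i):hi(i)))) == &
                    trim(adjustl(stmt(lo(j):hi(j))))) then
                    dup = trim(adjustl(stmt(lo(i):hi(i))))
                    return
                end if
            end do
        end do
    end function first_duplicate_arg

    ! True for non-empty text made only of decimal digits (after trimming).
    logical function is_int_literal( text )
        character(len=*), intent(in) :: text
        is_int_literal = len_trim(text) > 0 .and. &
                         verify(trim(adjustl(text)), '0123456789') == 0
    end function is_int_literal
end module dup_arg_check
```

For example, first_duplicate_arg('CALL GB8910(TRANSF,P,P)') returns 'P', and a call with distinct arguments returns an empty string.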
What I suggest is to use the Fortran preprocessor and, taking MA15 as an example, #define MA15 CheckMA15.
Then add a subroutine CheckMA15 with the same arguments, compiled without that #define, which tests for aliases, emits a message when it finds one, and then calls MA15. This may produce a false positive for the 4,4, depending on whether the compiler produced one or two stack temporaries for them. That, though, can be filtered out by checking how close the addresses of the 4,4 are to the call stack.
Note: the alias check for MA15 should only test for aliasing involving INTENT(OUT) and INTENT(INOUT) arguments (and thus not test the 4,4).
Note 2: for a formal, complete test, the check should also detect INTENT(OUT)/INTENT(INOUT) arguments that are stack temporaries, though this may be more difficult. Consider:
CALL MA15(LREC(8:)*2,LREC(8:),4,4) ! first arg becomes reference to stack temporary.
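Here is a minimal sketch of the address comparison that such a CheckMA15 wrapper might perform. MA15's real interface is not shown in the thread, so arrays_overlap is a hypothetical helper. It assumes REAL(c_double) arrays passed together with their element counts and compares the byte ranges obtained with C_LOC. Any compiler-made temporary has its own address, so those cases are not detected. This includes copy-in of a noncontiguous section and an expression such as LREC(8:)*2.

```fortran
module alias_check
    use, intrinsic :: iso_c_binding, only: c_double, c_intptr_t, c_loc
    implicit none
contains
    ! True if the storage of the two array arguments overlaps.
    ! Explicit-shape dummies allow passing e.g. a 4x4 array with n = 16.
    logical function arrays_overlap( a, na, b, nb ) result( overlap )
        integer, intent(in) :: na, nb
        real(c_double), intent(in), target :: a(na), b(nb)
        integer(c_intptr_t) :: a_lo, b_lo, elem_bytes

        overlap = .false.
        if (na <= 0 .or. nb <= 0) return
        elem_bytes = storage_size(a(1)) / 8
        a_lo = transfer(c_loc(a(1)), a_lo)
        b_lo = transfer(c_loc(b(1)), b_lo)
        ! Do the half-open byte ranges [lo, lo + n*elem_bytes) intersect?
        overlap = a_lo < b_lo + nb * elem_bytes .and. &
                  b_lo < a_lo + na * elem_bytes
    end function arrays_overlap
end module alias_check
```

In a wrapper, this would only be applied to pairs that include an INTENT(OUT) or INTENT(INOUT) argument, as noted above.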
MA15 is no longer the problem: I already found that it was causing trouble, and finding all references to it in the code base is easy enough. There are not so many that they could not be checked and corrected.
The problem is the thousands of other routines that I have not checked. That is where I would appreciate a warning from the compiler.
One warning that might make sense to add: if the argument intents are declared and the same variable is passed as both input and output (i.e. to dummies with some combination of intent(in), intent(out) and intent(inout)), the compiler would issue a warning.
This of course assumes the interface is available, as with module procedures or when using /gen-interfaces.
As a simple example:
subroutine foo(a, b)
    integer, intent(in) :: a
    integer, intent(out) :: b
end subroutine foo
! in the caller:
call foo(x,x) ! warning: potential aliasing problem
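Until a compiler warns about such calls, a standard way to make this particular call safe is to put the input argument in parentheses. (x) is an expression rather than a variable, so a is associated with a separate value. Here is a minimal sketch; the body of foo is made up because the example above only shows the dummy declarations.

```fortran
module paren_demo
    implicit none
contains
    ! Made-up body for the foo of the example above.
    subroutine foo( a, b )
        integer, intent(in)  :: a
        integer, intent(out) :: b
        b = a + 1
    end subroutine foo
end module paren_demo

! call foo(x, x)    ! nonconforming: defining b changes x, which is also
!                   ! associated with a
! call foo((x), x)  ! conforming: (x) is an expression, so a is associated
!                   ! with a separate value and only b refers to x
```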
It isn't a standards error; the calls you show are conforming. With new code there are plenty of facilities (e.g. specifying INTENT) to prevent many inadvertent coding errors, but with old code you are really chasing bugs that have always been there. The problem is what the code does with the arguments, not the call itself. The compiler can't know what the user intended; it can only apply the standard's rules to what is coded.
You are right: the code conforms to the standard. As Steve explained in one of the previous replies, the standard does not disallow aliasing per se, but "The compiler may assume that, unless it sees a possible definition/undefinition of a dummy argument that it remains as it was."
So while passing, e.g., the same variable twice in a call is not necessarily incorrect, it may indicate a problem, and that is the kind of thing many warnings help the programmer notice. A warning does not say that the code is wrong, only that it may be risky. And warnings one wants to ignore can easily be turned off.
Some examples of warnings that we have turned into errors: "This name has not been given an explicit type.", "A null argument exists in a subroutine CALL or in a function reference." and "The return value of this FUNCTION has not been defined."
I think a warning about potential problems due to argument aliasing would help move a large code base such as ours in a better direction. Using /assume:dummy_aliases has its share of problems (e.g. the one this thread was originally about).
Catching aliasing errors can be very difficult when the aliased arguments are arrays, including different sections of a larger work array. The mere threat of aliasing should not cause warning messages to be issued, because threatening is not forbidden. A violation of the rule occurs only when the value of one variable is changed by means other than through its own name.
An interesting example of aliasing was discussed three years ago in comp.lang.fortran, in relation to the venerable Eispack software. The aliasing error was in one of the test drivers, not in the Eispack subroutines themselves.