Solved: Weird interaction with subroutine arguments

John_S_9 · ‎05-17-2018

I don't know if this is a bug, a feature, or something else, but it's pretty sneaky and really dangerous.

Example:
program test_program
     implicit none
     real(8), dimension(2):: x

     x=1
     print*,x
     call sub1(x,x)
     print*, x

end program test_program

[On another module]

subroutine sub1(argIN, argOUT)
     implicit none
     real(8), dimension(2), intent(in):: argIN
     real(8), dimension(2), intent(out):: argOUT


     argOUT(1)=argIN(1)+argIN(2)
     argOUT(2)=argIN(1)+argIN(2) !The exact same as the previous line, only assigned to argOUT(2)
end subroutine sub1

You'd expect this program to print:
1.0000 1.0000 !For the old x
2.0000 2.0000 !For the x after the call of the subroutine

But instead, it prints:
1.0000 1.0000
2.0000 3.0000

Also, if you put a "print*, argOUT" command in the subroutine just between the declarations and the rest of the body it will even print "1.0000 1.0000", even though the argOUT hasn't been assigned any value yet.

So basically, if you do it that way (use the same variable for both of the arguments) the IN and the OUT arguments inside the subroutine are implicitly linked (argIN=argOUT) for no apparent reason.

The solution is of course easy. You either use different variables in the call statement or you use an intermediate variable in the subroutine as a buffer for the IN argument (e.g. temp=argIN(1) and then use temp). But unless you know about this quirk, you won't even think to implement the solution.
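A minimal sketch of the call-site variant of this workaround, assuming the same sub1 as in the example above (here made an internal procedure so the snippet is self-contained). The names alias_workaround, dp and x_in are hypothetical. The second call relies on the fact that a parenthesized (x) is an expression rather than a variable, so the compiler is expected to pass a temporary copy.

```fortran
program alias_workaround
   implicit none
   integer, parameter :: dp = kind(1.0d0)   ! hypothetical kind name
   real(dp), dimension(2) :: x, x_in

   ! Workaround 1: pass an explicit copy as the input argument.
   x = 1.0_dp
   x_in = x
   call sub1(x_in, x)
   print *, x

   ! Workaround 2: parenthesize the input argument. (x) is an expression,
   ! so the dummy argIN is associated with a temporary, not with x itself.
   x = 1.0_dp
   call sub1((x), x)
   print *, x

contains

   subroutine sub1(argIN, argOUT)
      real(dp), dimension(2), intent(in)  :: argIN
      real(dp), dimension(2), intent(out) :: argOUT

      argOUT(1) = argIN(1) + argIN(2)
      argOUT(2) = argIN(1) + argIN(2)
   end subroutine sub1

end program alias_workaround
```

With either call, both elements of x should end up equal to 2, since argIN no longer shares storage with argOUT.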

Is this intended? Is this a bug? Why doesn't the compiler at least show a warning?

Compiler: Intel(R) Visual Fortran Compiler 18.0.1.156

mecej4 · ‎05-17-2018

John S. wrote:
So basically, if you do it that way (use the same variable for both of the arguments) the IN and the OUT arguments inside the subroutine are implicitly linked (argIN=argOUT) for no apparent reason.

This argument is incorrect. After compilation, the arguments are converted to machine addresses. The two arguments are not "implicitly linked"; they are identical! The subroutine behaves as if you had written

subroutine sub1(arg)
   implicit none
   real(8), dimension(2), intent(in out):: arg

   arg(1)=arg(1)+arg(2)
   arg(2)=arg(1)+arg(2)
 end subroutine sub1

or, if optimization is turned on, as

subroutine sub1(arg)
   implicit none
   real(8), dimension(2), intent(in out):: arg
   real(8) :: v

   v=arg(1)+arg(2)
   arg(1) = v
   arg(2) = v
 end subroutine sub1

Here is the 32-bit machine code of the subroutine of the original post, with my comments added:

_SUB1:
  000000E0: 55                 push        ebp
  000000E1: 8B EC              mov         ebp,esp
  000000E3: 8B 45 08           mov         eax,dword ptr [ebp+8]
  000000E6: 8B 55 0C           mov         edx,dword ptr [ebp+0Ch]     ; eax and edx contain the same value
  000000E9: F2 0F 10 00        movsd       xmm0,mmword ptr [eax]       ; load arg(1)
  000000ED: F2 0F 58 40 08     addsd       xmm0,mmword ptr [eax+8]     ; add arg(2)
  000000F2: F2 0F 11 02        movsd       mmword ptr [edx],xmm0       ; store into arg(1)
  000000F6: F2 0F 11 42 08     movsd       mmword ptr [edx+8],xmm0     ; store into arg(2)
  000000FB: 8B E5              mov         esp,ebp
  000000FD: 5D                 pop         ebp
  000000FE: C3                 ret

You can observe the effect of the overlap in the debugger, as well (see attached screenshot). Place a breakpoint at the first executable statement in the subroutine. When that is reached, observe the values of the argument variables. Step over the statement, and observe that ARGIN(1) changed. Note also that ARGIN and ARGOUT have the same address.

mecej4 · ‎05-17-2018

The Fortran standard requires that "action that affects the value of the entity or any subobject of it shall be taken only through the dummy argument...". Often, people refer to this as the "anti-aliasing rule". Your subroutine call violates this rule, because there is overlap between the two subroutine arguments. Few compilers detect such aliasing, which causes strange results especially when optimization levels are set high.

You also specified INTENT for the overlapped arguments, which creates further complications. An argument with INTENT(OUT) becomes undefined at subprogram entry. That argument's aliased shadow is, however, INTENT(IN), and is referenced on the right hand sides of assignment statements.

Another interesting example of aliased arguments: https://groups.google.com/forum/#!topic/comp.lang.fortran/z11RW0ezojE .

John_S_9 · ‎05-17-2018

mecej4 wrote:

The Fortran standard requires that "action that affects the value of the entity or any subobject of it shall be taken only through the dummy argument...". Often, people refer to this as the "anti-aliasing rule". Your subroutine call violates this rule, because there is overlap between the two subroutine arguments. Few compilers detect such aliasing, which causes strange results especially when optimization levels are set high.

Another interesting example of aliased arguments: https://groups.google.com/forum/#!topic/comp.lang.fortran/z11RW0ezojE .

I mean, I think I know why this happens. It's because dummy arguments don't have a memory allocation of their own; rather, they're just an alias for the memory space of the actual variable that is passed as the argument (in my example, x). So it would make sense that, once you change the first dummy argument (IN), the second one (OUT) would automatically change, since they are both pointing to the same memory address.

But still, I would expect either that the dummy arguments have a memory address of their own, or at least a pseudo-address (I'm not really a compiler guy, I don't know the limitations), or, at the very least, that the compiler detects it and shows a warning.

mecej4 · ‎05-17-2018

Consider this: there is nothing wrong with the subroutine itself. It is the call with duplicate or overlapping arguments that is the cause of the bug. Since external subroutines are usually compiled separately, it is quite difficult to plant code to detect the aliasing. At present, as far as I know, there are only two compilers that detect aliasing at run time. The current NAG compiler (version 6.20) can detect aliasing of scalar arguments. The Lahey-Fujitsu compiler 8.10b for Linux gives the following diagnosis (I had to remove the INTENTs for the subroutine arguments to get this to work).

jwe1576i-w line 18 There is an overlap in dummy argument (argOUT) and dummy argument (argIN). The part of overlap is changed.

The wording of the message is a bit awkward ("The part of overlap is changed").
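For compilers without such a run-time diagnostic, a hand-written check at the call site is one possible stopgap. The following is only a sketch under stated assumptions: the module alias_check and the function overlaps are hypothetical names, the arrays are assumed to be contiguous real(c_double) arrays, and the comparison treats C addresses as integers via transfer, which works on common flat-memory platforms but is not guaranteed by the standard.

```fortran
module alias_check
   use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t, c_double
   implicit none
   private
   public :: overlaps
contains

   ! Hypothetical helper: .true. when the storage of a and b overlaps.
   ! Assumes both arrays are contiguous.
   logical function overlaps(a, b)
      real(c_double), dimension(:), intent(in), target :: a, b
      integer(c_intptr_t) :: a_lo, a_hi, b_lo, b_hi

      overlaps = .false.
      if (size(a) == 0 .or. size(b) == 0) return

      ! Address of the first element, reinterpreted as an integer.
      a_lo = transfer(c_loc(a(1)), a_lo)
      b_lo = transfer(c_loc(b(1)), b_lo)

      ! One-past-the-end addresses (storage_size is in bits).
      a_hi = a_lo + int(size(a), c_intptr_t) * (storage_size(a) / 8)
      b_hi = b_lo + int(size(b), c_intptr_t) * (storage_size(b) / 8)

      overlaps = (a_lo < b_hi) .and. (b_lo < a_hi)
   end function overlaps

end module alias_check

program overlap_demo
   use, intrinsic :: iso_c_binding, only: c_double
   use alias_check, only: overlaps
   implicit none
   real(c_double) :: x(4)

   x = 1.0_c_double
   print *, overlaps(x(1:2), x(3:4))   ! disjoint halves of x
   print *, overlaps(x(1:3), x(2:4))   ! sections sharing x(2:3)
   print *, overlaps(x, x)             ! the same array twice
end program overlap_demo
```

Passing the same array to two INTENT(IN) dummies, as overlaps does, is permitted because neither dummy is modified. A caller could test overlaps(x, y) before calling a routine such as sub1 and stop with an error message if it returns true.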

andrew_4619 · ‎05-17-2018

So to answer the original question: it is a bug in your source code that the compiler does not detect. That the compiler does not detect it is not that unusual; there are many things a compiler does not, or cannot reasonably be expected to, detect. In this instance, though, it would be nice if it could. Even with an explicit interface there is no error. All that said, I won't be losing any sleep over this one.

John_S_9 · ‎05-17-2018

mecej4 wrote:

Quote:

John S. wrote:
So basically, if you do it that way (use the same variable for both of the arguments) the IN and the OUT arguments inside the subroutine are implicitly linked (argIN=argOUT) for no apparent reason.

This argument is incorrect. After compilation, the arguments are converted to machine addresses. The two arguments are not "implicitly linked"; they are identical! The subroutine behaves as if you had written

Yeah. I kinda figured that out:

John S. wrote:

I mean, I think I know why this happens. It's because dummy arguments don't have a memory allocation of their own; rather, they're just an alias for the memory space of the actual variable that is passed as the argument (in my example, x). So it would make sense that, once you change the first dummy argument (IN), the second one (OUT) would automatically change, since they are both pointing to the same memory address.

Nice illustration with the binary code there. Usually I check all my subroutines, not only for syntax but also to check that they do what I want them to do. But this one slipped through the cracks, and I had it in my code for so long that I was surprised to find the compiler was completely unaware of it. As I said before, I would expect at least some kind of warning, but as people pointed out, it's not that easy. Oh well, I guess, live and learn. Now I know.

FortranFan · ‎05-18-2018

John S. wrote:

.. Usually I check all my subroutines, not only for syntax but also to check that they do what I want them to do. But this one slipped through the cracks, and I had it in my code for so long that I was surprised to find the compiler was completely unaware of it. As I said before, I would expect at least some kind of warning, but as people pointed out, it's not that easy. Oh well, I guess, live and learn. Now I know.

@John S.,

To what extent is your original post reflective of your actual code? I ask because the code therein does NOT illustrate taking advantage of the existing facility in the language and the compiler support for it in the form of explicit interfaces.

Re: "learn", I will suggest reviewing this Dr Fortran blog closely and immediately starting to use the lessons in there:

https://software.intel.com/en-us/blogs/2012/01/05/doctor-fortran-gets-explicit-again

mecej4 · ‎05-18-2018

Andrew (in #5) and I found that providing an explicit interface (or using the Intel compiler's warn-interfaces option) did not lead to a diagnosis of the aliasing problem.

The posted example is small enough that one can spot the aliased arguments right away. The actual application code may have several additional arguments in the subroutine, and a lot more lines of code, so it can be difficult to detect and isolate this type of bug.

andrew_4619 · ‎05-19-2018

mecej4 wrote:
Andrew (in #5) and I found that providing an explicit interface (or using the Intel compiler's warn-interfaces option) did not lead to a diagnosis of the aliasing problem.

Yes, that is what I said. I had tested an explicit interface also.

John_S_9 · ‎05-19-2018

FortranFan wrote:

Quote:

John S. wrote:

.. Usually I check all my subroutines, not only for syntax but also to check that they do what I want them to do. But this one slipped through the cracks, and I had it in my code for so long that I was surprised to find the compiler was completely unaware of it. As I said before, I would expect at least some kind of warning, but as people pointed out, it's not that easy. Oh well, I guess, live and learn. Now I know.

@John S.,

To what extent is your original post reflective of your actual code? I ask because the code therein does NOT illustrate taking advantage of the existing facility in the language and the compiler support for it in the form of explicit interfaces.

Re: "learn", I will suggest reviewing this Dr Fortran blog closely and immediately starting to use the lessons in there:

https://software.intel.com/en-us/blogs/2012/01/05/doctor-fortran-gets-ex...

The actual code (the call and the subroutine) is pretty much the same only with different operations. Yes, I did not use interfaces, but I did not expect to have to because (in my mind) it was a really simple subroutine.

andrew_4619 · ‎05-19-2018

John S. wrote:
Yes, I did not use interfaces, but I did not expect to have to because (in my mind) it was a really simple subroutine.

If you put your routines in MODULES then you get automatic interfaces and interface checking 100% of the time, which kicks out all manner of potential errors for no extra effort.
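As a rough illustration of that suggestion, here is the thread's sub1 placed in a module. The module name sub_mod, the kind constant dp and the second array y are hypothetical. The explicit interface lets the compiler check argument count, type and rank at each call, but, as reported elsewhere in this thread, it does not flag the aliased call sub1(x, x).

```fortran
module sub_mod
   implicit none
   integer, parameter :: dp = kind(1.0d0)   ! hypothetical kind name
contains

   subroutine sub1(argIN, argOUT)
      real(dp), dimension(2), intent(in)  :: argIN
      real(dp), dimension(2), intent(out) :: argOUT

      argOUT(1) = argIN(1) + argIN(2)
      argOUT(2) = argIN(1) + argIN(2)
   end subroutine sub1

end module sub_mod

program test_program
   use sub_mod, only: dp, sub1
   implicit none
   real(dp), dimension(2) :: x, y

   x = 1.0_dp
   call sub1(x, y)      ! distinct actual arguments: no aliasing
   print *, y

   ! call sub1(x)       ! rejected at compile time: missing argument
   ! call sub1(x, x)    ! accepted by the compiler, but the arguments alias
end program test_program
```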

John_S_9 · ‎05-19-2018

andrew_4619 wrote:

Quote:

John S. wrote:
Yes, I did not use interfaces, but I did not expect to have to because (in my mind) it was a really simple subroutine.

If you put your routines in MODULES then you get automatic interfaces and interface checking 100% of the time, which kicks out all manner of potential errors for no extra effort.

The subroutine is in a module though.

TimP · ‎05-19-2018

If you run into this frequently, you might file a feature request for the compiler to check for this error when interface checking is enabled, either through an explicit interface (which you imply you are using) or a compile option. If there is a reason why it's not feasible, it would be interesting to know.

cryptogram · ‎05-21-2018

My best story along these lines goes back 30+ years to one of the IBM mainframe compilers.

I had passed a real constant 0.0 to a subroutine, and made the mistake of assigning a value to it.

But the funny part is that the assignment didn't actually change the value of 0.0; it changed the representation to a different representation.

Well, so what, you say? The reason it made a difference is that the compiler writers, in a fit of excess efficiency, had used the same memory location to store both real and integer 0 constants, as 4 bytes of 00 00 00 00. The assignment changed the byte pattern for real zero to something like 40 00 00 00 (memory a bit fuzzy after 30 years), and now all my integer 0 constants were suddenly a really big number. Took me a while in the debugger to figure out how I had managed to screw things up so badly.