Intel® Fortran Compiler

assume dummy_aliases and performance

Les_Neilson
Valued Contributor II
2,603 Views
Dealing as we do with legacy code, we sometimes come across instances where the same actual argument is passed twice to a subroutine, e.g.:

[fortran]real :: p1(3),p2(3)
call asub(p1,p2,p1)

subroutine asub(v1,v2,v3)
real :: v1(3),v2(3),v3(3)
[/fortran]
where v3 is some function of v1 and v2 (usually computed via a local array).

We have found that the 32-bit optimised release works OK but the 64-bit optimised release gives different results (which is how we came across the problem).
So, as a short-term solution (until we can identify and fix all the affected code), we are looking at using the /assume:dummy_aliases compiler option on all of our Fortran projects to fix the problem.

What I would like to know is: is it likely to have a noticeable effect on performance?

Sorry for the long-winded ramble.

Les
15 Replies

mecej4
Honored Contributor III
> Is it likely to have a noticeable effect on performance?

One undesirable and possibly unnoticed effect is that bugs in the code may not be fixed for a few months or years...

The problem with using an option such as /assume:dummy_aliases is that it is a sledge-hammer based solution. In other words, all arguments are candidates to be copied in and copied out in all the subroutines compiled with that option. If subprogram A calls subprogram B which in turn passes some of its arguments to another subprogram C, ..., we could see a lot of copying done, whether it is needed or not.

Here is a short example.

[fortran]subroutine asub(v1,v2,v3)
  real, intent(in) :: v1(3),v2(3)
  real, intent(out) :: v3(3)
  v3(1)=v1(2)*v2(3)-v1(3)*v2(2)
  v3(2)=v1(3)*v2(1)-v1(1)*v2(3)
  v3(3)=v1(1)*v2(2)-v1(2)*v2(1)
return
end subroutine asub

subroutine tst
   real :: p1(3)=(/1.0,2.0,3.0/), &
           p2(3)=(/3.0,2.0,1.0/), &
           p3(3)
   call asub(p1,p2,p3)        ! nothing aliased, call does not need copying
   call asub(p1,p2,p1)        ! call needs copying of only p1
end subroutine tst
[/fortran]
I found that, with the 11.1.070 and 12.0.4 32-bit compilers, the options /Qcommon-args or /assume:dummy_aliases seemed to have no effect. Instead, whether the aliasing caused the cross-product to be wrong depended on the optimization level.

My example has another bad property, because I declared INTENTs. According to the Fortran Standard, the actual argument associated with V3 becomes undefined upon entry to ASUB, because V3 is declared INTENT(OUT). If ASUB is called with aliased arguments, as in the second call, that actual argument is p1, which is also associated with the INTENT(IN) dummy V1. What is the effect of that on the values in the first argument? Could an INTENT(IN) array become undefined?

Les_Neilson
Valued Contributor II
Thanks mecej4 for your comments.
Yes, I recognise that this is a bug which was waiting to bite and has now bitten. :-(

The code I have looked at so far (similar to your asub) does, however, use a local array to calculate v3:
basically two DO loops, one to calculate local_v3 and a second to assign it to the dummy argument v3.
It appears that the code is simple enough for the optimiser to re-order it and perhaps do away with the loops altogether. (My assembly knowledge is over 30 years old and mainframe-based, so it would take me some time to read the assembly listing and work out what was going on.)

The dummy_aliases option is only a temporary workaround as I will be identifying and fixing the offending code.

Shortly after posting the original (and waking up the brain cells with a coffee), I found a note in the help saying that there will be a performance hit; somehow I had totally missed that block of help text.

Les

mecej4
Honored Contributor III
I may have added some confusion by editing my reply after you had read it. I removed the comments about the assembly because I realized that I had seen what I expected to see in the assembly listing, rather than what was actually there.

I looked up the old Compaq Fortran documentation, and found this:

You can link routines compiled with the /assume:dummy_aliases option to routines compiled with /assume:nodummy_aliases. For example, if only one routine is called with dummy aliases, you can use /assume:dummy_aliases when compiling that routine, and compile all the other routines with /assume:nodummy_aliases to gain the performance value of that option.

I now have misgivings about what the /assume:dummy_aliases option does, and I wondered whether you were placing unwarranted reliance on its working correctly. Perhaps you can tell me what I am doing wrong. I took the example from the Compaq user guide:

[fxfortran]      Program TSTS
      double precision a,X(3),y(3)
      data X/1d0,2d0,3d0/,y/3d0,2d0,1d0/
      call daxpy(3,y(1),X,1,Y,1)     ! argument DA aliased with DY(1)
      write(*,'(1x,3F6.1)')Y
      end
      
      SUBROUTINE DAXPY(N,DA,DX,INCX,DY,INCY)

C     Constant times a vector plus a vector.
C     uses unrolled loops for increments equal to 1.

      DOUBLE PRECISION DX(*), DY(*), DA
      INTEGER I,INCX,INCY,IX,IY,M,MP1,N
C
      IF (N.LE.0) RETURN
      IF (DA.EQ.0.0) RETURN
      IF (INCX.EQ.1.AND.INCY.EQ.1) GOTO 20

C     Code for unequal increments or equal increments
C     not equal to 1.
      STOP 'No code given for INCX or INCY not equal to 1'
C
C     Code for both increments equal to 1.

20    M = MOD(N,4)
      IF (M.EQ.0) GOTO 40
      DO I=1,M
          DY(I) = DY(I) + DA*DX(I)
      END DO

      IF (N.LT.4) RETURN
40    MP1 = M + 1
      DO I = MP1, N, 4
          DY(I) = DY(I) + DA*DX(I)
          DY(I + 1) = DY(I + 1) + DA*DX(I + 1)
          DY(I + 2) = DY(I + 2) + DA*DX(I + 2)
          DY(I + 3) = DY(I + 3) + DA*DX(I + 3)
      END DO

      RETURN
      END
[/fxfortran]
Here are results using CVF 6.6C:

OPTIONS USED                      RESULT

/opt:2 /assume:nodummy_aliases    6.0  8.0 10.0   (no alias effect)
/opt:2                            6.0  8.0 10.0
/opt:0                            6.0 14.0 19.0
/opt:0 /assume:dummy_aliases      6.0 14.0 19.0

and with IFort 12.0.4:

OPTIONS USED                      RESULT

/Od                               6.0 14.0 19.0
/Od /assume:dummy_aliases         6.0 14.0 19.0
/fast                             6.0 14.0 19.0

Steven_L_Intel1
Employee
The /assume:dummy_aliases option does not make extra copies. What it does is disable optimizations that would allow the compiler to keep a dummy argument's value in a register, forcing it to go back to the argument's memory location on each read or write. The same goes for COMMON: normally, the compiler can assume that COMMON variables are not accessed except during procedure calls, so it might keep an intermediate value in a register.

Whether you'll see a difference depends a lot on how complex the code is and whether the optimizer sees a benefit from avoiding memory references. I'll comment that Intel Fortran does interprocedural analysis, so it can see in the example that there is an alias, disabling the optimization. Indeed, it probably does some inlining here as well.

This option has no effect on the caller of a procedure - it's all within the procedure compiled with the option.

I separated the main program from the subroutine. With 12.0.4, I compiled the subroutine with and without /assume:dummy_aliases - the main program was compiled with default options.

DAXPY without /assume:dummy_aliases: 6.0 8.0 10.0
DAXPY with /assume:dummy_aliases: 6.0 14.0 19.0
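To make the difference between these two result lines concrete, here is a hedged, standard-conforming model of the two behaviours. Rather than actually aliasing two dummies (which the standard forbids), the hypothetical routine axpy_model keeps the multiplier inside y itself at index ia. The flag reload_da selects whether it is read once at entry (as if held in a register, the default optimized behaviour) or re-read from memory before every use (what /assume:dummy_aliases forces). The module and routine names are illustrative only; this is not how the compiler is implemented.

```fortran
module alias_model
  implicit none
contains
  ! Hypothetical model of DAXPY with DA aliased to DY(ia).
  ! The multiplier is stored in y(ia) instead of being a separate dummy,
  ! so the aliasing is explicit and the code stays standard-conforming.
  subroutine axpy_model(n, ia, x, y, reload_da)
    integer, intent(in) :: n, ia
    double precision, intent(in) :: x(n)
    double precision, intent(inout) :: y(n)
    ! .false.: read the multiplier once at entry, like a value kept in a register
    ! .true. : re-read it from memory before every use, like /assume:dummy_aliases
    logical, intent(in) :: reload_da
    double precision :: a
    integer :: i

    a = y(ia)
    do i = 1, n
      if (reload_da) a = y(ia)
      y(i) = y(i) + a*x(i)
    end do
  end subroutine axpy_model
end module alias_model
```

With x = (1, 2, 3), y = (3, 2, 1) and ia = 1, reload_da = .false. gives 6.0 8.0 10.0 and reload_da = .true. gives 6.0 14.0 19.0, the same two results as above: once y(1) has been updated to 6.0, every later element picks up the new multiplier.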

jimdempseyatthecove
Honored Contributor III
When your code passes in two references to the same variable or array, and the called code assumes that each argument references different data, then either the caller or the subroutine as written is incorrect. The fact that such programs used to work is a "fact" based on fortuitous accidental happenstance. In this circumstance it may be best to correct the code in the subroutine, or, if you can use interfaces and when it makes sense, to indicate pass by value as opposed to pass by reference. For small array subroutines like yours, it would likely be best to explicitly write your own temporary array into the subroutine: use the temporary array for computing the output, then finish the subroutine by copying the temporary array to the output. Asking for an option switch to do this automatically may produce undesired or unexpected results.
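A minimal sketch of the temporary-array pattern, using the cross product from earlier in the thread, alongside a function form. The module and procedure names (vec3_ops, cross3, cross3f) are hypothetical. One caveat: a local temporary alone does not make a call such as call cross3(p1, p2, p1) standard-conforming, because the standard still forbids referencing p1 through V1 while it is defined through V3, so an optimizer may legitimately remove the temporary. The function form avoids the problem: its result is a separate entity, so p1 = cross3f(p1, p2) is legal.

```fortran
module vec3_ops
  implicit none
contains
  ! Subroutine form: compute into a local temporary, then copy to the output.
  ! Intended for calls where v3 is a different array from v1 and v2.
  subroutine cross3(v1, v2, v3)
    double precision, intent(in)  :: v1(3), v2(3)
    double precision, intent(out) :: v3(3)
    double precision :: t(3)          ! local temporary for the result

    t(1) = v1(2)*v2(3) - v1(3)*v2(2)
    t(2) = v1(3)*v2(1) - v1(1)*v2(3)
    t(3) = v1(1)*v2(2) - v1(2)*v2(1)
    v3 = t                            ! single copy to the output at the end
  end subroutine cross3

  ! Function form: the result is a separate entity, so an assignment such as
  ! p1 = cross3f(p1, p2) involves no aliasing between dummy arguments.
  pure function cross3f(v1, v2) result(r)
    double precision, intent(in) :: v1(3), v2(3)
    double precision :: r(3)

    r(1) = v1(2)*v2(3) - v1(3)*v2(2)
    r(2) = v1(3)*v2(1) - v1(1)*v2(3)
    r(3) = v1(1)*v2(2) - v1(2)*v2(1)
  end function cross3f
end module vec3_ops
```

With p1 = (1, 2, 3) and p2 = (3, 2, 1), both forms give (-4, 8, -4); an aliased call like call asub(p1,p2,p1) would become p1 = cross3f(p1, p2).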

Jim Dempsey

jimdempseyatthecove
Honored Contributor III
>>a "fact" based on fortuitous accidental happenstance.

This reminds me of a non-programming illustration.

I live in dairy farming country. Dairy farms usually have a structure called a silo: a tube-shaped building about 60 feet tall with a domed cap. Inside you store silage (chopped-up corn stalks and leaves to feed your cows). You dump the silage into the top of the structure, and an auger-type device removes it from the bottom.

One winter a farmer (a relative of my wife) had problems with the door in the dome at the top of his silo. It wasn't closing, and that was causing the silage to freeze in the silo. He asked a repairman to come out and fix the door, then left him to do his work. When he came out later and did not see him, the farmer climbed up the outside of the silo to look at the work. When he got there, he found the repairman still working on the door while standing on a ladder placed inside the silo on top of the silage. The farmer kindly asked the repairman to stop working for a while and come with him. They both climbed down the outside of the silo, and the farmer took the repairman to the access door at the bottom to show him what was inside. The silo was empty from the ground up to 50 feet, where a 10-foot-deep plug of frozen silage sat at the top. At the sight of this the repairman broke out in a cold sweat, suddenly realizing that his ladder was standing on not much more than open air.

Now you have to ask yourself:

Will this repair man rely on a "fact" based on fortuitous accidental happenstance?

Fix your code or suffer the consequences.

Jim Dempsey

mecej4
Honored Contributor III
> DAXPY without /assume:dummy_aliases: 6.0 8.0 10.0
> DAXPY with /assume:dummy_aliases: 6.0 14.0 19.0

And that is exactly why I am perturbed. With the default options (first line), the code ran with no perceptible effect of aliasing! The results are the same as if the multiplier a in the operation y = a*x + y had been passed by value, which is the kind of behavior the programmer would want.

In the second line, the code exhibited precisely the behavior that Les Neilson is trying to avoid by using the /assume:dummy_aliases compiler option!

I have read the Intel Fortran User Guide and see that the explanation Steve gave is consistent with what is stated there. However, the effect of the /assume:dummy_aliases option needs to be explained more clearly. Is it

(i) to assume that the programmer is not aware that arguments may be aliased and to make the routine behave the same as if arguments were copied in and out?

OR

(ii) to assume that the aliasing is intended by the programmer, and to make sure that the aliased variables are updated in memory to keep a coherent picture?

I think that Les wants (i); for the Daxpy example (i) makes more sense, and I had assumed that the compiler option was meant to make that happen. Thanks to Steve, I now think that (ii) is what the compiler does!

Steven_L_Intel1
Employee
(ii) is what it does. If what you want is to protect the actual argument against being written (through an alias or directly), enclose it in parentheses; the Intel compiler will then make a copy. Your program is non-standard if it writes to a dummy that is associated with an expression, but we give you a way to make it work.
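A short sketch of the parentheses approach applied to the DAXPY case. Writing (y(1)) makes the actual argument an expression, so the multiplier is a copy rather than an alias of y(1), and the result should no longer depend on the optimization level. Because DA is only read here, this use stays within the standard. The module axpy_mod and the simplified routine axpy (unit strides only) are illustrative assumptions, not the Compaq example itself.

```fortran
module axpy_mod
  implicit none
contains
  ! y = y + da*x, element by element (a simplified DAXPY with unit strides).
  subroutine axpy(n, da, x, y)
    integer, intent(in) :: n
    double precision, intent(in) :: da
    double precision, intent(in) :: x(n)
    double precision, intent(inout) :: y(n)
    integer :: i

    do i = 1, n
      y(i) = y(i) + da*x(i)
    end do
  end subroutine axpy
end module axpy_mod
```

For example, call axpy(3, (y(1)), x, y) with x = (1, 2, 3) and y = (3, 2, 1) leaves y = (6, 8, 10), the behavior described as (i) above.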

Les_Neilson
Valued Contributor II

Some of the library code was written on clay tablets with pointed sticks (OK, I exaggerate a little), but it was certainly written when array operations like A = B + C had to be written with DO loops. So we have code such as:

[fortran]      subroutine vadd(v1,v2,v3)
      implicit none

! Dummy arguments

      real*8 V1(3)
      real*8 V2(3)
      real*8 V3(3)

! Local variables

      integer*4 I

      do 100 i = 1 , 3
        v3(i) = v1(i) + v2(i)
 100  continue

      return
      end
[/fortran]

Now this code is called both as
call vadd(a,b,c)
and
call vadd(p1,p2,p1)

In theory there should be no problem (?) with this sort of code (there is also a VSUB routine).
I realise that more complicated code (e.g. a vector product) will likely be problematic.
Finding and modifying the errant code is going to keep me busy for some time to come. :-)

Les

mecej4
Honored Contributor III
I can't see why there would be any aliasing problems with this example. Each element of v1 and v2 is used only once; so, even if one of the array elements on the right side of

v3(i)=v1(i)+v2(i)

is aliased to the array element on the left, there should be no need for /assume:dummy_aliases. It is with this in mind that I chose a cross-product for my first example.
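Since the library predates array syntax, one hedged way to retire calls like call vadd(p1,p2,p1) altogether is to give the in-place case its own routine with a single INTENT(INOUT) argument, or simply to write p1 = p1 + p2 at the call site: Fortran defines an array assignment as if the whole right-hand side were evaluated before any part of the left-hand side is stored. The name vadd_inplace is hypothetical, and double precision stands in for the original real*8.

```fortran
module vec_ops
  implicit none
contains
  ! v3 = v1 + v2 for three distinct arrays (the original VADD contract),
  ! written with array syntax instead of a DO loop.
  subroutine vadd(v1, v2, v3)
    double precision, intent(in)  :: v1(3), v2(3)
    double precision, intent(out) :: v3(3)
    v3 = v1 + v2
  end subroutine vadd

  ! Hypothetical in-place variant replacing calls such as vadd(p1, p2, p1):
  ! only one dummy refers to p1, so no two dummies are aliased.
  subroutine vadd_inplace(v1, v2)
    double precision, intent(inout) :: v1(3)
    double precision, intent(in)    :: v2(3)
    v1 = v1 + v2
  end subroutine vadd_inplace
end module vec_ops
```

With a = (1, 2, 3) and b = (3, 2, 1), call vadd(a, b, c) and, starting from p1 = a, call vadd_inplace(p1, b) both produce (4, 4, 4).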

Les, will you please look at reply #7 and clarify whether you want behavior (i) or behavior (ii) as described there?

Les_Neilson
Valued Contributor II
I would say (ii) is what the original programmers expected. They deliberately use the same variable name twice in various subroutine calls (and there is no use of pointers throughout the code to "hide" any aliasing).

There are sometimes comments showing that they expected two dummy arguments might be the same:

[fortran]      subroutine multmv(mat,p1,p2)
      implicit none
! Dummy arguments
      real*8 MAT(3,3)
      real*8 P1(3)
      real*8 P2(3)

! Local variables
      integer*4 I
      real*8 P3(3)

!--Use p3 to safeguard against overwriting p1 when p2 and p1 are the same

      do 100 i = 1 , 3
        p3(i) = mat(1,i)*p1(1) + mat(2,i)*p1(2) + mat(3,i)*p1(3)
 100  continue

      do 200 i = 1 , 3
        p2(i) = p3(i)
 200  continue

      return
      end
[/fortran]



Les

mecej4
Honored Contributor III
Thanks for clarifying. Now I see clearly what you meant by "v3 is some function of v1 and v2 (usually via a local array)". There is no danger of any aliasing at all here, since no elements of the input arrays are changed until all the computations have been done and the results put into the local array p3.

Unfortunately for you, it follows that if /assume:dummy_aliases made a difference to the results, you have to look elsewhere for the problem.

Les_Neilson
Valued Contributor II

Further clarification.
The original problem was found by my colleague when running the 64-bit version of the exe.
The 32-bit version ran OK.
At first I did not know which routine caused the problem. I have just spoken to him about it and found that it is exactly the one I posted: MULTMV, multiply a vector by a matrix
(it was sheer good fortune on my part to choose that particular routine as an example).

Since he was getting the wrong answer, his guess was that optimisation could be eliminating the two loops and doing something like:
  calculate local_p3(1) = mat(1,1)*p1(1) + mat(2,1)*p1(2) + mat(3,1)*p1(3)
  p2(1) = local_p3(1)    (oops, that has just overwritten p1(1))
  calculate local_p3(2) = mat(1,2)*p1(1) + ...    (now using the wrong value of p1(1))
  p2(2) = local_p3(2)    (overwrites p1(2))
  repeat for p2(3)
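The colleague's hypothesis can be reproduced without relying on any particular compiler by modelling the aliased call multmv(mat, p1, p1) with a single INTENT(INOUT) array. The two hypothetical routines below differ only in when results are stored: multmv_twoloops computes all of p3 first, as MULTMV is written, while multmv_interleaved stores each element as soon as it is computed, which is the reordering guessed at above. These are illustrative models, not the compiler's actual output.

```fortran
module multmv_model
  implicit none
contains
  ! As MULTMV is written: compute every element of p3 first, then store.
  ! p plays the role of both P1 and P2 in the aliased call multmv(mat, p1, p1).
  subroutine multmv_twoloops(mat, p)
    double precision, intent(in)    :: mat(3,3)
    double precision, intent(inout) :: p(3)
    double precision :: p3(3)
    integer :: i

    do i = 1, 3
      p3(i) = mat(1,i)*p(1) + mat(2,i)*p(2) + mat(3,i)*p(3)
    end do
    do i = 1, 3
      p(i) = p3(i)
    end do
  end subroutine multmv_twoloops

  ! The hypothesised reordering: each element is stored as soon as it is
  ! computed, so later elements read inputs that were already overwritten.
  subroutine multmv_interleaved(mat, p)
    double precision, intent(in)    :: mat(3,3)
    double precision, intent(inout) :: p(3)
    integer :: i

    do i = 1, 3
      p(i) = mat(1,i)*p(1) + mat(2,i)*p(2) + mat(3,i)*p(3)
    end do
  end subroutine multmv_interleaved
end module multmv_model
```

With every element of mat equal to 1 and p = (1, 2, 3), the two-loop version gives (6, 6, 6) while the interleaved version gives (6, 11, 20), the kind of discrepancy such a reordering would produce.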

/assume:dummy_aliases apparently fixed the problem.

Our boss then asked me to:
(a) look at any performance hit if we compile the libraries with /assume:dummy_aliases, and
(b) identify any other instances where this problem occurs.

Les

mecej4
Honored Contributor III
I compiled the subroutine multmv with (i) /O2 /c and (ii) /O2 /c /assume:dummy_aliases. The two object files produced were identical except for the date stamp in the headers, with the 32-bit as well as with the 64-bit compilers.

Because of the /O2 optimization level, the loops get unrolled. Intermediate results are kept in the XMM registers. Nothing gets written to memory until just before the RETURN from the subroutine.

Things would have been different had the local array p3 not been provided and used.

Les_Neilson
Valued Contributor II
OK Thanks for looking at this.

Les