real :: p1(3),p2(3)
call asub(p1,p2,p1)
subroutine asub(v1,v2,v3)
real :: v1(3),v2(3),v3(3)
and v3 is some function of v1 and v2 (usually via a local array)
We have found that the 32-bit optimised release works OK but the 64-bit optimised release gives different results (which is how we came across the problem).
So as a short-term solution (until we can identify and fix all the codes) we are looking at using the /assume:dummy_aliases compiler option on all of our Fortran projects to fix the problem.
What I would like to know is: is it likely to have a noticeable effect on performance?
Sorry for the long-winded ramble.
Les
One undesirable and possibly unnoticed effect is that bugs in the code may not be fixed for a few months or years...
The problem with using an option such as /assume:dummy_aliases is that it is a sledge-hammer based solution. In other words, all arguments are candidates to be copied in and copied out in all the subroutines compiled with that option. If subprogram A calls subprogram B which in turn passes some of its arguments to another subprogram C, ..., we could see a lot of copying done, whether it is needed or not.
Here is a short example.
[fortran]
subroutine asub(v1,v2,v3)
   real, intent(in)  :: v1(3),v2(3)
   real, intent(out) :: v3(3)
   v3(1)=v1(2)*v2(3)-v1(3)*v2(2)
   v3(2)=v1(3)*v2(1)-v1(1)*v2(3)
   v3(3)=v1(1)*v2(2)-v1(2)*v2(1)
   return
end subroutine asub

subroutine tst
   real :: p1(3)=(/1.0,2.0,3.0/), &
           p2(3)=(/3.0,2.0,1.0/), &
           p3(3)
   call asub(p1,p2,p3) ! nothing aliased, call does not need copying
   call asub(p1,p2,p1) ! call needs copying of only p1
end subroutine tst
[/fortran]
I found that, with the 11.1.070 and 12.0.4 32-bit compilers, the options /Qcommon-args or /assume:dummy_aliases seemed to have no effect. Instead, whether the aliasing caused the cross-product to be wrong depended on the optimization level.
My example has another bad property, because I declared INTENTs. According to the Fortran Standard, the values in the array p3 become undefined upon entry to ASUB because it is declared to be INTENT(OUT). If called with aliased arguments, as in the second call, what is the effect of that on the values in the first argument? Could an INTENT(IN) array become undefined?
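As a minimal sketch of the narrower fix that the comment "call needs copying of only p1" points to, the caller can copy the one aliased argument itself, so that no compiler option is needed for this call. The program name `cross_fix` and the temporary `tmp` are illustrative assumptions and are not part of the original code.

```fortran
program cross_fix
   implicit none
   real :: p1(3), p2(3), tmp(3)

   p1 = (/1.0,2.0,3.0/)
   p2 = (/3.0,2.0,1.0/)

   ! Copy only the argument that would otherwise be aliased with the result.
   tmp = p1
   call asub(tmp,p2,p1)        ! v1 and v3 now refer to distinct arrays

   ! Self-check against the cross product worked out by hand.
   if (any(p1 /= (/-4.0,8.0,-4.0/))) error stop 'unexpected cross product'
   print '(3F6.1)', p1         ! prints  -4.0   8.0  -4.0

contains

   ! Same cross-product routine as in the example above.
   subroutine asub(v1,v2,v3)
      real, intent(in)  :: v1(3),v2(3)
      real, intent(out) :: v3(3)
      v3(1)=v1(2)*v2(3)-v1(3)*v2(2)
      v3(2)=v1(3)*v2(1)-v1(1)*v2(3)
      v3(3)=v1(1)*v2(2)-v1(2)*v2(1)
   end subroutine asub

end program cross_fix
```

Because the copy is made only where the aliasing actually occurs, the other calls pay nothing extra.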
Yes, I recognise that this is a bug which was waiting to bite and has now bitten. :-(
Although the code I have looked at so far (similar to your asub) uses a local array to calculate v3.
Basically two do loops - one to calculate local_v3 and the second to assign to dummy argument v3
It appears that the code is simple enough for the optimiser to re-order the code and maybe do away with the loops altogether. (My assembly knowledge is over 30 years old and mainframe based so it would take me some time to look at the assembly listing and work out what was going on)
The dummy_aliases option is only a temporary workaround as I will be identifying and fixing the offending code.
Shortly after posting the original (and waking up the brain cells with a coffee) I found in the help a comment that there will be a performance hit - somehow I totally missed that block of help info.
Les
I looked up the old Compaq Fortran documentation, and found this:
You can link routines compiled with the /assume:dummy_aliases option to routines compiled with /assume:nodummy_aliases. For example, if only one routine is called with dummy aliases, you can use /assume:dummy_aliases when compiling that routine, and compile all the other routines with /assume:nodummy_aliases to gain the performance value of that option.
I now have misgivings about what the /assume:dummy_aliases option does, and I wondered whether you were placing unwarranted reliance on its working correctly. Perhaps you can tell me what I am doing wrong. I took the example in the Compaq user guide:
[fxfortran]
      Program TSTS
      double precision a,X(3),y(3)
      data X/1d0,2d0,3d0/,y/3d0,2d0,1d0/
      call daxpy(3,y(1),X,1,Y,1) ! argument DA aliased with DY(1)
      write(*,'(1x,3F6.1)')Y
      end

      SUBROUTINE DAXPY(N,DA,DX,INCX,DY,INCY)
C     Constant times a vector plus a vector.
C     uses unrolled loops for increments equal to 1.
      DOUBLE PRECISION DX(1), DY(1), DA
      INTEGER I,INCX,INCY,IX,IY,M,MP1,N
C
      IF (N.LE.0) RETURN
      IF (DA.EQ.0.0) RETURN
      IF (INCX.EQ.1.AND.INCY.EQ.1) GOTO 20
C     Code for unequal increments or equal increments
C     not equal to 1.
      STOP 'No code given for INCX or INCY not equal to 1'
C
C     Code for both increments equal to 1.
   20 M = MOD(N,4)
      IF (M.EQ.0) GOTO 40
      DO I=1,M
         DY(I) = DY(I) + DA*DX(I)
      END DO
      IF (N.LT.4) RETURN
   40 MP1 = M + 1
      DO I = MP1, N, 4
         DY(I) = DY(I) + DA*DX(I)
         DY(I + 1) = DY(I + 1) + DA*DX(I + 1)
         DY(I + 2) = DY(I + 2) + DA*DX(I + 2)
         DY(I + 3) = DY(I + 3) + DA*DX(I + 3)
      END DO
      RETURN
      END
[/fxfortran]
Here are results using CVF 6.6C:
OPTIONS USED                       RESULT
/opt:2 /assume:nodummy_aliases     6.0   8.0  10.0   (no alias effect)
/opt:2                             6.0   8.0  10.0
/opt:0                             6.0  14.0  19.0
/opt:0 /assume:dummy_aliases       6.0  14.0  19.0
and with IFort 12.0.4:
OPTIONS USED                       RESULT
/Od                                6.0  14.0  19.0
/Od /assume:dummy_aliases          6.0  14.0  19.0
/fast                              6.0  14.0  19.0
Whether you'll see a difference depends a lot on how complex the code is and whether the optimizer sees a benefit from avoiding memory references. I'll comment that Intel Fortran does interprocedural analysis, so it can see in the example that there is an alias, disabling the optimization. Indeed, it probably does some inlining here as well.
This option has no effect on the caller of a procedure - it's all within the procedure compiled with the option.
I separated the main program from the subroutine. With 12.0.4, I compiled the subroutine with and without /assume:dummy_aliases - the main program was compiled with default options.
DAXPY without /assume:dummy_aliases: 6.0 8.0 10.0
DAXPY with /assume:dummy_aliases: 6.0 14.0 19.0
Jim Dempsey
This reminds me of a non-programming illustration.
I live in dairy farming country. Dairy farms usually have a structure called a silo. This is a tubular building standing about 60 feet tall with a domed cap. Inside you place silage (chopped-up corn stalks and leaves to feed your cows). You dump the silage into the top of the structure and have an auger-type device that removes the silage from the bottom. One winter this farmer (a relative of my wife) had problems with the door in the dome at the top of the silo. It wasn't closing, and it was causing the silage to freeze in the silo. He asked a repair man to come out and fix the door. He left the repair man to do his work, came out later and, not seeing him, he (the farmer) climbed up the outside of the silo to look at the work. When he got there he found the repair man was still working on the door while standing on a ladder placed into the silo on top of the silage. The farmer kindly asked the repair man to stop working for a while and to come with him. They both climbed down the outside of the silo and the farmer took the repair man to the access door at the bottom of the silo to show him what was inside. The silo was empty from the ground up 50 feet to a 10-foot-deep plug of frozen silage at the top. At the sight of this the repair man broke out in a cold sweat under the sudden realization that his ladder was standing on not much more than open air.
Now you have to ask yourself:
Will this repair man rely on a "fact" based on fortuitous accidental happenstance?
Fix your code or suffer the consequences.
Jim Dempsey
DAXPY without /assume:dummy_aliases: 6.0 8.0 10.0
DAXPY with /assume:dummy_aliases: 6.0 14.0 19.0
And that is exactly why I am perturbed. The code ran with no perceptible effect of aliasing in the first line, with the default options! The results are the same as if the multiplier a in the y = a.x + y operation had been passed by value, and is the kind of behavior that the programmer would want.
In the second line, the code exhibited precisely the behavior that Les Neilson is trying to avoid by using the /assume:dummy_aliases compiler option!
I have read the Intel Fortran User Guide and see that the explanation that Steve gave is consistent with what is stated in the User Guide. However, what needs to be clarified better is the effect of the /assume:dummy_aliases option. Is it
(i) to assume that the programmer is not aware that arguments may be aliased and to make the routine behave the same as if arguments were copied in and out?
OR
(ii) to assume that the aliasing is intended by the programmer, and to make sure that the aliased variables are updated in memory to keep a coherent picture?
I think that Les wants (i); for the Daxpy example (i) makes more sense, and I had assumed that the compiler option was meant to make that happen. Thanks to Steve, I now think that (ii) is what the compiler does!
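For the scalar multiplier, behavior (i) can also be requested in the source itself. The sketch below uses the Fortran 2003 VALUE attribute so that the routine works on a private copy of the multiplier; VALUE requires an explicit interface, which is why the routine sits in a module. The names `axpy_mod` and `axpy_by_value` are hypothetical, and the routine is a simplified unit-increment stand-in for DAXPY, not the library code.

```fortran
module axpy_mod
   implicit none
contains
   ! Hypothetical unit-increment variant of DAXPY: dy = dy + da*dx.
   ! VALUE makes da a private copy of the actual argument, so later
   ! changes to dy cannot alter the multiplier.
   subroutine axpy_by_value(n, da, dx, dy)
      integer, intent(in)             :: n
      double precision, value         :: da
      double precision, intent(in)    :: dx(n)
      double precision, intent(inout) :: dy(n)
      integer :: i
      do i = 1, n
         dy(i) = dy(i) + da*dx(i)
      end do
   end subroutine axpy_by_value
end module axpy_mod

program tst_value
   use axpy_mod
   implicit none
   double precision :: x(3), y(3)

   x = (/1d0,2d0,3d0/)
   y = (/3d0,2d0,1d0/)

   ! Same call pattern as the Compaq example: the multiplier is y(1).
   call axpy_by_value(3, y(1), x, y)

   ! Self-check: the multiplier stays 3.0 for all three elements.
   if (any(y /= (/6d0,8d0,10d0/))) error stop 'multiplier was not passed by value'
   write(*,'(1x,3F6.1)') y     ! prints 6.0 8.0 10.0
end program tst_value
```

This gives the "no alias effect" result at any optimization level, but it only helps for scalars; array arguments still need a copy made by the caller or by a wrapper.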
Some of the library code was written on clay tablets with pointed sticks - OK, I exaggerate a little. But it was certainly written when array operations like A = B + C had to be written with do loops. So we have some code:
[fortran]
      subroutine vadd(v1,v2,v3)
      implicit none
! Dummy arguments
      real*8 V1(3)
      real*8 V2(3)
      real*8 V3(3)
! Local variables
      integer*4 I
      do 100 i = 1 , 3
         v3(i) = v1(i) + v2(i)
  100 continue
      return
      end
[/fortran]
Now this code is called both as
call vadd(a,b,c)
and
call vadd(p1,p2,p1)
In theory there should be no problem (?) with this sort of code (there is a VSUB routine also).
I realise that more complicated code (e.g. vector product) will likely be problematic.
Finding and modifying the errant code is going to keep me busy for some time to come. :-)
Les
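For routines as simple as vadd, one option when modifying the call sites is to replace the aliased call with an array assignment. In an intrinsic assignment the right-hand side is evaluated before the left-hand side is defined, so the same array may appear on both sides. A minimal sketch follows; the program name and the data values are made up for illustration.

```fortran
program vadd_inline
   implicit none
   double precision :: p1(3), p2(3)

   p1 = (/1d0,2d0,3d0/)
   p2 = (/10d0,20d0,30d0/)

   ! Replaces "call vadd(p1,p2,p1)": no dummy arguments, so no aliasing.
   p1 = p1 + p2

   ! Self-check of the element-wise sum.
   if (any(p1 /= (/11d0,22d0,33d0/))) error stop 'unexpected sum'
   write(*,'(1x,3F6.1)') p1    ! prints 11.0 22.0 33.0
end program vadd_inline
```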
v3(i)=v1(i)+v2(i)
is aliased to the array element on the left, there should be no need for /assume:dummy_aliases. It is with this in mind that I chose a cross-product for my first example.
Les, will you please look at reply #7 and clarify whether you want behavior (i) or behavior (ii) as described there?
There are sometimes comments where they expect the possibility of two dummy arguments being the same
[fortran]
      subroutine multmv(mat,p1,p2)
      implicit none
! Dummy arguments
      real*8 MAT(3,3)
      real*8 P1(3)
      real*8 P2(3)
! Local variables
      integer*4 I
      real*8 P3(3)
!--Use p3 to safe-guard against overwrite of p1 when p2 and p1 are same
      do 100 i = 1 , 3
         p3(i) = mat(1,i)*p1(1) + mat(2,i)*p1(2) + mat(3,i)*p1(3)
  100 continue
      do 200 i = 1 , 3
         p2(i) = p3(i)
  200 continue
      return
      end
[/fortran]
Les
Unfortunately for you, it follows that if /assume:dummy_aliases made a difference to the results, you have to look elsewhere for the problem.
Further clarification.
The original problem was found by my colleague when running the 64-bit version of the exe.
The 32-bit version ran OK.
At first I did not know what the actual routine was that caused the problem. I have just spoken to him about it and found that it is exactly the one I posted: multiply vector by a matrix
(it was sheer good fortune on my part to choose that particular routine as an example).
Since he was getting the wrong answer, his guess was that optimisation could be eliminating the two loops and doing something like:
calculate local_p3(1) = mat(1,1)*p1(1) + mat(2,1)*p1(2) + mat(3,1)*p1(3)
p2(1) = local_p3(1)    Oops, just overwritten p1(1)
calculate local_p3(2) = mat(1,2)*p1(1) ...    now using wrong value of p1(1)
p2(2) = local_p3(2)    Overwrite p1(2)
repeat for p2(3)
/assume:dummy_aliases apparently fixed the problem.
Our boss then asked me to
(a) look at any performance hit if we compile the libraries with dummy_aliases and
(b) identify any other instances where this problem occurs.
Les
Because of the /O2 optimization level, the loops get unrolled. Intermediate results are kept in the XMM registers. Nothing gets written to memory until just before the RETURN from the subroutine.
Things would have been different had the local array p3 not been provided and used.
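Rather than depending on how the optimizer treats the local array, the in-place case can be kept out of the aliased call altogether. The sketch below is one way to do that under stated assumptions: `multmv_inplace` is a hypothetical wrapper, the module name `vec_ops` is made up, and `multmv` is restated in free form (without the local p3) only to make the example self-contained. The wrapper copies the input once and then calls `multmv` with distinct arrays.

```fortran
module vec_ops
   implicit none
   integer, parameter :: dp = kind(1.0d0)
contains
   ! Free-form restatement of the posted routine: p2(i) = sum over j of mat(j,i)*p1(j).
   ! It assumes p1 and p2 are distinct arrays.
   subroutine multmv(mat, p1, p2)
      real(dp), intent(in)  :: mat(3,3), p1(3)
      real(dp), intent(out) :: p2(3)
      integer :: i
      do i = 1, 3
         p2(i) = mat(1,i)*p1(1) + mat(2,i)*p1(2) + mat(3,i)*p1(3)
      end do
   end subroutine multmv

   ! Hypothetical wrapper for the "result overwrites input" case.
   subroutine multmv_inplace(mat, p)
      real(dp), intent(in)    :: mat(3,3)
      real(dp), intent(inout) :: p(3)
      real(dp) :: tmp(3)
      tmp = p                     ! explicit copy-in, only where it is needed
      call multmv(mat, tmp, p)    ! tmp and p do not overlap
   end subroutine multmv_inplace
end module vec_ops

program demo_inplace
   use vec_ops
   implicit none
   real(dp) :: mat(3,3), p(3), q(3)

   ! Illustrative data; mat is filled column by column.
   mat = reshape((/1.0_dp,2.0_dp,3.0_dp, 4.0_dp,5.0_dp,6.0_dp, &
                   7.0_dp,8.0_dp,10.0_dp/), (/3,3/))
   p = (/1.0_dp,2.0_dp,3.0_dp/)

   call multmv(mat, p, q)         ! reference result in a separate array
   call multmv_inplace(mat, p)    ! same product, result written back into p

   if (any(p /= q)) error stop 'in-place result differs from reference'
   write(*,'(1x,3F6.1)') p        ! prints 14.0 32.0 53.0
end program demo_inplace
```

Call sites of the form `call multmv(mat,p1,p1)` would then become `call multmv_inplace(mat,p1)`, which also makes the remaining aliased calls easy to find with a text search.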
Les
