Summing two arrays and assigning the sum to a third array seems to trigger the creation of a temporary in ifort 10.1:
var1%arr = var2%arr + var3%arr
All three variables are of the same type and arr is a big allocatable array.
I found this problem because this line produces a stack overflow unless a really big stack size is set. The problem seemingly goes away when using the compiler switch "-heap-arrays", but the memory consumption shows that an unneeded temporary is still created (it is now allocated on the heap, so no stack overflow is triggered).
The problem exists in both the Linux and Windows versions of the compiler; it seems to get worse when using OpenMP, probably because each thread wants its own temporaries.
Is there a way to tell the compiler not to create the temporary? This should also be much faster because no result has to be copied from the temporary to the destination variable.
I can't find a compiler option that will eliminate the temp in this case. I'll submit it as a possible optimization improvement to the developers.
As a workaround, add conditionally compiled code that calls a subroutine to perform the summation (or use a DO loop):
subroutine arraySum(inA, inB, outC)
  ! The dummy arguments are plain assumed-shape arrays, not derived-type components.
  real(8), intent(in) :: inA(:,:), inB(:,:)
  real(8), intent(out) :: outC(:,:)
  outC = inA + inB
end subroutine arraySum
Provide an explicit interface at every call site (for example by placing the subroutine in a module), because assumed-shape dummy arguments require one.
You might get better execution speed if you pass in the extents of the arrays.
The DO loop approach might be easier if this happens in only a few places.
With conditional code you can then quickly test new revisions of the compiler with the flip of a define.
Jim Dempsey
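The workaround above can be sketched as a complete program. This is only an illustration under stated assumptions: the type name field_t, the kind parameter dp, and the array sizes are hypothetical stand-ins for the original poster's code, and whether a temporary is actually avoided depends on the compiler version.

```fortran
module field_sum
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  ! Hypothetical derived type standing in for the type from the question.
  type :: field_t
    real(dp), allocatable :: arr(:,:)
  end type field_t

contains

  ! Inside this routine the arguments are ordinary assumed-shape arrays;
  ! the fact that they are derived-type components is not visible here.
  subroutine arraySum(inA, inB, outC)
    real(dp), intent(in)  :: inA(:,:), inB(:,:)
    real(dp), intent(out) :: outC(:,:)
    outC = inA + inB
  end subroutine arraySum

end module field_sum

program demo_array_sum
  use field_sum
  implicit none
  type(field_t) :: var1, var2, var3
  integer :: i, j

  allocate(var1%arr(4,3), var2%arr(4,3), var3%arr(4,3))
  var2%arr = 1.0_dp
  var3%arr = 2.0_dp

  ! Variant 1: call the helper instead of var1%arr = var2%arr + var3%arr.
  call arraySum(var2%arr, var3%arr, var1%arr)
  print *, all(var1%arr == 3.0_dp)

  ! Variant 2: explicit DO loops, first index innermost (column-major order).
  do j = 1, size(var1%arr, 2)
    do i = 1, size(var1%arr, 1)
      var1%arr(i,j) = var2%arr(i,j) + var3%arr(i,j)
    end do
  end do
  print *, all(var1%arr == 3.0_dp)
end program demo_array_sum
```

Placing arraySum in a module gives every caller the explicit interface automatically.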
Thank you for your answers. As you suggested, I have rewritten the array assignment using DO loops. This workaround fixes the problem (no stack overflow even without /heap-arrays) and almost exactly doubles the speed of this section of my code, probably because memory bandwidth is the limiting factor here.
As the impact is so big, I would very much like to see this being optimized in a newer version of the compiler (ifort 11.0 apparently behaves the same as the older version (10.1)).
However, I did not quite understand why using a subroutine alone would change anything; in fact, the summation is already done in a subroutine which does nothing else but summing the different sub-fields of a derived type. If there indeed is a way of not having to explicitly type the DO-loops, I would be very happy to learn about it, as quite a few of the arrays in question have six dimensions and typing the DO-loops for every one of them is rather tedious. Is there perhaps a way to treat a multi-dimensional array as one-dimensional? (RESHAPE is not a solution, as I need this not only for input but also for output.)
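One possible answer to the question about one-dimensional treatment, offered as a sketch rather than something proposed in this thread: Fortran's sequence association lets a contiguous array of any rank be passed to an explicit-shape dummy argument of rank one, for both input and output. The names flatSum and field6_t and the array extents below are hypothetical.

```fortran
module flat_sum_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  ! Hypothetical type with a six-dimensional allocatable component.
  type :: field6_t
    real(dp), allocatable :: arr(:,:,:,:,:,:)
  end type field6_t

contains

  ! Explicit-shape dummies of length n: the actual arguments may have any
  ! rank, as long as they supply at least n elements in storage order.
  subroutine flatSum(n, inA, inB, outC)
    integer,  intent(in)  :: n
    real(dp), intent(in)  :: inA(n), inB(n)
    real(dp), intent(out) :: outC(n)
    integer :: i
    do i = 1, n
      outC(i) = inA(i) + inB(i)
    end do
  end subroutine flatSum

end module flat_sum_mod

program demo_flat_sum
  use flat_sum_mod
  implicit none
  type(field6_t) :: var1, var2, var3

  allocate(var1%arr(2,3,2,3,2,3), var2%arr(2,3,2,3,2,3), var3%arr(2,3,2,3,2,3))
  var2%arr = 1.0_dp
  var3%arr = 2.0_dp

  ! A single loop inside flatSum replaces six nested DO loops.
  call flatSum(size(var1%arr), var2%arr, var3%arr, var1%arr)
  print *, all(var1%arr == 3.0_dp)
end program demo_flat_sum
```

Whole allocatable arrays are contiguous, so no copy should be needed to pass them this way; a non-contiguous section would typically be copied in and out by the compiler.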
When you have an assignment of the form A = A + B, the compiler needs to know if there is overlap between the left and right sides, and if so, to what extent. The code in the compiler to do this takes many cases into account, but one it does not look at right now is if the operands are array components of a derived type. When it sees these, it just shrugs its figurative shoulders and says "I don't know", and generates conservative code.
Elimination of unnecessary temps is an ongoing task and we'll take another look at this area to see what we can improve.
The suggestion given in this thread removes the "derived type component" aspect from the overlap detection equation.
Steve,
I would think that if the derived type contained arrays, as opposed to pointers to arrays, it would be safe to say "can't possibly overlap".
Jim Dempsey
Well, no. Even if they were non-pointer/allocatable arrays, the semantics of A=A+B require that the right side be completely evaluated before the left is assigned to. To avoid creating a temp, you have to make sure that either the references to A on the right overlap the left exactly or not at all.
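A small sketch of why partial overlap matters. The arrays and values are made up for illustration; the point is that the array assignment must behave as if the whole right side were evaluated first, while a naive element-by-element loop reads values it has already overwritten.

```fortran
program overlap_demo
  implicit none
  integer :: a(5), b(5), i

  a = [1, 2, 3, 4, 5]
  b = a

  ! Array assignment: the right side uses the original values of a,
  ! so the compiler needs a temporary (or a reversed loop) here.
  a(2:5) = a(1:4) + 10

  ! Naive forward loop: b(i-1) has already been updated when it is read.
  do i = 2, 5
    b(i) = b(i-1) + 10
  end do

  print *, a   ! 1 11 12 13 14
  print *, b   ! 1 11 21 31 41
end program overlap_demo
```

When the left and right sides overlap exactly or not at all, both forms give the same result, which is the case where the temporary can be dropped.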
Steve, I think the OP referred to A = B+C, not A = A+B. I use a lot of (large) allocatable arrays inside structures, and I am interested in this compiler optimization as well.
Olivier
Oops! You're right. Unfortunately, that doesn't help here - the compiler does not even try to figure this out. I have asked that this be improved.
Olivier,
When I had problems with earlier versions of IVF, I would "engineer" a solution with the Fortran preprocessor (FPP) while waiting for a fix. In an FPP header file (brought in with #include "..."):
#ifdef _BROKEN_ArraySum_
#define ArraySum(A,B,C) call doArraySum(A,B,C)
#else
#define ArraySum(A,B,C) A = B + C
#endif
Then use the macro in the hundreds of places where it matters. When a tentative fix comes in, comment out the #define _BROKEN_ArraySum_, recompile, and run an integrity/performance test. If the test fails, remove the comment and compile again. FPP makes for easy work.
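A sketch of how the macro might be used end to end. It assumes the file is run through the preprocessor (for example a .F90 extension, or the -fpp option for ifort and -cpp for gfortran); the module name, the type field_t, and the body of doArraySum are hypothetical, since the thread does not show them.

```fortran
! Comment out the next line to fall back to the plain array assignment.
#define _BROKEN_ArraySum_

#ifdef _BROKEN_ArraySum_
#define ArraySum(A,B,C) call doArraySum(A,B,C)
#else
#define ArraySum(A,B,C) A = B + C
#endif

module array_sum_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  type :: field_t
    real(dp), allocatable :: arr(:,:)
  end type field_t

contains

  ! Argument order follows the macro: result first, then the two operands.
  subroutine doArraySum(outA, inB, inC)
    real(dp), intent(out) :: outA(:,:)
    real(dp), intent(in)  :: inB(:,:), inC(:,:)
    outA = inB + inC
  end subroutine doArraySum

end module array_sum_mod

program demo_macro
  use array_sum_mod
  implicit none
  type(field_t) :: var1, var2, var3

  allocate(var1%arr(4,3), var2%arr(4,3), var3%arr(4,3))
  var2%arr = 1.0_dp
  var3%arr = 2.0_dp

  ! Expands to the subroutine call or to the assignment, depending on the define.
  ArraySum(var1%arr, var2%arr, var3%arr)
  print *, all(var1%arr == 3.0_dp)
end program demo_macro
```

Unlike Fortran itself, the preprocessor is case-sensitive, so the macro must be spelled ArraySum exactly at every use.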
Jim Dempsey
So if I'm reading this thread correctly, a temporary is always created when assigning one array to another, as in st1%array = st2%array, and this is slower than using DO loops? Is this also true if the bounds are included, as in st1%array(1:10000) = st2%array(1:10000)? Thanks,
Jeremy
I would not say "always". In the case of allocatable or pointer array components of derived type, a temporary may be made. In normal array assignments, no. Specifying the bounds makes it even harder for the compiler - don't do that if you're copying the whole array. It can also change the semantics of the program, especially for allocatable arrays.
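The semantic difference for allocatable arrays can be illustrated with a short sketch. It assumes Fortran 2003 reallocation-on-assignment is in effect; as far as I know, Intel compilers of this era needed an option such as -assume realloc_lhs for that, so treat the second half as compiler- and option-dependent.

```fortran
program section_vs_whole
  implicit none
  real, allocatable :: a(:), b(:)

  allocate(a(3), b(5))
  a = 0.0
  b = 1.0

  ! Section on the left: a is never reallocated and the shapes must conform.
  a(1:3) = b(1:3)
  print *, size(a)

  ! Whole array on the left: under Fortran 2003 rules a is reallocated
  ! to the shape of b when the shapes differ.
  a = b
  print *, size(a)
end program section_vs_whole
```

With explicit bounds the compiler also has to reason about an array section rather than a whole array, which is the "harder for the compiler" part of the advice above.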
See "Doctor, It Hurts When I Do This" for details on the semantic differences. If you want to use DO loops to do the assignment, I can understand that.
Very interesting article; I never thought about pointing it to an array function. Is there any downside to just using -heap-arrays and allocating everything on the heap instead of the stack? This seems to get rid of all the stack overflow exceptions.
In my view, there is no real downside to heap-arrays. If the arrays are small and the routine does little work, the overhead of allocation and deallocation may be significant, but for most applications it will be noise.
Well, if the array is large it can have quite an impact because the temporary has to be copied back to the target array. In my experience, using DO loops almost doubles the speed of the assignment operation (memory bandwidth being the limiting factor). This is irrelevant, of course, if the assignment operation takes a relatively small amount of time and memory footprint is not a problem.
I was not discussing overhead of copying - that is always extra - but rather having array temporaries, when they are created, allocated dynamically vs. on the stack.
This was fixed in version 12.
Thanks! This eliminates quite a lot of DO-loops :)