Intel® Fortran Compiler

unnecessary temporary created in array assignment

tom_p
Beginner
Summing two arrays and assigning the sum to a third array seems to trigger the creation of a temporary in ifort 10.1:

var1%arr = var2%arr + var3%arr

All three variables are of the same type and arr is a big allocatable array.

I found this problem because this line produces a stack overflow unless a really big stack size is set. The problem seemingly goes away with the compiler switch "-heap-arrays", but the memory consumption shows that an unneeded temporary is still created (it is now allocated on the heap, so no stack overflow is triggered).

The problem exists in both the Linux and Windows versions of the compiler; it seems to get worse when using OpenMP, probably because each thread wants its own temporaries.

Is there a way to tell the compiler not to create the temporary? This should also be much faster because no result has to be copied from the temporary to the destination variable.
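
A minimal sketch of the pattern described above may help when reproducing the issue. The type name field_t and the size n are assumptions for illustration; only the component name arr and the variable names come from the post.

```fortran
! Minimal reproducer sketch. field_t and n are hypothetical; the post
! only states that arr is a big allocatable array component.
program temp_demo
  implicit none
  type field_t
     real(8), allocatable :: arr(:,:)
  end type field_t
  integer, parameter :: n = 2000
  type(field_t) :: var1, var2, var3

  allocate(var1%arr(n,n), var2%arr(n,n), var3%arr(n,n))
  var2%arr = 1.0d0
  var3%arr = 2.0d0

  ! Whole-array form: affected compiler versions may evaluate the
  ! right side into a temporary before copying it to var1%arr.
  var1%arr = var2%arr + var3%arr

  if (any(var1%arr /= 3.0d0)) stop 'unexpected result'
  print *, 'sum ok'
end program temp_demo
```

Comparing peak memory use of this program with and without the whole-array assignment replaced by explicit loops is one way to see whether a temporary is involved.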


Steven_L_Intel1
Employee
I can't find a compiler option that will eliminate the temp in this case. I'll submit it as a possible optimization improvement to the developers.

jimdempseyatthecove
Honored Contributor III

As a workaround, add conditionally compiled code that calls a subroutine to perform the summation (or use a DO loop):

subroutine arraySum(inA, inB, outC)
real(8), intent(in) :: inA(:,:), inB(:,:)
real(8), intent(out) :: outC(:,:)
outC = inA + inB
end subroutine arraySum

Provide an explicit interface at each call site (the assumed-shape dummy arguments require one).

You might get better execution speed if you pass in the extents of the arrays.

The DO loop approach might be easier if you only have a few instances of this.

With conditional code you can then quickly test new revisions of the compiler with the flip of a define.
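
A sketch of how the subroutine workaround might be wired up, assuming the routine is placed in a module so that the explicit interface comes for free. The module name array_sum_mod and the type field_t are hypothetical.

```fortran
module array_sum_mod
  implicit none
contains
  ! The dummy arguments are plain arrays, so inside this routine the
  ! compiler does not see derived-type components.
  subroutine arraySum(inA, inB, outC)
    real(8), intent(in)  :: inA(:,:), inB(:,:)
    real(8), intent(out) :: outC(:,:)
    outC = inA + inB
  end subroutine arraySum
end module array_sum_mod

program call_array_sum
  use array_sum_mod          ! supplies the explicit interface
  implicit none
  type field_t               ! hypothetical type, for illustration only
     real(8), allocatable :: arr(:,:)
  end type field_t
  type(field_t) :: var1, var2, var3

  allocate(var1%arr(3,4), var2%arr(3,4), var3%arr(3,4))
  var2%arr = 1.0d0
  var3%arr = 2.0d0

  ! Replaces: var1%arr = var2%arr + var3%arr
  call arraySum(var2%arr, var3%arr, var1%arr)

  if (any(var1%arr /= 3.0d0)) stop 'unexpected result'
  print *, 'sum ok'
end program call_array_sum
```

Whether this actually avoids the temporary depends on the compiler version, so it is worth checking memory use after the change.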

Jim Dempsey

tom_p
Beginner
Thank you for your answers. As you suggested, I have rewritten the array assignment using DO-loops. This workaround fixes the problem (no stack overflow even without /heap-arrays) and almost exactly doubles the speed of this section of my code. This is probably because memory bandwidth is the limiting factor here.

As the impact is so big, I would very much like to see this optimized in a newer version of the compiler; ifort 11.0 apparently behaves the same as 10.1.

However, I did not quite understand why using a subroutine alone would change anything; in fact, the summation is already done in a subroutine which does nothing else but summing the different sub-fields of a derived type. If there indeed is a way of not having to explicitly type the DO-loops, I would be very happy to learn about it, as quite a few of the arrays in question have six dimensions and typing the DO-loops for every one of them is rather tedious. Is there perhaps a way to treat a multi-dimensional array as one-dimensional? (RESHAPE is not a solution, as I need this not only for input but also for output.)
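
One standard Fortran technique that may fit the question about a one-dimensional view is sequence association: a contiguous array of any rank can be passed to an explicit-shape dummy argument of rank 1. The sketch below assumes whole, contiguous allocatable arrays; the names flat_sum_mod, flatSum, and field_t are hypothetical, and rank 3 stands in for the six-dimensional arrays mentioned above.

```fortran
module flat_sum_mod
  implicit none
contains
  ! Explicit-shape dummies: the actual arguments may have any rank and
  ! are seen here as rank-1 arrays of n elements, for input and output.
  subroutine flatSum(n, a, b, c)
    integer, intent(in)  :: n
    real(8), intent(in)  :: a(n), b(n)
    real(8), intent(out) :: c(n)
    integer :: i
    do i = 1, n
       c(i) = a(i) + b(i)
    end do
  end subroutine flatSum
end module flat_sum_mod

program flat_demo
  use flat_sum_mod
  implicit none
  type field_t               ! hypothetical type, for illustration only
     real(8), allocatable :: arr(:,:,:)
  end type field_t
  type(field_t) :: var1, var2, var3

  allocate(var1%arr(2,3,4), var2%arr(2,3,4), var3%arr(2,3,4))
  var2%arr = 1.0d0
  var3%arr = 2.0d0

  ! One loop inside flatSum covers all three dimensions.
  call flatSum(size(var1%arr), var2%arr, var3%arr, var1%arr)

  if (any(var1%arr /= 3.0d0)) stop 'unexpected result'
  print *, 'flat sum ok'
end program flat_demo
```

If an array section or a pointer array is passed instead of a whole allocatable array, the compiler may have to make a contiguous copy for the call, which would defeat the purpose.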

Steven_L_Intel1
Employee
When you have an assignment of the form A = A + B, the compiler needs to know if there is overlap between the left and right sides, and if so, to what extent. The code in the compiler to do this takes many cases into account, but one it does not look at right now is if the operands are array components of a derived type. When it sees these, it just shrugs its figurative shoulders and says "I don't know", and generates conservative code.

Elimination of unnecessary temps is an ongoing task and we'll take another look at this area to see what we can improve.

The suggestion given in this thread removes the "derived type component" aspect from the overlap detection equation.

jimdempseyatthecove
Honored Contributor III

Steve,

I would think that if the derived type contained arrays, as opposed to pointers to arrays, it would be safe to say "can't possibly overlap".

Jim Dempsey

Steven_L_Intel1
Employee
Well, no. Even if they were non-pointer/allocatable arrays, the semantics of A=A+B require that the right side be completely evaluated before the left is assigned to. To avoid creating a temp, you have to make sure that either the references to A on the right overlap the left exactly or not at all.
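
A small illustration of why partial overlap matters. The shifted assignment below is a hypothetical example, not taken from the thread; it shows that evaluating the right side in full gives a different result from a naive element-by-element loop, which is the situation a temporary protects against.

```fortran
program overlap_demo
  implicit none
  integer, parameter :: n = 5
  integer :: a(n), b(n), i

  ! Array assignment: the right side is evaluated completely before
  ! anything is stored, so the old values of a are used throughout.
  a = (/ 1, 2, 3, 4, 5 /)
  a(2:n) = a(1:n-1) + 10
  print *, 'array assignment:', a      ! values 1 11 12 13 14

  ! Naive forward loop: each step reads an element that the previous
  ! step has already overwritten.
  b = (/ 1, 2, 3, 4, 5 /)
  do i = 2, n
     b(i) = b(i-1) + 10
  end do
  print *, 'forward DO loop:  ', b     ! values 1 11 21 31 41
end program overlap_demo
```

When the left side and the right-side references coincide exactly, or do not overlap at all, both forms give the same result and no temporary is needed.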

OP1
New Contributor II
Steven_L_Intel1 wrote:
Well, no. Even if they were non-pointer/allocatable arrays, the semantics of A=A+B require that the right side be completely evaluated before the left is assigned to. To avoid creating a temp, you have to make sure that either the references to A on the right overlap the left exactly or not at all.

Steve, I think the OP referred to A = B+C, not A = A+B. I use a lot of (large) allocatable arrays inside structures, and I am interested in this optimization of the compiler as well.

Olivier

Steven_L_Intel1
Employee
Oops! You're right. Unfortunately, that doesn't help here - the compiler does not even try to figure this out. I have asked that this be improved.

jimdempseyatthecove
Honored Contributor III

Olivier,

When I had problems with earlier versions of IVF and while waiting for a fix I would "engineer" a solution using the Fortran Preprocessor. In a FPP header file (brought in with a #include "...")

#ifdef _BROKEN_ArraySum_
#define ArraySum(A,B,C) call doArraySum(A,B,C)
#else
#define ArraySum(A,B,C) A = B + C
#endif

Then use the macro in the hundreds of places where it matters. When a tentative fix arrives, comment out the #define _BROKEN_ArraySum_, compile, and run an integrity/performance test. If the test fails, remove the comment and compile again. FPP makes this easy work.
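
A sketch of how the macro might be used in a complete source file, assuming the file is run through the preprocessor (for example with -fpp for ifort or -cpp for gfortran). The helper doArraySum is not shown in the thread, so its argument order here (result first, matching A = B + C) is an assumption.

```fortran
! To route every sum through the subroutine, define _BROKEN_ArraySum_
! above this point or on the compiler command line.
#ifdef _BROKEN_ArraySum_
#define ArraySum(A,B,C) call doArraySum(A,B,C)
#else
#define ArraySum(A,B,C) A = B + C
#endif

module do_array_sum_mod
  implicit none
contains
  ! Hypothetical helper: result first, matching the macro definition.
  subroutine doArraySum(res, lhs, rhs)
    real(8), intent(out) :: res(:,:)
    real(8), intent(in)  :: lhs(:,:), rhs(:,:)
    res = lhs + rhs
  end subroutine doArraySum
end module do_array_sum_mod

program macro_demo
  use do_array_sum_mod
  implicit none
  real(8) :: x(2,2), y(2,2), z(2,2)

  y = 1.0d0
  z = 2.0d0
  ! Expands to either x = y + z or call doArraySum(x, y, z).
  ArraySum(x, y, z)

  if (any(x /= 3.0d0)) stop 'unexpected result'
  print *, 'macro sum ok'
end program macro_demo
```

Keeping the switch in one header means the same sources can be rebuilt both ways when a new compiler revision arrives.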

Jim Dempsey

jjfait
Beginner
So if I'm reading this thread correctly, a temporary is always created when assigning one array to another, as in st1%array=st2%array, and this is slower than using DO loops. Is this also true if the bounds are included, as in st1%array(1:10000)=st2%array(1:10000)? Thanks,
Jeremy

Steven_L_Intel1
Employee

I would not say "always". In the case of allocatable or pointer array components of derived type, a temporary may be made. In normal array assignments, no. Specifying the bounds makes it even harder for the compiler - don't do that if you're copying the whole array. It can also change the semantics of the program, especially for allocatable arrays.
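
One semantic difference that is easy to demonstrate is reallocation on assignment under Fortran 2003 rules. This is offered as an illustration under that assumption, not as a summary of everything the remark above may cover. As far as I know, Intel compilers of that era needed an option such as -assume realloc_lhs to enable this behaviour.

```fortran
program lhs_semantics
  implicit none
  real(8), allocatable :: a(:), b(:)

  allocate(a(3), b(5))
  a = 0.0d0
  b = 1.0d0

  ! Whole-array form: under Fortran 2003 rules, a is reallocated to the
  ! shape of b because the shapes differ.
  a = b
  print *, 'size after a = b:   ', size(a)

  ! Sectioned form: a(:) is never reallocated, so the shapes must
  ! already conform. It is valid here only because a now matches b.
  a(:) = b
  print *, 'size after a(:) = b:', size(a)
end program lhs_semantics
```

With explicit bounds on the left side the compiler must treat the target as an array section, which is part of why the two forms are not interchangeable.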

jjfait
Beginner

Steven_L_Intel1 wrote:
I would not say "always". In the case of allocatable or pointer array components of derived type, a temporary may be made. In normal array assignments, no. Specifying the bounds makes it even harder for the compiler - don't do that if you're copying the whole array. It can also change the semantics of the program, especially for allocatable arrays.

The reason I was specifying the indexes is that I often got a stack overflow error if I didn't, and this was one way around it. So I'm getting the impression that I should always use DO loops. Steve, can you explain how this can change the semantics when copying the whole array, or do you mean only when part of the array is copied?

Steven_L_Intel1
Employee

See "Doctor, It Hurts When I Do This" for details on the semantic differences. If you want to use DO loops to do the assignment, I can understand that.

jjfait
Beginner
Very interesting article; I never thought about pointing it to an array function. Is there any downside to just using -heap-arrays and allocating everything on the heap instead of the stack? This seems to get rid of all the stack overflow exceptions.

Steven_L_Intel1
Employee
In my view, there is no real downside to heap-arrays. If the arrays are small and the routine does little work, the overhead of allocation and deallocation may be significant, but for most applications it will be noise.

tom_p
Beginner
Well, if the array is large it can have quite an impact because the temporary has to be copied back to the target array. In my experience using DO-loops almost doubles the speed of the assignment operation (memory bandwidth being the limiting factor). This is irrelevant, of course, if the assignment operation takes a relatively small amount of time and memory footprint is not a problem.

Steven_L_Intel1
Employee
I was not discussing overhead of copying - that is always extra - but rather having array temporaries, when they are created, allocated dynamically vs. on the stack.

Steven_L_Intel1
Employee
This was fixed in version 12.

tom_p
Beginner
Thanks! This eliminates quite a lot of DO-loops :)