I don't know why the following program throws a stack overflow, since all the large arrays are dynamically allocated. This is just an example: y = sum(u) works, so why does the partial sum not work?
program test
  implicit none
  integer, parameter :: n = 1000000, m = 10
  real(8), allocatable, dimension(:,:) :: x
  real(8) :: y
  allocate(x(n,m))
  x = 1.0d0
  call tt(x,y,n,m)
  print *, y
end

subroutine tt(x,y,n,m)
  implicit none
  integer, intent(in) :: n, m
  real(8), intent(in), dimension(n,m) :: x
  real(8), intent(out) :: y
  real(8), allocatable, dimension(:) :: z
  real(8), allocatable, dimension(:,:) :: u
  allocate(z(n),u(n,m))
  u = exp(x)
  z = sum(u,2)
  y = sum(z)
  return
end
Many thanks
See the description of the effects of using the /assume:norealloc_lhs option at https://software.intel.com/en-us/node/678232 .
mecej4 wrote:
See the description of the effects of using the /assume:norealloc_lhs option at https://software.intel.com/en-us/node/678232 .
It works when compiled with /assume:norealloc_lhs, but since z was allocated with the correct shape and size, why does it throw a stack overflow? Could you show me the details?
This is what the documentation (link given in #2) says:
Option standard-realloc-lhs (the default), tells the compiler that when the left-hand side of an assignment is an allocatable object, it should be reallocated to the shape of the right-hand side of the assignment before the assignment occurs. This is the current Fortran Standard definition. This feature may cause extra overhead at run time. This option has the same effect as option assume realloc_lhs.
I think that what happens is that a temporary array is allocated on the stack to hold the array-valued expression to the right of the '=' in the assignment statement. Under F2008 rules, the variable in question is deallocated, reallocated with the correct size (it is irrelevant whether or not the previous size was already correct), the temporary array is copied to the newly allocated variable, and the temporary array is marked for possible deletion during a subsequent garbage collection.
Only the compiler authors can tell us the details, and users are discouraged from asking for such details (because a later revision of the compiler can make the answer invalid).
mecej4 is correct; the compiler does, in fact, create a temp for the assignment at line 21 (z = sum(u,2)). And it creates the temp on the stack by default. You can override the "create on the stack" behavior by using the /heap-arrays command line switch.
--Lorri
>>it creates the temp on the stack by default
IMHO this is a deficiency in the compiler. When the output array has the correct shape for the output of SUM(array,dim), then no temporary should be created at all. The same should hold true for conforming array expressions. IOW when it is not necessary to perform a reallocation, the deallocation & allocation should be bypassed. Performing the deallocation & allocation in this case is an unnecessary trip through a critical section (twice). In a multi-threaded application this potentially creates a bottleneck.
I suggest that your compiler optimization team construct a multi-threaded test (say, on a 256-thread system) in which reallocation of the left-hand side is performed.
Jim Dempsey
I noticed that if you make array Z fixed shape the problem doesn't occur, but if you try
z(:) = sum(u,2)
stack overflow still happens on ifort 16.0. To me, this behavior is an issue because assigning to an array section should turn off the effects of /assume:realloc_lhs for this assignment statement, so no temporary array should be necessary.
I don't see where the Fortran standard requires deallocation of the variable. It does of course say that if the variable and its subobjects don't all have the same dynamic type and shape as the expression, deallocation must occur. Does this wording in the standard mean that if a complete dynamic type and shape match occurs, no deallocation should happen? I tried also looking at the section on pointer association and it didn't say that any pointers whose target was the allocatable variable before the assignment would have undefined association status in the case where deallocation is not necessary. It didn't say that such pointers remained associated in this case, either.
You can't necessarily tell until you have evaluated the expression whether deallocation might be necessary. Consider the case where the expression has a reference to a function with an allocatable result or a reference to a transformational intrinsic with shape determined by expressions that must be evaluated at runtime. This makes it harder for the compiler to always use the storage space of the variable to build the result of the expression.
In spite of the issues noted in my last post, I found a syntax that avoids a temporary array:
program test
  implicit none
  integer, parameter :: n = 1000000, m = 10
  real(8), allocatable, dimension(:,:) :: x
  real(8) :: y
  allocate(x(n,m))
  x = 1.0d0
  call tt(x,y,n,m)
  print *, y
end

subroutine tt(x,y,n,m)
  implicit none
  integer, intent(in) :: n, m
  real(8), intent(in), dimension(n,m) :: x
  real(8), intent(out) :: y
  real(8), allocatable, dimension(:) :: z
  real(8), allocatable, dimension(:,:) :: u
  allocate(z(n),u(n,m))
  u = exp(x)
  ! z = sum(u,2)
  call ImLike(z,u,n,m)
  y = sum(z)
  return
contains
  subroutine ImLike(a,b,i,j)
    integer i, j
    real(kind(z)) a(i)
    real(kind(u)) b(i,j)
    a = sum(b,2)
    return
  end subroutine ImLike
end
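For comparison, here is a minimal sketch (not from the thread) of a second way to sidestep the temporary: accumulate the row sums with explicit loops, so that every assignment is scalar and there is no array-valued expression for the compiler to place on the stack. The program name rowsum_loop and the small sizes are illustrative assumptions, and whether this is faster or slower than SUM(u,2) on a given compiler is not something the thread establishes.

```fortran
program rowsum_loop
  implicit none
  ! Small sizes chosen for illustration only (assumption, not the thread's n, m).
  integer, parameter :: n = 1000, m = 10
  real(8), allocatable :: u(:,:), z(:)
  integer :: i, j

  allocate(u(n,m), z(n))
  u = exp(1.0d0)

  ! Row sums via scalar updates; the inner loop runs over the first
  ! (contiguous) index of u.
  z = 0.0d0
  do j = 1, m
    do i = 1, n
      z(i) = z(i) + u(i,j)
    end do
  end do

  ! Cross-check against the intrinsic, which is safe here because n is small.
  print *, maxval(abs(z - sum(u,2))) <= 1.0d-12 * maxval(abs(z))
end program rowsum_loop
```

The final line should print T; with the thread's n = 1000000 the cross-check itself would reintroduce the SUM(u,2) temporary, so it belongs only in a small test like this one.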
Compiling with "/nostandard-realloc-lhs" works, but when it is combined with the /Qopenmp option, the program throws a stack overflow again.
/nostandard-realloc-lhs has no effect whatsoever on stack usage. All it does is disable a Fortran 2003 feature to do automatic (re)allocation of allocatable arrays in an assignment, assuming that you have already allocated it properly. It has no effect on use of temporaries and just adds a bit of performance to apps that don't need the reallocation check.
OpenMP imposes its own demands on stack usage, not all of which can be helped with /heap-arrays (though that does help some.) You may need to play with both stack reserve size and the OMP_STACKSIZE environment variable if you encounter stack overflows in OpenMP applications.
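To make the automatic (re)allocation feature mentioned above concrete, here is a minimal sketch, not taken from the thread, of what Fortran 2003 assignment semantics do for an allocatable left-hand side. The program name realloc_demo is an illustrative assumption. The example relies on standard semantics being in effect; with the reallocation check disabled, the first assignment to an unallocated array would not be valid.

```fortran
program realloc_demo
  implicit none
  real(8), allocatable :: a(:)

  ! a is unallocated here; under Fortran 2003 rules the assignment
  ! allocates it to the shape of the right-hand side.
  a = [1.0d0, 2.0d0, 3.0d0]
  print *, size(a)

  ! The right-hand side now has a different shape, so a is reallocated.
  a = [a, 4.0d0, 5.0d0]
  print *, size(a)
end program realloc_demo
```

Under standard semantics this reports sizes 3 and then 5.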
y = SUM (u) also works, without the need for "z"
Sure, but this is just an example, meant to show something like sum(x,2).
John Campbell wrote:
y = SUM (u) also works, without the need for "z"
As a more serious question: would the ifort default response to the following code example be to create temporary copies of u and z ?
subroutine tt(x,y,n,m)
  implicit none
  integer, intent(in) :: n, m
  real(8), intent(in), dimension(n,m) :: x
  real(8), intent(out) :: y
  real(8), allocatable, dimension(:) :: z
  real(8), allocatable, dimension(:,:) :: u
  allocate (u(n,m))
  u = exp(x)
  allocate (z(n))
  z = sum(u,2)
  y = sum(z)
  return
end
If it would, this is indicating a problem with any F90 approach to use of ALLOCATE.
I would have hoped that if the last action for the array was an ALLOCATE, then creating a temporary copy should not be considered. It is not a big leap to also consider if there is no change to the size of the array.
Or am I misunderstanding this thread?
To carry John's question one step further:
Suppose "u = exp(x)" does indeed both create a temporary and reallocate the left-hand side. What happens then with:
allocate(u(1-1234:n-1234,1-5678:m-5678))
u = exp(x)
Then what are the bounds of the subscripts???
The excess of all good things is mischievous
Jim Dempsey
The compiler, theoretically, could generate a separate code path that avoids a temp, but I am fairly certain ifort doesn't do that.
As for Jim's question, the result of exp(x) always has 1 as the lower bound for each dimension - same as for any array expression. The shape (rank and extents) match the argument. This is how it must be in the light of the argument possibly being an array section or having vector subscripts.
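A small sketch (not from the thread) of the lower-bound rule just described: LBOUND applied to a whole array returns its declared lower bound, while for an array expression or an array section it returns 1. The program name bounds_demo and the 0:4 bounds are illustrative assumptions.

```fortran
program bounds_demo
  implicit none
  real(8) :: x(0:4)

  x = 1.0d0
  print *, lbound(x, 1)        ! whole array: declared lower bound, 0
  print *, lbound(exp(x), 1)   ! array expression: lower bound is 1
  print *, lbound(x(2:4), 1)   ! array section: lower bound is 1
end program bounds_demo
```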
Steve,
My concern is not that the result of exp (prior to the =) has subscripts with origin 1, but rather that, due to an unnecessary realloc-lhs, the resulting array gets re-based to 1. In older code this did not happen. While my example subscripts may have been silly, the coder may reasonably want to use 0-based indexing:
allocate(u(0:n-1,0:m-1))
u = exp(x)
IMHO too many of your compiler engineering optimization strategies are based on an assumption that these allocations are stack based (absent of critical section) as opposed to heap based (with critical section). It looks like they may have taken a shortcut and repurposed MOVE_ALLOC(exp(x), u).
Jim Dempsey
Jim, would you please show a complete example with output that illustrates your point? I'm not getting it.
The standard says that in the case you show, the result of exp(x) has a lower bound of 1 for each dimension. However, in the assignment to u (in your example), if the SHAPE of u matches the SHAPE of exp(x), u does not get reallocated and whatever lower bounds it had before remain the same. Keep in mind that SHAPE is rank (number of dimensions) and extent (number of elements) - the lower bound doesn't enter into it. There is no MOVE_ALLOC done - it's a copy of data. A sufficiently clever compiler could store the exp values directly into the result (after reallocation if required) rather than creating a temp first and then doing an array copy. I don't know if ifort is there yet.
However, if u has a SHAPE different from exp(x), then it will get reallocated with 1 as the lower bound for each dimension.
You're correct that the default is to use the stack for temps, because it's faster. You're also correct that this is problematic for large temps, which is why I have promoted the use of /heap-arrays for years, and argued in favor of making that the default.
Steve,
The compiler is actually doing what I believe is correct given the circumstances. Test program:

subroutine tt(x,y,n,m)
  implicit none
  integer, intent(in) :: n, m
  real(8), intent(in), dimension(n,m) :: x
  real(8), intent(out) :: y
  real(8), allocatable, dimension(:,:) :: u
  allocate (u(0:n-1,0:m-1))
  print *,lbound(u,1),lbound(u,2)
  u = exp(x)
  print *,lbound(u,1),lbound(u,2)
  deallocate(u)
  u = exp(x)
  print *,lbound(u,1),lbound(u,2)
  deallocate(u)
  allocate (u(0:m-1,0:n-1)) ! non-matching sizes
  u = exp(x)
  print *,lbound(u,1),lbound(u,2)
  return
end

program rls_issue
  implicit none
  integer, parameter :: n=11
  integer, parameter :: m=22
  real(8) :: x(n,m), y
  call RANDOM_NUMBER(x)
  call tt(x,y,n,m)
  print *,y
end program rls_issue

Output:
0 0
0 0
1 1
1 1
-9.255963134931783E+061

Reallocation (and re-basing of the indices) did not occur when realloc-lhs was not required.
In the last case, where reallocation did occur, the 0-based indexing was not maintained, but I had no expectation that it should be.
I haven't tested V18 (I haven't installed it yet); there appear to be issues with unallocated arrays.
Jim Dempsey
Jim
Version 18 gives the same result for the bounds. Your subroutine never assigns to y, so its value is undefined.
Does ifort Ver2018 use the stack for any of the following examples ?
I would consider most stack overflow errors to be a compiler bug, or at least not a smart compiler.
subroutine jim (x,y,n,m)
  implicit none
  integer, intent(in) :: n, m
  real(8), intent(in), dimension(n,m) :: x
  real(8), intent(out) :: y
  real(8), allocatable, dimension(:,:) :: u
  print *,'jim'
  allocate (u(0:n-1,0:m-1))
  print *,lbound(u,1),lbound(u,2),ubound(u,1),ubound(u,2)
  u = exp(x)
  print *,lbound(u,1),lbound(u,2),ubound(u,1),ubound(u,2)
  deallocate(u)
  u = exp(x)
  print *,lbound(u,1),lbound(u,2),ubound(u,1),ubound(u,2)
  deallocate(u)
  allocate (u(0:m-1,0:n-1)) ! non-matching sizes
  u = exp(x)
  print *,lbound(u,1),lbound(u,2),ubound(u,1),ubound(u,2)
  y = sum (u)
  return
end subroutine jim

subroutine john (x,y,n,m)
  implicit none
  integer, intent(in) :: n, m
  real(8), intent(in), dimension(n,m) :: x
  real(8), intent(out) :: y
  real(8), allocatable, dimension(:) :: z
  real(8), allocatable, dimension(:,:) :: u
  print *,'john'
  allocate (u(n,m))
  u = exp(x)
  print *,lbound(u,1),lbound(u,2),ubound(u,1),ubound(u,2)
  allocate (z(n))
  z = sum(u,2)
  print *,lbound(z,1),ubound(z,1)
  y = sum(z)
  return
end subroutine john

subroutine default (x,y,n,m)
  implicit none
  integer, intent(in) :: n, m
  real(8), intent(in), dimension(n,m) :: x
  real(8), intent(out) :: y
  real(8), allocatable, dimension(:) :: z
  real(8), allocatable, dimension(:,:) :: u
  print *,'default'
  u = exp(x)
  print *,lbound(u,1),lbound(u,2),ubound(u,1),ubound(u,2)
  z = sum(u,2)
  print *,lbound(z,1),ubound(z,1)
  y = sum(z)
  return
end subroutine default

program rls_issue
  implicit none
  integer, parameter :: n=11000
  integer, parameter :: m=2200
  real(8) :: x(n,m), y
  call RANDOM_NUMBER(x)
  call jim (x,y,n,m)
  print *,y
  call john (x,y,n,m)
  print *,y
  call default (x,y,n,m)
  print *,y
end program rls_issue

!Output:
! jim
! 0 0 10999 2199
! 0 0 10999 2199
! 1 1 11000 2200
! 1 1 11000 2200
! 41579331.634002581
! john
! 1 1 11000 2200
! 1 11000
! 41579331.634005338
! default
! 1 1 11000 2200
! 1 11000
! 41579331.634005338
The only stack temp usage I spotted was for the assignments of sum(u,2). Everything else seemed to be done "in place".
