Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Unusual stack overflow

Roman1
New Contributor I
840 Views

Hi,

I am using Intel Visual Fortran Compiler XE 12.1.2.278.

When I compile and run the following program, I am getting a stack overflow error.
I can fix this by making the stack bigger, by using the heap, or changing to a DO loop.
However, I don't understand why a simple assignment statement would want to use the stack in the first place.


Roman


[fortran]
module my_type_mod
   type my_type
      integer,allocatable:: z(:)
   end type my_type
end module my_type_mod
!-----------------------------------------------
module sub1_mod
contains
   subroutine sub1( A, B )
      use my_type_mod, only: my_type
      implicit none
      type(my_type),intent(in) :: A
      type(my_type),intent(out):: B
      integer n, i

      n = size( A%z )
      allocate( B%z(n) )

      ! The following line causes a stack overflow error.
      B%z = A%z

      ! Uncomment the following and comment the previous line
      ! to fix the stack overflow error.
      ! do i = 1, n
      !    B%z(i) = A%z(i)
      ! end do

      return
   end subroutine sub1
end module sub1_mod
!-----------------------------------------------
program test_stack
   use my_type_mod, only: my_type
   use sub1_mod, only: sub1
   implicit none
   type(my_type):: A, B
   integer n

   n = 1000000
   allocate( A%z(n) )
   A%z = 1
   call sub1( A, B )
   write(*,*) sum(B%z)
   stop
end program test_stack
[/fortran]
7 Replies
Steven_L_Intel1
Employee
I believe this is on our "to do" list. The compiler does not recognize that whole allocatable array components are contiguous and don't overlap, so it creates a temporary copy of the right side of the assignment. I will nudge the developers again about this.
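As a rough mental model of the temporary described above, the assignment can be pictured as a two-step copy through a stack-resident buffer. This is only a sketch under stated assumptions, not the code the compiler actually generates: the module and subroutine names are hypothetical, the two arrays are assumed to have the same size, and the automatic array `tmp` stands in for the compiler-generated temporary.

```fortran
module temp_model_mod
   implicit none
contains

   ! Hypothetical model of "dst = src" when the compiler cannot rule
   ! out overlap. Assumes size(src) == size(dst).
   subroutine assign_via_stack_temp(src, dst)
      integer, intent(in)  :: src(:)
      integer, intent(out) :: dst(:)
      integer :: tmp(size(src))   ! automatic array, typically placed on the stack
      integer :: i

      ! Step 1: evaluate the whole right-hand side into the temporary.
      do i = 1, size(src)
         tmp(i) = src(i)
      end do

      ! Step 2: store the temporary into the left-hand side.
      do i = 1, size(dst)
         dst(i) = tmp(i)
      end do
   end subroutine assign_via_stack_temp

end module temp_model_mod
```

With n = 1000000 default integers (assuming 4 bytes each), a buffer like `tmp` needs about 4 MB, which can easily exceed a typical default stack size.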
jimdempseyatthecove
Honored Contributor III
In this case the z component was declared with "allocatable" and without target (in other words, it is not a pointer to a section of an array). Therefore the compiler could directly make the correct decision about whether to use a temporary.

In most other cases, where it is unknown at compile time, the compiler writers should be aware that their users (also programmers) are aware of the potential need for run-time temporary array allocations... and will avoid coding in a manner that requires a temporary. For these programmers, inserting code (with its associated overhead) to test for overlap and take the appropriate code path, with or without a temporary, is preferable to always generating code that uses a temp (under questionable circumstances).

Jim Dempsey
IanH
Honored Contributor III
jimdempseyatthecove wrote: "In most other cases, where it is unknown at compile time, the compiler writers should be aware that their users (also programmers) are aware of the potential need for run-time temporary array allocations... and will avoid coding in a manner that requires a temporary. For these programmers, inserting code (with its associated overhead) to test for overlap and take the appropriate code path, with or without a temporary, is preferable to always generating code that uses a temp (under questionable circumstances)."

The compiler failing to take into account overlap where overlap was at all possible (and legal) would be a pretty serious bug. In my book, this would render the compiler all but unusable for any code with POINTERs.

Based on questions seen on this forum and similar ones, there is a significant proportion of Fortran programmers who wouldn't be aware of the potential for overlap between variables with the pointer/target attribute, or of what that implies for the requirement for temporaries. I suspect that of the remainder, a very significant proportion wish for the compiler to do the right thing before it tries to do the fast thing.

Programmers who are worried about performance, stack space consumption, etc. (perhaps when they observe problems during a run, such as in this post) can trivially code around any compiler-generated temporaries, whether they are necessary or not.
Steven_L_Intel1
Employee
The Intel compiler assumes there is overlap unless it can prove there isn't. Over the years we've needed to refine and add tests to detect more cases where we know there is no overlap, such as this one. The developers know about it.

/heap-arrays is a workaround, but if performance is important, avoiding the temp entirely is preferable.
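The two options mentioned here can be contrasted in source form. The sketch below is illustrative only: the module and subroutine names are hypothetical, both arrays are assumed to have the same size, and `copy_via_heap_temp` merely models the effect of moving a temporary to the heap rather than reproducing what /heap-arrays does internally. `copy_no_temp` is the element-wise DO loop from the original post wrapped in a reusable routine.

```fortran
module copy_alternatives_mod
   implicit none
contains

   ! Models a copy that still goes through a temporary, but one that
   ! lives on the heap, so a large array no longer exhausts the stack.
   ! Assumes size(src) == size(dst).
   subroutine copy_via_heap_temp(src, dst)
      integer, intent(in)  :: src(:)
      integer, intent(out) :: dst(:)
      integer, allocatable :: tmp(:)
      integer :: i

      allocate(tmp(size(src)))        ! heap allocation instead of stack
      do i = 1, size(src)
         tmp(i) = src(i)
      end do
      do i = 1, size(dst)
         dst(i) = tmp(i)
      end do
      deallocate(tmp)
   end subroutine copy_via_heap_temp

   ! Element-wise copy with no intermediate buffer at all.
   ! Assumes size(src) == size(dst) and that src and dst do not overlap.
   subroutine copy_no_temp(src, dst)
      integer, intent(in)  :: src(:)
      integer, intent(out) :: dst(:)
      integer :: i

      do i = 1, size(dst)
         dst(i) = src(i)
      end do
   end subroutine copy_no_temp

end module copy_alternatives_mod
```

The heap version still pays for an extra allocation and a second pass over the data, which is the cost that the loop version avoids.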
jimdempseyatthecove
Honored Contributor III
IanH,

The compiler can test for overlap. In the cited example, the z components of both arguments were known by the compiler to be ALLOCATABLE and thus could not have overlapped. I think the compiler goofed the check for allocatable because the z array is a component of a user-defined type. Although the programmer's A%z and B%z could conceivably refer to the same object, z = z on the same allocated array should be harmless (though costly in unnecessary CPU cycles).

In situations where the compiler cannot assure non-overlap, the compiler has two choices:
a) Always use temporary (not preferred by me)
b) Insert code to determine b.1) overlaps, b.2) non-overlapping, b.3) unknown.
In the case of b.1 or b.3 use the temp, in the case of b.2 no temp.

Note, the compiler writer can choose the degree to which the non-overlap test works (failure results in b.3 unknown).

The simplest test could be:

if(Stride.A == 1)
  if(Stride.B == 1)
    if(loc(A(lbound)) .lt. loc(B(lbound)))
      if(loc(A(ubound)) .lt. loc(B(lbound)))
        return non-overlap
      else
        return overlap
      endif
    endif
    if(loc(B(lbound)) .lt. loc(A(lbound)))
      if(loc(B(ubound)) .lt. loc(A(lbound)))
        return non-overlap
      else
        return overlap
      endif
    endif
    // here if A==B
    decide if to remove unnecessary code
  endif
endif
return unknown

The above test could be done with fewer than 30 instructions.
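The overlap test outlined above can be written out as a small Fortran module. This is a minimal sketch under stated assumptions, not what any compiler emits: all names (`guarded_copy_mod`, `classify_overlap`, `guarded_copy`) are hypothetical, `loc()` is a non-standard extension (available in Intel Fortran and gfortran), `is_contiguous` needs a Fortran 2008 compiler, and the arrays passed to `guarded_copy` are assumed to have the same size. Unlike the pseudocode, the identical-address case is simply treated as overlap.

```fortran
module guarded_copy_mod
   use iso_fortran_env, only: int64
   implicit none
   private
   public :: NO_OVERLAP, OVERLAP, UNKNOWN, classify_overlap, guarded_copy

   integer, parameter :: NO_OVERLAP = 0, OVERLAP = 1, UNKNOWN = 2

contains

   ! Classify how the storage of two rank-1 integer arrays relates.
   function classify_overlap(a, b) result(status)
      integer, intent(in), target :: a(:), b(:)
      integer :: status
      integer(int64) :: a_lo, a_hi, b_lo, b_hi

      status = UNKNOWN
      if (size(a) == 0 .or. size(b) == 0) then
         status = NO_OVERLAP          ! empty arrays cannot clobber anything
         return
      end if

      ! Only unit-stride arrays are analysed; anything else stays UNKNOWN.
      if (.not. is_contiguous(a)) return
      if (.not. is_contiguous(b)) return

      ! loc() (non-standard extension) yields an address as an integer.
      a_lo = int(loc(a(1)), int64)
      a_hi = int(loc(a(size(a))), int64)
      b_lo = int(loc(b(1)), int64)
      b_hi = int(loc(b(size(b))), int64)

      ! Two address ranges are disjoint if one ends before the other starts.
      if (a_hi < b_lo .or. b_hi < a_lo) then
         status = NO_OVERLAP
      else
         status = OVERLAP
      end if
   end function classify_overlap

   ! Copy src into dst (assumed same size), using a heap temporary only
   ! when the arrays overlap or the test is inconclusive.
   subroutine guarded_copy(src, dst)
      integer, intent(in),    target :: src(:)
      integer, intent(inout), target :: dst(:)
      integer, allocatable :: tmp(:)
      integer :: i

      if (classify_overlap(src, dst) == NO_OVERLAP) then
         do i = 1, size(dst)           ! direct copy, no temporary
            dst(i) = src(i)
         end do
      else
         allocate(tmp(size(src)))      ! overlap or unknown: go through a temp
         do i = 1, size(src)
            tmp(i) = src(i)
         end do
         do i = 1, size(dst)
            dst(i) = tmp(i)
         end do
      end if
   end subroutine guarded_copy

end module guarded_copy_mod
```

For an array `x` declared with the TARGET attribute, `call guarded_copy(x(1:8), x(3:10))` takes the temporary path, while copying between two separately declared arrays takes the direct path. The dummies carry TARGET so that overlapping actual arguments are permitted in the first place.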

Please consider adding a compiler option to indicate if you want the test overhead (to avoid copy overhead) or no test overhead (generate copy code when compiler cannot make the determination).

I suspect that there is some benchmark program that always requires a temporary and in this case performance is increased by not having the test. (default could then be no runtime test for overlap).

What would the compiler writers think if a cache designer decided that testing whether data is already in the cache is extra work, and so left the test out? Creating an unnecessary temporary can cause tens, hundreds, or thousands of unnecessary cache misses and flushes, not to mention the occasional stack overflow.

Jim Dempsey
IanH
Honored Contributor III
Ok. I thought you were advocating for an option c): the compiler writers should assume that things with the POINTER/TARGET attribute will never overlap (don't bother testing and don't bother with a temporary) because Fortran programmers are smart.

(I initially read your response as saying that "these (smart) programmers" should be "inserting code", not the compiler - I follow your point now.)
Steven_L_Intel1
Employee
It turns out that this problem occurs only when dummy arguments are involved. We have fixed that for a future release (not the one coming up).