Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

What does stack overflow imply?

mirko_vukovic
Beginner
Hello,

I am writing a Monte Carlo random-walk simulation, and when I run it with a large number of particles (300,000), I get a stack overflow error. I am confused as to what that error may signify.

Here are the gory details:

The error occurs in the routine execute_simulation, near the end of the following module (I include the module header). It occurs during the first loop of that routine. The routine execute_simulation is called from the main program.

module simulation_class
  use misc_math_utilities
  use particle_cloud_class
  implicit none

  type simulation
    integer :: file_unit
    REAL :: DT=1e-6
    integer :: cMax_part=10,cMax_steps=10,iStep,seed_arr(2)=(/1,1/)
    type (Particle_cloud) ::oParticle_cloud
    real, dimension(3)::vRinit,vVinit
    integer cPart
    logical fPrint_av_KE

  end type simulation

  interface create_obj
    module procedure create_simulation_obj
  end interface
  interface print_obj_def
    module procedure print_simulation_obj_def
  end interface

  ... other routines omitted ...
  subroutine execute_simulation(self)
    type (simulation) self
    integer step
    do step=1,self%cMax_steps
      call take_step(self%oParticle_cloud)
      call print_stats(self%oParticle_cloud,step)
    end do

  end subroutine execute_simulation

end module

The take_step routine is in the following module

MODULE particle_cloud_class

  use misc_math_utilities

  type particle_cloud
    integer Count
    real,allocatable::mStorage(:,:)
  end type particle_cloud
  integer iPart ! used for looping through particles
  integer::viCoords(3)=(/1,2,3/),viVel(3)=(/4,5,6/)

  interface create_obj
    module procedure create_particle_cloud_obj
  end interface
  interface print_obj_def
    module procedure print_particle_cloud_def
  end interface
  interface print_stats
    module procedure print_R_stats
  end interface

  ... stuff omitted

  subroutine take_step(self)
    type (particle_cloud)::self
    real::temp(3)
    self%mStorage(viCoords,:)=self%mStorage(viCoords,:)+self%mStorage(viVel,:)

    do iPart=1,self%Count
      temp=rn_point_on_sphere()
      self%mStorage(viVel,iPart)=temp
    end do
  end subroutine take_step


where rn_point_on_sphere() is a function that returns a 3-element vector of a random point on a sphere. I have tested that generator for up to 10^8 calls, and it seems to work ok.

I am not sure how to go and debug that error. Thanks for any pointers and suggestions.

Mirko


18 Replies

Steven_L_Intel1
Employee
See Don't Blow Your Stack. Also, look for the next update of the compiler (real soon now) to include an option to use the heap for temporary arrays, pretty much eliminating the stack overflow issue for most applications.
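
To put rough numbers on why 300,000 particles can be enough to overflow the stack: if the compiler builds a temporary for the right-hand side of the position update in take_step, that temporary holds 3 x Count reals. The sketch below only does that arithmetic. The particle count comes from the original post; the 1 MB stack figure is an assumption (a common default for Windows executables) and may not match the poster's build.

```fortran
program temp_size_estimate
  implicit none
  integer, parameter :: n_part = 300000          ! particle count from the original post
  integer, parameter :: n_comp = 3               ! x, y, z rows selected by viCoords
  integer, parameter :: assumed_stack = 1048576  ! ASSUMPTION: 1 MB default stack
  integer :: bytes_per_real, temp_bytes

  ! storage_size returns the size in bits of a default real
  bytes_per_real = storage_size(1.0) / 8
  temp_bytes = n_comp * n_part * bytes_per_real

  print '(a,i0,a)', 'possible temporary for the position update: ', temp_bytes, ' bytes'
  print '(a,l1)', 'larger than the assumed stack: ', temp_bytes > assumed_stack
end program temp_size_estimate
```

With 4-byte default reals this comes to about 3.6 MB for a single temporary, which is well above the assumed 1 MB.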

jimdempseyatthecove
Honored Contributor III

Mirko,

Perhaps the compiler is creating an unnecessary temporary variable to perform the integration step addition.

Try replacing the array copy with an equivalent loop:

subroutine take_step(self)
  type (particle_cloud)::self
  real::temp(3)

  do iPart=1,self%Count
    self%mStorage(viCoords,iPart)=self%mStorage(viCoords,iPart)+self%mStorage(viVel,iPart)
  end do

  do iPart=1,self%Count
    temp=rn_point_on_sphere()
    self%mStorage(viVel,iPart)=temp
  end do
end subroutine take_step

Jim Dempsey
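
A further variation on the same idea, in case the vector subscripts (viCoords, viVel) are what prompt a temporary: index the components explicitly so that every statement is a scalar assignment. This is only a sketch. The stand-alone array m, with rows 1-3 holding coordinates and rows 4-6 holding velocities, mirrors the layout implied by viCoords and viVel, and the names step_sketch and advance_positions are made up for illustration.

```fortran
module step_sketch
  implicit none
contains
  ! Add the velocity rows (4:6) to the coordinate rows (1:3), column by column.
  subroutine advance_positions(m)
    real, intent(inout) :: m(:,:)   ! ASSUMED layout: m(1:6, 1:nParticles)
    integer :: i, k
    do i = 1, size(m, 2)
      do k = 1, 3
        m(k, i) = m(k, i) + m(k + 3, i)   ! scalar update, no array expression
      end do
    end do
  end subroutine advance_positions
end module step_sketch

program demo_step
  use step_sketch
  implicit none
  real :: m(6, 2)
  m(1:3, :) = 0.0      ! coordinates
  m(4:6, :) = 1.0      ! velocities
  call advance_positions(m)
  print *, m(1:3, 1)
end program demo_step
```

The inner loop runs over the first index, so memory is still traversed in column-major order.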


jimdempseyatthecove
Honored Contributor III

Additional Information:

If the loop replacement from the prior post corrects the stack overflow problem, then at some point you may want to consider optimizing the Euler integration. Create a subroutine that redefines the array of XYZ vectors as a rank-1 array of reals, then perform the array increment on that rank-1 array. The compiler can then optimize the code for the size of a real and, if you enable them, apply SSE3 optimizations.

Jim
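
One way to read the rank-1 suggestion is sketched below. It assumes, unlike the mStorage layout in the original post, that positions and velocities live in two separate contiguous arrays; the names flat_step, add_flat, pos and vel are hypothetical. Because the dummy arguments are explicit-shape rank-1 arrays, the caller's rank-2 arrays are treated as one contiguous run of reals.

```fortran
module flat_step
  implicit none
contains
  ! r and v are explicit-shape rank-1 dummies, so a rank-2 actual
  ! argument is seen here as n consecutive reals.
  subroutine add_flat(n, r, v)
    integer, intent(in) :: n
    real, intent(inout) :: r(n)
    real, intent(in)    :: v(n)
    r = r + v            ! single whole-array update over all components
  end subroutine add_flat
end module flat_step

program demo_flat
  use flat_step
  implicit none
  integer, parameter :: n_part = 4
  real :: pos(3, n_part), vel(3, n_part)   ! ASSUMPTION: separate arrays
  pos = 0.0
  vel = 0.5
  call add_flat(size(pos), pos, vel)
  print *, pos(:, 1)
end program demo_flat
```

With the interleaved 6 x N storage from the original post, positions and velocities are not separately contiguous, so the data would have to be split first for this to apply.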


mirko_vukovic
Beginner
Thank you.

I will play a bit with setting the stack size, and the suggestions of the two other posters.

Mirko

mirko_vukovic
Beginner
Jim,

Your suggested modification did not help.

I should point out that the error is reported where the take_step routine is called.

The caller routine looks like this:

subroutine execute_simulation(self)
  type (simulation) self
  integer step
  do step=1,self%cMax_steps
    call take_step(self%oParticle_cloud)
    call print_stats(self%oParticle_cloud,step)
  end do

end subroutine execute_simulation

and the run-time stack overflow error points to the highlighted line (the call to take_step).

I have already encountered a similar problem in the same code. There was an error in a routine that calculates a random number (sqrt of a negative number). However, the run-time error pointed to the statement that calls this routine. I have sent the relevant code to Intel support.

Mirko

Steven_L_Intel1
Employee
I mentioned earlier that the compiler would soon have an option to allocate temporaries on the heap. The good news is that it does, as of 9.1.029. The bad news is that support for the new switch, /heap-arrays as described in the release notes (which none of you read, apparently...), was inadvertently left out of the command driver in this release. It will get fixed for the next one. In the meantime, you can add this to the command line options:

-Qoption,f,"-heap_arrays 0"

and it will enable the feature.
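
Independently of the compiler switch, a temporary that the programmer controls can be placed on the heap by declaring it allocatable. The sketch below is not taken from the thread's code: the names heap_temp_sketch, advance_with_heap_copy and work are made up, and the 6 x N layout is assumed from viCoords and viVel in the original post. It only illustrates swapping an implicit temporary for an explicit allocation.

```fortran
module heap_temp_sketch
  implicit none
contains
  ! m is ASSUMED to have at least 6 rows: 1:3 coordinates, 4:6 velocities.
  subroutine advance_with_heap_copy(m)
    real, intent(inout) :: m(:,:)
    real, allocatable :: work(:,:)   ! allocatable storage typically comes from the heap
    integer :: stat

    allocate(work(3, size(m, 2)), stat=stat)
    if (stat /= 0) stop 'could not allocate work array'

    work = m(4:6, :)                 ! explicit copy of the velocity rows
    m(1:3, :) = m(1:3, :) + work
    deallocate(work)
  end subroutine advance_with_heap_copy
end module heap_temp_sketch

program demo_heap
  use heap_temp_sketch
  implicit none
  real, allocatable :: m(:,:)
  allocate(m(6, 300000))             ! particle count from the original post
  m(1:3, :) = 0.0
  m(4:6, :) = 1.0
  call advance_with_heap_copy(m)
  print *, m(1:3, 1)
end program demo_heap
```

Checking stat after allocate makes an out-of-memory condition show up as a clear message rather than a crash.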

mirko_vukovic
Beginner
MADsblionel:
I mentioned earlier that the compiler would soon have an option to allocate temporaries on the heap. The good news is that it does, as of 9.1.029. The bad news is that support for the new switch, /heap-arrays as described in the release notes (which none of you read, apparently...), was inadvertently left out of the command driver in this release. It will get fixed for the next one. In the meantime, you can add this to the command line options:

-Qoption,f,"-heap_arrays 0"

and it will enable the feature.


I tried both options: increased the stack size, and separately invoked the switch. Both worked. But that should not be news to you -- you expected it to work :-)

Thank you

Mirko

jimdempseyatthecove
Honored Contributor III

So the runtime _complaint_ went away but the underlying problem did not.

The problem is likely unnecessary temporaries being created. These not only consume stack space but also create excessive call overhead.

It would be nice if the compiler had an option to issue an information message when it creates a temporary array. The runtime system can report a warning on some calls. IMHO the warning needs to be issued at compile time too.

Jim Dempsey


TimP
Honored Contributor III
Jim, you said
"It would be nice if the compiler had an option to issue an information message when it creates a temporary array."
-check:arg_temp_created has been discussed in the Fortran forum, along with invitations to submit Premier reports of cases where the compiler should recognize the temporary is unnecessary. I've submitted several such, after checking the effect on performance. It's not always evident whether the temporary improves or degrades performance. Real cases from more customers ought to raise the priority on this.
Also, there are cases where the temporary is loop invariant, so it could be created and destroyed outside an inner loop, likely making it beneficial for performance.

jimdempseyatthecove
Honored Contributor III

Coding changes to recognize unnecessary temps may be a substantial effort. And the payback (to Intel) might not be worth the effort.

Adding a "-check:arg_temp_created" would be relatively easy to do. And the payback to an individual customer might be significant.

There are not only performance issues but some coding issues as well. Assume a temporary is created (when it need not be) and the address is passed on to a function or subroutine. Further assume the temporary was derived from an array which is shared in OpenMP. In this case you have the opportunity to have multiple and different instances of the same data.

It would be nice to have a compiler warning so I could be informed if temporaries are created. This would save me a lot of debugging effort.

Jim Dempsey


Steven_L_Intel1
Employee
As Tim said, there IS a "-check:arg_temp_created". This applies only to temps created for passing arguments, not to those created for evaluating expressions. For arguments, the compiler generates run-time code to determine whether a temp is needed, so if you get one in a situation that doesn't require one, that's a bug; please report it.

Eliminating temps in assignments is a big performance win, and it is something we are constantly improving. The analysis can sometimes be tricky.
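
As an illustration of the argument case, the sketch below passes two different sections of the same array to a routine with an explicit-shape dummy. All names (arg_temp_sketch, scale_vec, b) are made up for the example. Whether a copy is actually made is up to the compiler; the comments describe what would normally be expected from the memory layout.

```fortran
module arg_temp_sketch
  implicit none
contains
  ! Explicit-shape dummy: the callee expects n contiguous reals.
  subroutine scale_vec(n, x)
    integer, intent(in) :: n
    real, intent(inout) :: x(n)
    x = 2.0 * x
  end subroutine scale_vec
end module arg_temp_sketch

program demo_arg_temp
  use arg_temp_sketch
  implicit none
  real :: b(4, 3)
  b = 1.0

  ! Column section: elements are adjacent in memory (column-major order),
  ! so no copy should be needed.
  call scale_vec(4, b(:, 2))

  ! Row section: elements are 4 reals apart, so the compiler is likely to
  ! pass a contiguous copy and copy the result back afterwards.
  call scale_vec(3, b(1, :))

  print *, b(1, :)
end program demo_arg_temp
```

Built with the arg_temp_created check enabled, the second call is the kind that would be expected to trigger the run-time warning, and the first is not.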

dbruceg
Beginner

It's nice to hear you're trying to eliminate unnecessary temps. I've given up using array sectoring on anything except arrays with known and small dimensions because of the stack overflows. Allocation on the heap rather than the stack won't help me: since I do what I can to optimize my own usage of the heap, that will just convert my stack overflows into stack-heap collisions. I've been writing utility subs as necessary to manipulate array sectors as whole arrays when possible and indexing them myself when not. I haven't had any problems since I began doing so.

Bruce


jimdempseyatthecove
Honored Contributor III

Steve:

/check:
check run-time conditions
keywords: all (same as /4Yb), none (same as /nocheck, /4Nb),
[no]arg_temp_created, [no]bounds,[no]format,
[no]output_conversion, [no]power, [no]uninit, [no]args

This is a run time check

It would be much nicer to have a compile-time check issue an information message, so that I can use the IDE to fix the source file(s).

Receiving:

forrtl: warning (402): fort: (1): In call to AVSETVIEWPOINT, an array temporary
was created for argument #2

This helps only if I have a console window and am watching it, and that window may be scrolling a whole bunch of other output.

In looking at the above message which of my 700+ source modules has the problem call???

Jim Dempsey


Steven_L_Intel1
Employee
Jim,

This is a run-time check because the compiled code does a run-time test to see if the argument is contiguous. If it is, then no temp is created and no warning is issued. The accompanying traceback should identify the location.

dbruceg
Beginner

Steve:

I think I missed something important in the discussions over the past few months. Are you saying that

a(:) = b(kf:kl,indx)

will generate stack temps, but

p = Dot_Product(r(jf:jl,indr),s(kf:kl,inds))

will not?

Bruce


Steven_L_Intel1
Employee
I'd need an actual program to say for sure. The first statement shouldn't ever need a temp, and if it does, that's something that should be fixed. By the way, instead of a(:) you should simply write a.

For the second, a lot might depend on what the values of indr and inds are and what the declarations of the variables are.
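
For anyone who wants to turn Bruce's two statements into an actual program to test, a minimal sketch follows. Every declaration and value here is an assumption, since the thread does not show how a, b, r, s or the index variables are declared; different declarations (pointer or assumed-shape arrays, for example) could lead the compiler to different choices about temporaries.

```fortran
program section_demo
  implicit none
  integer, parameter :: n = 5
  real :: b(n, 2), r(n, 2), s(n, 2)      ! ASSUMED declarations
  real :: a(3), p
  integer :: kf, kl, jf, jl, indx, indr, inds

  b = 1.0;  r = 2.0;  s = 3.0
  kf = 2;  kl = 4;  jf = 1;  jl = 3      ! both ranges select 3 elements
  indx = 1;  indr = 2;  inds = 1

  ! Contiguous column section assigned to a whole array
  ! (written as a rather than a(:), as suggested above)
  a = b(kf:kl, indx)

  ! Both arguments are contiguous column sections of equal length
  p = dot_product(r(jf:jl, indr), s(kf:kl, inds))

  print *, a
  print *, p
end program section_demo
```

Because the first subscript varies fastest in Fortran, both sections here are contiguous in memory.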

dbruceg
Beginner

Steve:

I seem to have missed something big in the discussions about stack temps over the past few months. Are you saying that

a(:) = b(kf:kl,indx)

will generate stack temps, but

p = Dot_Product ( r(jf:jl,indr) , s(kf:kl,inds) )

will not?

Bruce


Steven_L_Intel1
Employee
Bruce,

Did you intend to repost your earlier question?