Intel® Fortran Compiler

Problem with stack overflow

andreas_kissavosequa
Hi,
we have a program that has, up to now, worked fine with other compilers.
We recently switched to Intel Visual Fortran and are ironing out "bugs" in the code.
This recently appeared in a subroutine:
      SUBROUTINE sort_col_row(colv, rowv, indx, N, task)
*
* Sort from smallest to largest after col # and within col after row #
* using Heap sort,
* order in indx, i.e. index(1) points to lowest column number
*   indx(new #) = orig #
* If task = 1 (used from PREF_SUP)
*   colv(orig #) = new #
      INTEGER N, task
      INTEGER colv(N), rowv(N), indx(N)
      INTEGER i, i0, c1, ci, col
      INTEGER, DIMENSION(:), ALLOCATABLE :: work, rowwork
      allocate(work(N))
      allocate(rowwork(N))
* First some sorting is done, resulting in a vector of indices work.
C     indx = indx(work)  ! stack overflow on IFORT
      rowwork = indx(work)
      indx = rowwork
Previously we could perform the now commented-out operation indx = indx(work) directly. With Intel Fortran, however, this results in a stack overflow for large cases, since indx = indx(work) appears to allocate the temporary it needs on the stack. For now we have worked around this by explicitly using another intermediate array, as in the code above, but we would like to know the best way to solve this, using as few vectors/matrices as possible and without getting a stack overflow.
Best regards,
Andreas 
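
One way to avoid any array temporary for indx = indx(work) is to apply the permutation in place by following its cycles. The following is a minimal free-form sketch, not the original routine: it assumes work holds a valid permutation of 1..N with strictly positive entries, and the names apply_permutation, a, p and m are hypothetical. It temporarily negates entries of work to mark visited positions and restores them before returning, so no extra vector is needed.

```fortran
program permute_in_place_demo
  implicit none
  integer, parameter :: n = 5
  integer :: indx(n), work(n), expected(n)

  indx = [10, 20, 30, 40, 50]
  work = [3, 1, 5, 2, 4]
  ! Reference result; a temporary is harmless for such a tiny array.
  expected = indx(work)

  call apply_permutation(indx, work, n)

  if (any(indx /= expected)) error stop 'in-place result differs from indx(work)'
  if (any(work /= [3, 1, 5, 2, 4])) error stop 'work was not restored'
  print '(5(1x,i0))', indx

contains

  ! Hypothetical helper: overwrites a with a(p) without an array temporary.
  ! Assumes p is a permutation of 1..m with all entries positive.
  subroutine apply_permutation(a, p, m)
    integer, intent(in)    :: m
    integer, intent(inout) :: a(m), p(m)
    integer :: start, j, k, saved

    do start = 1, m
      if (p(start) < 0) cycle      ! already moved as part of an earlier cycle
      saved = a(start)             ! the only element that must be kept aside
      j = start
      do
        k = p(j)
        p(j) = -k                  ! mark position j as done
        if (k == start) then
          a(j) = saved             ! close the cycle
          exit
        end if
        a(j) = a(k)                ! pull the element from its source position
        j = k
      end do
    end do

    ! Undo the visited marks so p is returned unchanged.
    do j = 1, m
      p(j) = -p(j)
    end do
  end subroutine apply_permutation

end program permute_in_place_demo
```

The trade-off is scattered memory access and a temporarily modified work array, so the explicit rowwork copy shown above may still be the simpler choice when memory is not tight.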
jimdempseyatthecove
Honored Contributor III
Andreas,

Under Optimizations, enable Heap Arrays.

Jim Dempsey
Steven_L_Intel1
Employee
... and the way you do that in Visual Studio is to set the Heap Arrays property to 0.
jimdempseyatthecove
Honored Contributor III
Steve,

I know the answer to this is a marketing question....

One only needs to read through the posts on ISN (google: "stack overflow" fortran site:software.intel.com) to find that this is a significant problem for new users of Intel Visual Fortran. With this in mind, why isn't the default set up to be stack-conservative? That would eliminate these types of initial problems for new users. Later, when users desire higher performance, they can experiment with options that increase the Heap Arrays property value from 0 to n. As it stands now you have:

    Program (slightly) faster, with possibility that program won't run.

versus

    Program (slightly) slower, with low probability that program won't run.

As it stands now, your marketing-driven decision that speed trumps reliability is analogous to a gunslinger opting for a hair trigger, with a good probability of shooting himself in the foot.

----------------

For future versions, why not consider defaulting to a "guided" allocation strategy? The stack size available to each thread should be known at thread startup (main as well as OpenMP threads). At subroutine/function entry, a reasonable guesstimate can be made of the additional stack requirements (temporary arrays could be worked into this as well). The guesstimate and the remaining stack can then be used to guide the allocation strategy. Consider the option:

    /heap-arrays:guided

Jim Dempsey
Steven_L_Intel1
Employee
Jim, you won't get any argument from me on this - I've lobbied for making heap arrays the default for a while now. 

Unfortunately, on all the platforms we support, by the time the program starts it's too late to make a decision on where to allocate temporaries.  But I will forward your suggestion and perhaps some ideas might come from it.
jimdempseyatthecove
Honored Contributor III
>>Unfortunately, on all the platforms we support, by the time the program starts it's too late to make a decision on where to allocate temporaries.  But I will forward your suggestion and perhaps some ideas might come from it.

Thanks for forwarding the request.

Notes to pass on to developers (not marketing).

I believe you support Thread Local Storage (TLS) on all OpenMP platforms. In a single-threaded program this would be in static data. At thread startup (and in main), a stack watermark can be computed and stored in TLS.

Your current heap arrays option has a size value that is used to select where an allocation is obtained (stack/heap) based on the size of the allocation. For known sizes the determination can be made at compile time. For unknown sizes a test must be performed at runtime. The proposed option would look like:

    /heap-arrays:guided       (equivalent to /heap-arrays:HUGE:guided)
or
    /heap-arrays:nnnn:guided

When the allocation is larger than nnnn (HUGE), allocate from the heap; otherwise subtract the allocation size from the stack pointer and compare the result to the TLS watermark. If the result is beneath the watermark, allocate from the heap; otherwise allocate from the stack.

You could also consider:

    /heap-arrays:llll-hhhh:guided

Beneath llll allocate on stack
Above hhhh allocate on heap
In range of llll to hhhh use guided  (note the format llll-hhhh could assume ":guided")

You may (will) also want a means (option) to specify the watermark (stack reserve size).

Jim Dempsey
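
The guided decision proposed above can be sketched as a small Fortran function. This is only an illustration of the proposal, not an existing compiler feature: the function name choose_storage, the ON_STACK/ON_HEAP codes and all numeric values are hypothetical, addresses are modelled as plain 64-bit integers, and the stack is assumed to grow downward so that "beneath the watermark" means a numerically smaller address.

```fortran
program guided_allocation_demo
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: ON_STACK = 1, ON_HEAP = 2
  ! Hypothetical thresholds (bytes) and a hypothetical per-thread watermark address.
  integer(int64), parameter :: llll = 1024_int64, hhhh = 65536_int64
  integer(int64), parameter :: watermark = 900000_int64

  ! Below llll: always stack.
  if (choose_storage(512_int64, llll, hhhh, 1000000_int64, watermark) /= ON_STACK) &
    error stop 'small request should use the stack'
  ! Above hhhh: always heap.
  if (choose_storage(100000_int64, llll, hhhh, 1000000_int64, watermark) /= ON_HEAP) &
    error stop 'large request should use the heap'
  ! In range with plenty of stack left: stack.
  if (choose_storage(4096_int64, llll, hhhh, 1000000_int64, watermark) /= ON_STACK) &
    error stop 'guided request with room should use the stack'
  ! In range but the new stack pointer would cross the watermark: heap.
  if (choose_storage(4096_int64, llll, hhhh, 902000_int64, watermark) /= ON_HEAP) &
    error stop 'guided request without room should use the heap'

  print *, 'all guided-allocation checks passed'

contains

  ! Decide where a temporary of nbytes would go under llll-hhhh:guided.
  pure function choose_storage(nbytes, lo, hi, stack_ptr, mark) result(place)
    integer(int64), intent(in) :: nbytes, lo, hi, stack_ptr, mark
    integer :: place

    if (nbytes > hi) then
      place = ON_HEAP                      ! above hhhh: heap
    else if (nbytes < lo) then
      place = ON_STACK                     ! beneath llll: stack
    else if (stack_ptr - nbytes < mark) then
      place = ON_HEAP                      ! would dip beneath the watermark
    else
      place = ON_STACK                     ! fits above the watermark
    end if
  end function choose_storage

end program guided_allocation_demo
```

Passing lo = 0 reduces this to the simpler nnnn:guided form, where every request up to the threshold is checked against the watermark.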