Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Suggestion #1: Large Automatic Variables on the Heap

Jugoslav_Dujic
Valued Contributor II
I'd like to share a few thoughts and offer them as suggestions for future implementation in Intel Fortran; I'll separate them into two threads as they're unrelated.
The first is inspired by this recent thread on comp.lang.fortran. (Unfortunately, Google merged two distinct threads, the latter part being mostly about allocatable and automatic arrays.) To summarize, the original poster needed a scratch working array of a size unknown in advance, but was worried about the cost of frequent (de)allocation. It was suggested that automatic arrays are the best solution, but the discussion pinpointed their limits, the most severe being the limited stack size on most machines (1MB by default for CVF/IVF) and the lack of any error handling when that limit is exceeded.
One suggestion that emerged is that automatic variables need not necessarily be implemented on the stack, but may go to the heap as well. This partly defeats their purpose of fast allocation/deallocation, but adds safety. A similar discussion applies to the stack temporaries usually involved in array operations -- IVF already has a switch for that (which slows things down but adds safety).
My suggestion is -- why should the user have to take care of it? The available stack size and the current state of the stack pointer (ESP) are already available to the compiler -- it can check whether the stack variable (an automatic array or a temporary) fits into the available stack and, if so, allocate it there; otherwise, it can allocate it on the heap. The former method is fast, the latter is safe.
(The "reverse" consideration may even be applied to small ALLOCATABLE non-SAVEd local arrays -- they could be allocated on the stack instead of the heap.)
My -- admittedly superficial -- impression is that such a feature would not be hard to implement: any stack allocation would just have to be preceded by a check of the available stack size (which should be fast), and if the check fails, a branch with heap allocation (a call to for_alloc_allocatable) would be taken.
All comments appreciated.
Jugoslav
3 Replies
Steven_L_Intel1
Employee

This is something we've had in mind for a long time, and is already on our "wish list". It's not easy to detect how much stack you have left - you have to touch the memory and see if you get an access violation.

A simpler scheme is to choose to allocate things larger than some threshold on the heap, with the threshold value specifiable by the user (as a compile option.) We actually implemented this on OpenVMS Alpha but the underpinnings to do this aren't there on the Intel side, so we'd have some additional implementation work.
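
As a rough illustration of the threshold scheme, here is a minimal sketch in C of "small goes on the stack, large goes on the heap". It is not what Intel Fortran or the OpenVMS Alpha compiler generates: the function name `sum_of_squares`, the `used_heap` flag and the 4096-byte `STACK_THRESHOLD_BYTES` are all assumptions made up for the example, and a fixed-size local buffer stands in for a true automatic array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical threshold; in the scheme described above it would be
   supplied by the user as a compile option. */
#define STACK_THRESHOLD_BYTES 4096

/* Computes the sum of squares of x[0..n-1] via a scratch array.
   The scratch array lives on the stack when it fits under the threshold
   and on the heap otherwise. *used_heap reports which path was taken.
   Returns -1.0 if the heap allocation fails (a valid sum is never negative). */
double sum_of_squares(const double *x, size_t n, int *used_heap)
{
    double stack_buf[STACK_THRESHOLD_BYTES / sizeof(double)];
    double *scratch = stack_buf;
    double total = 0.0;
    size_t i;

    assert(used_heap != NULL);
    *used_heap = 0;

    if (n > sizeof stack_buf / sizeof stack_buf[0]) {
        /* Too large for the stack buffer: take the slower but safer path. */
        scratch = malloc(n * sizeof(double));
        if (scratch == NULL) {
            return -1.0; /* heap failure can be reported; stack overflow cannot */
        }
        *used_heap = 1;
    }

    for (i = 0; i < n; i++) {
        scratch[i] = x[i] * x[i];
    }
    for (i = 0; i < n; i++) {
        total += scratch[i];
    }

    if (*used_heap) {
        free(scratch);
    }
    return total;
}
```

The point of the design is that the decision needs only the requested size, not the amount of stack left, so no probing of memory is required. The price is that a large request always pays for the heap allocation, even when the stack would have had room.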

Jugoslav_Dujic
Valued Contributor II

Huh... I was about to say that the address of the stack limit is constant per process and thus could be evaluated just once in advance; however, I then realized that it's a per-thread value, and the compiler doesn't have a crystal ball to know in which thread context the code will be executed.

I'm not well versed in assembly, but I assume that nothing sensible can be deduced solely from the value of ESP? (Btw, does it grow or shrink on x86 as the call stack gets deeper?)

Indeed, your second scheme seems easier to implement. I hope we'll see it in the foreseeable future.

Jugoslav

Steven_L_Intel1
Employee
The value of ESP decreases as the stack is used.
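
One way to see this from a high-level language is to compare the address of a local variable in a caller with the address of a local in the function it calls. The sketch below is in C; the function names are made up for the example, and it assumes the compiler does not inline the calls (hence the `NOINLINE` macro) and that locals are placed in ordinary stack frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ask the compiler not to inline, so that each function really gets its
   own stack frame. On other compilers this is a no-op and the result may
   be less meaningful. */
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE
#endif

/* Returns 1 if a local in this (deeper) frame lies at a lower address
   than the caller's local, 0 otherwise. */
NOINLINE int callee_is_lower(volatile char *caller_local)
{
    volatile char callee_local = 0;
    assert(caller_local != NULL);
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Returns 1 when the stack grows toward lower addresses. */
NOINLINE int stack_grows_down(void)
{
    volatile char caller_local = 0;
    return callee_is_lower(&caller_local);
}
```

On x86, `stack_grows_down()` would be expected to return 1, since each deeper call moves the stack pointer to a lower address.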