stack

tiho · 10-20-2010

I am having a problem with the size of the stack that is allocated for the program, and I am wondering if there are any new developments I could take advantage of.

I am currently using a fixed stack of 48 MB and could increase it, but I wonder if there is a way to have the OS decide this dynamically.

Also, are there any debugging utilities I can use to figure out where in the code the stack leaks occur?

Thanks a lot for any advice!

Steven_L_Intel1 · 10-20-2010

Try adding the option /heap-arrays. In Visual Studio, this is Optimization > Heap Arrays > 0.

onkelhotte · 10-20-2010

Thanks for the advice.

Are there any disadvantages, or anything I need to consider, when I put all arrays on the heap rather than the stack?

Markus

TimP · 10-21-2010

It's usually considered desirable to use the option's size threshold so that arrays up to a certain size stay on the stack and larger ones go on the heap, since stack allocation should be faster when you frequently allocate and deallocate small arrays. You could set the threshold to match the default used by another compiler you might use.
I'd like more expert discussion of the implications for OpenMP. I've always avoided heap allocation of private arrays.

Steven_L_Intel1 · 10-21-2010

Unfortunately, the size option on /heap-arrays is pretty much useless: it affects only automatic arrays whose sizes are known at compile time, which is an unusual occurrence.
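Because the size threshold only applies to compile-time sizes, the policy TimP describes (small scratch on the stack, large scratch on the heap) can be coded by hand where it matters. This is a minimal sketch under stated assumptions: the module ThresholdDemo, the cutoff STACK_LIMIT and its value, and the routines work_on and compute are all hypothetical. It uses a Fortran 2008 BLOCK construct so that a run-time-sized automatic array is declared only on the small branch; where that array lives is compiler-dependent (Intel Fortran puts it on the stack unless /heap-arrays is in effect).

```fortran
module ThresholdDemo
    implicit none
    integer, parameter :: STACK_LIMIT = 4096   ! hypothetical cutoff, in elements
contains
    ! Stand-in for real work on a scratch buffer: fill it with 1..n and sum it.
    real function work_on(buf) result(s)
        real, intent(inout) :: buf(:)
        integer :: i
        do i = 1, size(buf)
            buf(i) = real(i)
        end do
        s = sum(buf)
    end function work_on

    ! Small requests use an automatic array; large ones use the heap.
    real function compute(n) result(s)
        integer, intent(in) :: n
        real, allocatable :: big(:)
        if (n <= STACK_LIMIT) then
            block
                ! Automatic array sized at run time; with Intel Fortran this
                ! is a stack allocation unless /heap-arrays is set.
                real :: small(n)
                s = work_on(small)
            end block
        else
            allocate(big(n))   ! heap; deallocated automatically when compute returns
            s = work_on(big)
        end if
    end function compute
end module ThresholdDemo
```

Only the routines that actually allocate large scratch need this treatment, so the rest of the program keeps the cheap stack allocation that /heap-arrays would otherwise give up globally.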

jimdempseyatthecove · 10-21-2010

>>I'd like more expert discussion of the implications for OpenMP. I've always avoided heap allocation of private arrays

Assume you are on Linux and have set an "unlimited" stack limit (for example with "ulimit -s unlimited"). The main thread starts with its stack pointer set as high as it can go (or as high as the implementer thought prudent). The virtual memory page containing that address (and potentially some additional pages below it) is initially mapped to the page file and (potentially) physical RAM. The main thread can consume stack until it page faults when pushing into unmapped virtual memory; the OS, and potentially the C runtime library (room permitting), then assigns page-file page(s), maps them to physical RAM (potentially including swap), and the program resumes.

Now OpenMP starts its threads, or the user starts pthreads, or ... starts threads, each with an "unlimited" stack. What does this mean? To the experienced programmer it is clear that not all threads, and not even the 2nd thread, can follow the same technique. Something has to give.

The original main thread would have had available everything from the highest possible (prudent) address down to the highest address already used by, or inaccessible to, the application. That leaves no addresses available for additional threads.

Therefore a compromise must be made. My assumption is that the 2nd thread takes 1/2 of the main thread's stack (probably the lower half of the addresses). The 3rd thread takes 1/2 of the larger of the two previously reserved stacks (probably the lower addresses of that stack). The 4th thread takes 1/2 of the largest of the three previously reserved stacks (probably the lower addresses of that stack), etc.

This partitioning scheme may or may not be appropriate. On a 32-bit system it does not necessarily offer the best solution, since the per-thread stack requirements may differ. On 64-bit systems this is unlikely to be a problem because of the much larger virtual address space (48 or more bits).
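Whatever layout the runtime chooses, threads created by the OpenMP implementation get a fixed-size stack at creation, and that size is set with the OMP_STACKSIZE environment variable (for example OMP_STACKSIZE=64M; Intel's runtime also reads KMP_STACKSIZE). It does not control the initial thread, whose limit on Linux comes from ulimit -s. The sketch below, with hypothetical names PrivateStackDemo, row_work and total_work, shows where the distinction bites: each loop iteration uses an automatic array that, under Intel Fortran's default placement, sits on the stack of whichever thread runs the iteration. Compiled without OpenMP, the directives are plain comments and the loop runs serially.

```fortran
module PrivateStackDemo
    implicit none
contains
    ! Uses an n-element automatic array as scratch. With Intel Fortran's
    ! default (no /heap-arrays) it lives on the calling thread's stack, so
    ! inside a parallel region its size is bounded by OMP_STACKSIZE.
    real function row_work(n) result(s)
        integer, intent(in) :: n
        real :: tmp(n)
        integer :: i
        do i = 1, n
            tmp(i) = 1.0
        end do
        s = sum(tmp)
    end function row_work

    ! Parallel loop over rows; every thread calls row_work on its own stack.
    real function total_work(nrows, n) result(total)
        integer, intent(in) :: nrows, n
        integer :: r
        real :: acc
        acc = 0.0
        !$OMP PARALLEL DO REDUCTION(+:acc)
        do r = 1, nrows
            acc = acc + row_work(n)
        end do
        !$OMP END PARALLEL DO
        total = acc
    end function total_work
end module PrivateStackDemo
```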

Jim Dempsey

tiho · 10-21-2010

The /heap-arrays option is indeed quite useless. It increases the computational time by a factor of 2.

Any other ideas?

Any tricks to see what blows out the stack?

jimdempseyatthecove · 10-21-2010

Tiho,

Yes, there is a trick:

1) Create a user-defined data type containing the allocatable arrays to be used by the various subroutines and functions.

2) Instantiate a threadprivate copy of this data type and set whatever flags you want to indicate "not initialized". (An allocatable component starts out unallocated, so ALLOCATED() can itself serve as that flag.)

For example, assume you have an array A used by subroutine FOO:

module TLS_MODULE
    implicit none

    type FOOtemps
        real, allocatable :: A(:)
        real, allocatable :: B(:)
        ...
    end type FOOtemps

    type FEEtemps
        real, allocatable :: X(:)
        real, allocatable :: Y(:)
        ...
    end type FEEtemps
    ...
    type TLStemps
        type(FOOtemps) :: FOO
        type(FEEtemps) :: FEE
        ...
    end type TLStemps

    ! One copy of TLS per OpenMP thread. The module variable itself is made
    ! THREADPRIVATE; a COMMON block may not contain a derived-type object
    ! with allocatable components.
    type(TLStemps) :: TLS
    !$OMP THREADPRIVATE(TLS)

    ...
end module TLS_MODULE
...

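SUBROUTINE FOO(...)
    USE TLS_MODULE
    INTEGER :: SizeOfScratchA
    ...
    SizeOfScratchA = YouDetermineSizeForScratchA

    ! Allocate on first use; afterwards grow only when this call needs more
    ! than the thread already has. Otherwise the existing buffer is reused.
    IF (.NOT. ALLOCATED(TLS%FOO%A)) THEN
        ALLOCATE(TLS%FOO%A(SizeOfScratchA))
    ELSE
        IF (SIZE(TLS%FOO%A) .LT. SizeOfScratchA) THEN
            DEALLOCATE(TLS%FOO%A)
            ALLOCATE(TLS%FOO%A(SizeOfScratchA))
        ENDIF
    ENDIF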

The allocation logic above can be moved into a subroutine in the CONTAINS section of TLS_MODULE (called GetScratchFOO here), so that FOO becomes:

SUBROUTINE FOO(...)
    USE TLS_MODULE
    INTEGER :: SizeOfScratch
    ...
    SizeOfScratch = YouDetermineSizeForScratch
    call GetScratchFOO(SizeOfScratch)
    ! The call returns immediately if the existing scratch is large enough; otherwise it reallocates.
    ...
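Put together, steps 1) and 2) plus the GetScratchFOO call look roughly like the self-contained sketch below. It is flattened to a single array; FOOtemps, TLS and GetScratchFOO follow the post, while the module name ScratchDemo and the counter nAllocs (there only so the reuse can be observed) are hypothetical. Without OpenMP the THREADPRIVATE directive is just a comment and TLS is an ordinary module variable.

```fortran
module ScratchDemo
    implicit none

    type FOOtemps
        real, allocatable :: A(:)
    end type FOOtemps

    ! One copy per OpenMP thread; an unallocated A means "not initialized".
    type(FOOtemps) :: TLS
    integer :: nAllocs = 0      ! illustration only: counts actual (re)allocations
    !$OMP THREADPRIVATE(TLS, nAllocs)

contains

    ! Make sure this thread's TLS%A holds at least n elements. Returns
    ! immediately when the existing buffer is already big enough.
    subroutine GetScratchFOO(n)
        integer, intent(in) :: n
        if (allocated(TLS%A)) then
            if (size(TLS%A) >= n) return
            deallocate(TLS%A)   ! too small: drop it and grow below
        end if
        allocate(TLS%A(n))
        nAllocs = nAllocs + 1
    end subroutine GetScratchFOO

end module ScratchDemo
```

A routine like FOO then calls GetScratchFOO(n) and works on TLS%A(1:n). Only the first call, and any call asking for more than before, pays for an ALLOCATE.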

*** Note

Some earlier versions of IVF required POINTER components instead of ALLOCATABLE components.
If this is your situation, you will have to adjust the code accordingly.
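With POINTER components the grow-only logic carries over almost unchanged, using ASSOCIATED in place of ALLOCATED. A minimal sketch, assuming the hypothetical names PtrScratchDemo, FOOtempsPtr, TLSP and GetScratchFOOPtr; the component is default-initialized to NULL() so that ASSOCIATED is well defined before the first allocation.

```fortran
module PtrScratchDemo
    implicit none

    type FOOtempsPtr
        real, pointer :: A(:) => null()   ! starts disassociated = "not initialized"
    end type FOOtempsPtr

    type(FOOtempsPtr) :: TLSP
    !$OMP THREADPRIVATE(TLSP)

contains

    ! Same contract as the allocatable version: ensure TLSP%A has >= n elements.
    subroutine GetScratchFOOPtr(n)
        integer, intent(in) :: n
        if (associated(TLSP%A)) then
            if (size(TLSP%A) >= n) return
            deallocate(TLSP%A)   ! pointer targets are never freed automatically
        end if
        allocate(TLSP%A(n))
    end subroutine GetScratchFOOPtr

end module PtrScratchDemo
```

Unlike an allocatable, a pointer target is not released automatically, so any other pointer aliasing TLSP%A becomes dangling once the buffer grows.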

An additional programming problem arises if a function or subroutine is actually called recursively (as opposed to merely being declared RECURSIVE for OpenMP purposes). In that case you may need a TLS stack of arrays, and then either subscript to the appropriate array or set a pointer to it.
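One way to build such a TLS stack of arrays is a per-thread array of scratch slots indexed by a recursion-depth counter, so each level of a genuinely recursive call gets its own buffer. This is a sketch under stated assumptions: RecursiveScratch, ScratchSlot, MAX_DEPTH, push_scratch, pop_scratch and the example function level_sum are hypothetical, and MAX_DEPTH is an arbitrary fixed limit.

```fortran
module RecursiveScratch
    implicit none

    type ScratchSlot
        real, allocatable :: A(:)
    end type ScratchSlot

    integer, parameter :: MAX_DEPTH = 64       ! hypothetical recursion limit
    type(ScratchSlot) :: slots(MAX_DEPTH)      ! one buffer per recursion level
    integer :: depth = 0
    !$OMP THREADPRIVATE(slots, depth)

contains

    ! Enter a recursion level and make sure its buffer holds n elements.
    subroutine push_scratch(n)
        integer, intent(in) :: n
        depth = depth + 1
        if (depth > MAX_DEPTH) error stop "scratch stack overflow"
        if (allocated(slots(depth)%A)) then
            if (size(slots(depth)%A) >= n) return
            deallocate(slots(depth)%A)
        end if
        allocate(slots(depth)%A(n))
    end subroutine push_scratch

    ! Leave a recursion level; its buffer stays allocated for reuse.
    subroutine pop_scratch()
        depth = depth - 1
    end subroutine pop_scratch

    ! Example recursive user: level k fills its own buffer with the value k
    ! and adds the sums of the deeper levels, giving 1**2 + 2**2 + ... + k**2.
    ! Deeper calls use higher slots, so this level's buffer survives them.
    recursive real function level_sum(k) result(s)
        integer, intent(in) :: k
        integer :: my
        call push_scratch(k)
        my = depth
        slots(my)%A(1:k) = real(k)
        if (k > 1) then
            s = level_sum(k - 1)
        else
            s = 0.0
        end if
        s = s + sum(slots(my)%A(1:k))
        call pop_scratch()
    end function level_sum

end module RecursiveScratch
```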

This takes some hoop jumping, but it is faster than repeated allocate/deallocate. Here is a snippet of some code I wrote:

type(TypeTNSXET), pointer :: pTNSXET

! see MOD_ALL, MOD_SCRATCH and MOD_SCRATCHcode for scratch pad memory allocation

! Coordinate edits with scratch memory in above files

pTNSXET => ScratchTNSXET(pTether)
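MOD_SCRATCH and ScratchTNSXET are not shown, so the following is only a guess at the general shape of that last line: a function that returns a pointer to a per-thread scratch object, sized on demand, which the caller then uses through a short alias. All names here (ScratchPool, ScratchBuf, NSLOTS, pool, ScratchFor) are hypothetical, and an integer slot id stands in for whatever pTether selects.

```fortran
module ScratchPool
    implicit none

    type ScratchBuf
        real, allocatable :: A(:)
    end type ScratchBuf

    integer, parameter :: NSLOTS = 8              ! hypothetical pool size
    type(ScratchBuf), target :: pool(NSLOTS)      ! one scratch object per slot
    !$OMP THREADPRIVATE(pool)

contains

    ! Return a pointer to slot id's scratch object, growing its buffer to at
    ! least n elements if needed. Caller usage: p => ScratchFor(id, n)
    function ScratchFor(id, n) result(p)
        integer, intent(in) :: id, n
        type(ScratchBuf), pointer :: p
        if (id < 1 .or. id > NSLOTS) error stop "bad scratch slot"
        p => pool(id)
        if (allocated(p%A)) then
            if (size(p%A) >= n) return
            deallocate(p%A)
        end if
        allocate(p%A(n))
    end function ScratchFor

end module ScratchPool
```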

Jim Dempsey