I am having a problem with the size of the stack allocated for my program, and I am wondering whether there are any new developments that I could take advantage of.
I am currently using a fixed stack of 48 MB and could increase it, but I wonder if there is a way to have the OS decide this dynamically.
Also, are there any debugging utilities that I can use to figure out where the stack leaks occur in the code?
Thanks a lot for any advice!
7 Replies
Try adding the option /heap-arrays. In Visual Studio, this is Optimization > Heap Arrays > 0.
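To make the suggestion concrete, here is a minimal sketch of the kind of array the option is aimed at. The program and routine names and the size n are made up for illustration. By default Intel Fortran places automatic arrays such as work(m) (and temporaries created for array expressions) on the stack; with /heap-arrays they are allocated on the heap instead.

```fortran
program heap_arrays_demo
  implicit none
  ! Hypothetical size: about 4 MB of single-precision data.
  integer, parameter :: n = 1000000
  print *, 'mean = ', mean_of_halves(n)
contains
  function mean_of_halves(m) result(avg)
    integer, intent(in) :: m
    real :: avg
    ! Automatic array: its size is only known when the function is entered.
    ! This is the kind of object that /heap-arrays moves off the stack.
    real :: work(m)
    integer :: i
    do i = 1, m
      work(i) = 0.5 * real(i)
    end do
    avg = sum(work) / real(m)
  end function mean_of_halves
end program heap_arrays_demo
```

Building this once with and once without /heap-arrays, and with a small stack limit, is one way to see whether a particular routine is what overflows the stack.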
Thanks for the advice.
Are there any disadvantages or do I have anything to consider when I put all arrays on the heap rather than the stack?
Markus
It's usually considered desirable to use the option to leave arrays up to a certain size on the stack and put larger ones on the heap, as stack allocation should be faster when you have frequent dynamic allocation and deallocation of small arrays. You could set the threshold to match the default used by another compiler which you might use.
I'd like more expert discussion of the implications for OpenMP. I've always avoided heap allocation of private arrays.
Unfortunately, the size option on /heap-arrays is pretty much useless - it affects only automatic arrays with compile-time known sizes, an unusual occurrence.
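If the size threshold cannot be relied on, one way to decide array by array is to declare the large work arrays ALLOCATABLE yourself. The sketch below is only an illustration of that idea; the module, routine, and variable names are invented. Small fixed-size locals stay wherever the compiler puts them, while work is always heap-allocated regardless of compiler options.

```fortran
module scratch_demo
  implicit none
contains
  subroutine smooth(x)
    real, intent(inout) :: x(:)
    ! Explicitly allocatable: lives on the heap, independent of /heap-arrays.
    real, allocatable :: work(:)
    integer :: n, i, ierr
    n = size(x)
    allocate(work(n), stat=ierr)
    if (ierr /= 0) stop 'allocation of work failed'
    work = x
    ! Three-point average of the interior points, reading the saved copy.
    do i = 2, n - 1
      x(i) = (work(i-1) + work(i) + work(i+1)) / 3.0
    end do
    ! A local allocatable without SAVE is deallocated automatically on return.
  end subroutine smooth
end module scratch_demo

program main
  use scratch_demo
  implicit none
  real :: x(5) = [1.0, 2.0, 6.0, 2.0, 1.0]
  call smooth(x)
  print *, x
end program main
```

The cost is an allocate and deallocate on every call, which is the overhead the stack-versus-heap discussion in this thread is about.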
>>I'd like more expert discussion of the implications for openmp. I've always avoided heap allocation of private arrays
Assume you are on Linux and have requested an "unlimited stack" (typically via the shell limit, ulimit -s unlimited, rather than a linker option). The main thread starts with its stack pointer set as high as it can go (or as high as the implementer thought was prudent). The virtual memory page containing that address (and potentially some additional pages below it) is initially mapped to the page file and (potentially) physical RAM. The main thread can consume stack until it page faults when pushing into unmapped virtual memory; then the O/S, and potentially the C runtime library, assigns page file page(s) (room permitting) and maps them to physical RAM (potentially including swap), and the program resumes.
Now OpenMP starts its threads, or the user starts pthreads, or something else starts threads, each with "unlimited stack". What does this mean? To the experienced programmer it means that not all threads, nor even the 2nd thread, can follow the same technique. Something has to give.
The original main thread would have had available everything from the highest possible (prudent) address down to the addresses already used by, or inaccessible to, the application, meaning there are no addresses left for additional threads.
Therefore a compromise must be made. My assumption is that the 2nd thread takes 1/2 of the main thread's stack (probably the lower 1/2 of the addresses). The 3rd thread takes 1/2 of the larger of the two previously reserved stacks (probably the lower addresses of that stack). The 4th thread takes 1/2 of the largest of the three previously reserved stacks (probably the lower addresses of that stack), etc.
This partitioning scheme may or may not be appropriate. On a 32-bit system it does not necessarily offer you the best solution, since the per-thread stack requirements may differ. On 64-bit systems this will likely not be a problem, due to the larger virtual address space (48 or more bits).
Jim Dempsey
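As a small illustration of the two choices discussed for OpenMP private arrays, the sketch below uses one fixed-size private array and one allocatable private array. All names and the size n are invented for the example, and it must be built with OpenMP enabled. The private copy of sbuf is typically created on each thread's own stack, so it counts against the per-thread stack size (which the standard OMP_STACKSIZE environment variable is meant to control), while hbuf is allocated on the heap inside the region.

```fortran
program omp_private_arrays
  use omp_lib
  implicit none
  ! Hypothetical size: 400 KB per private copy.
  integer, parameter :: n = 100000
  real :: sbuf(n)               ! private copies usually live on each thread's stack
  real, allocatable :: hbuf(:)  ! private copies start out unallocated
  real :: total

  total = 0.0
  !$omp parallel private(sbuf, hbuf) reduction(+:total)
  sbuf = real(omp_get_thread_num() + 1)
  ! Heap allocation per thread; this avoids stack pressure but costs an
  ! allocate/deallocate each time the region is entered.
  allocate(hbuf(n))
  hbuf = 2.0 * sbuf
  total = total + sum(hbuf) / real(n)
  deallocate(hbuf)
  !$omp end parallel

  print *, 'max threads: ', omp_get_max_threads(), ' total = ', total
end program omp_private_arrays
```

Raising n until the program fails, with and without a larger OMP_STACKSIZE, is a simple way to observe which of the two arrays is limited by the thread stack.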
The /heap-arrays option is indeed quite useless. It increases the computational time by a factor of 2.
Any other ideas?
Any tricks to see what blows out the stack?
Tiho,
Yes, there is a trick:
1) Create a user-defined data type. This type contains the allocatable arrays to be used by the various subroutines and functions.
2) Instantiate a threadprivate copy of this data type and set whatever flags you want to indicate "not initialized".
Assume you have an array A used by subroutine FOO:
module TLS_MODULE
  type FOOtemps
    real, allocatable :: A(:)
    real, allocatable :: B(:)
    ...
  end type FOOtemps
  type FEEtemps
    real, allocatable :: X(:)
    real, allocatable :: Y(:)
    ...
  end type FEEtemps
  ...
  type TLStemps
    type(FOOtemps) :: FOO
    type(FEEtemps) :: FEE
    ...
  end type TLStemps
  type(TLStemps) :: TLS
  common /TLScontext/ TLS
  !$OMP THREADPRIVATE(/TLScontext/)
  ...
end module TLS_MODULE
...
SUBROUTINE FOO(...)
  USE TLS_MODULE
  ...
  SizeOfScratchA = YouDetermineSizeForScratchA
  IF(.NOT. ALLOCATED(TLS%FOO%A)) THEN
    ALLOCATE(TLS%FOO%A(SizeOfScratchA))
  ELSE
    IF(SIZE(TLS%FOO%A) .LT. SizeOfScratchA) THEN
      DEALLOCATE(TLS%FOO%A)
      ALLOCATE(TLS%FOO%A(SizeOfScratchA))
    ENDIF
  ENDIF
The allocation code above can be placed in a helper routine inside the CONTAINS section of TLS_MODULE, so that FOO becomes:
SUBROUTINE FOO(...)
  USE TLS_MODULE
  ...
  SizeOfScratch = YouDetermineSizeForScratch
  call GetScratchFOO(SizeOfScratch)
  ! above call returns immediately if SizeOfScratch sufficient, else it reallocates
  ...
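For readers who want something they can compile, here is one possible self-contained version of the idea, written as a sketch rather than as the code actually used above. The body of GetScratchFOO, the driver program, and the sizes are my assumptions. It also declares TLS as a saved module variable and names it directly in THREADPRIVATE instead of going through a COMMON block, because the Fortran standard does not allow a derived-type object with allocatable components in COMMON.

```fortran
module TLS_MODULE
  implicit none
  type FOOtemps
    real, allocatable :: A(:)
  end type FOOtemps
  type TLStemps
    type(FOOtemps) :: FOO
  end type TLStemps
  ! One copy of TLS per thread; its allocatable components start unallocated.
  type(TLStemps), save :: TLS
  !$OMP THREADPRIVATE(TLS)
contains
  ! Assumed implementation of the helper named in the post.
  subroutine GetScratchFOO(SizeOfScratch)
    integer, intent(in) :: SizeOfScratch
    if (allocated(TLS%FOO%A)) then
      ! Fast path: the existing buffer is already large enough.
      if (size(TLS%FOO%A) >= SizeOfScratch) return
      deallocate(TLS%FOO%A)
    end if
    allocate(TLS%FOO%A(SizeOfScratch))
  end subroutine GetScratchFOO
end module TLS_MODULE

program demo
  use TLS_MODULE
  implicit none
  integer :: i
  real :: results(8)
  !$OMP PARALLEL DO
  do i = 1, 8
    call FOO(100 * i, results(i))
  end do
  !$OMP END PARALLEL DO
  print *, results
contains
  subroutine FOO(n, r)
    integer, intent(in) :: n
    real, intent(out) :: r
    call GetScratchFOO(n)
    ! Only the first n elements are guaranteed to belong to this call.
    TLS%FOO%A(1:n) = 1.0
    r = sum(TLS%FOO%A(1:n))
  end subroutine FOO
end program demo
```

Because the buffer may be larger than requested, callers should index it explicitly (1:n) rather than use whole-array operations.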
*** Note
Some earlier versions of IVF required pointers to the arrays, as opposed to allocatables.
If this is your situation, you will have to adjust the code accordingly.
An additional programming problem arises if a function or subroutine is actually called recursively (as opposed to merely being declared RECURSIVE for OpenMP purposes). In that case you may need a TLS stack of arrays, and then either subscript to the appropriate array or set a pointer to the appropriate array.
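A "TLS stack of arrays" for the recursive case could look roughly like the sketch below. Everything in it (the module name, MaxDepth, the Slots array, and the toy routine SumDown) is invented for illustration: each thread keeps a small array of scratch slots plus a depth counter, and each level of recursion uses the slot at its own depth.

```fortran
module TLS_STACK_MODULE
  implicit none
  integer, parameter :: MaxDepth = 16   ! hypothetical recursion limit
  type ScratchSlot
    real, allocatable :: A(:)
  end type ScratchSlot
  ! Per-thread stack of scratch arrays and the current recursion depth.
  type(ScratchSlot), save :: Slots(MaxDepth)
  integer, save :: Depth = 0
  !$OMP THREADPRIVATE(Slots, Depth)
contains
  recursive function SumDown(n) result(s)
    integer, intent(in) :: n
    real :: s
    integer :: myDepth
    Depth = Depth + 1
    if (Depth > MaxDepth) stop 'scratch stack exhausted'
    myDepth = Depth
    ! Grow this level's slot only when it is missing or too small.
    if (.not. allocated(Slots(myDepth)%A)) then
      allocate(Slots(myDepth)%A(n))
    else if (size(Slots(myDepth)%A) < n) then
      deallocate(Slots(myDepth)%A)
      allocate(Slots(myDepth)%A(n))
    end if
    Slots(myDepth)%A(1:n) = 1.0
    s = sum(Slots(myDepth)%A(1:n))
    ! The recursive call uses the next slot, so this level's data is untouched.
    if (n > 1) s = s + SumDown(n - 1)
    Depth = Depth - 1
  end function SumDown
end module TLS_STACK_MODULE

program demo_recursive
  use TLS_STACK_MODULE
  implicit none
  print *, SumDown(4)
end program demo_recursive
```

The slots are never freed, so after the first few calls each level reuses its buffer without touching the allocator.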
This is some hoop jumping, but it is faster than allocate/deallocate. Here is a snippet from some code I wrote:
type(TypeTNSXET), pointer :: pTNSXET
! see MOD_ALL, MOD_SCRATCH and MOD_SCRATCHcode for scratch pad memory allocation
! Coordinate edits with scratch memory in above files
pTNSXET => ScratchTNSXET(pTether)
Jim Dempsey
