In a compute-bound program, there was a loop that called two functions. Each of these functions declared a large local array. Allocating these arrays on the stack increased the execution time by a factor of 10-20. What can be done?
The original code, running under CVF6, used a real array for two distinct purposes. Under certain circumstances, the entire array contained real data as input to the function, and returned modified data. Under different circumstances, the first few elements of the array contained inputs to the function, and were returned unmodified.
So, we have something like
SUBROUTINE MySub(LongData)
REAL, DIMENSION(100000), INTENT(INOUT) :: LongData
REAL, DIMENSION(3) :: ShortData
x=MyFunc(LongData) ! the first case
! OR
x=MyFunc(ShortData) ! the second case
...
REAL FUNCTION MyFunc(RealArray)
REAL, DIMENSION(100000), INTENT(INOUT) :: RealArray
...
The trouble is, this gives a compiler error under IF9, because MyFunc declares RealArray with 100000 elements, which exceeds the size of ShortData. I tried to get around this by creating a new local array, LongData2, and copying ShortData into its initial elements. This works, but now the large array LongData2 must be created on the stack each time MySub is called, and MySub is called millions of times.
Even in a fully optimized release build, temporary space on the stack is allocated one page at a time, and that turns out to take about 10 times as long as everything else!
My first question is: have the compiler designers already considered this problem and used a more efficient way of allocating stack space? If the stack is reserved and committed, the process shouldn't need to check every page, and checking every page probably blows the cache, too.
If not, may I suggest you could save a lot of instructions by allocating stack with a few instructions: a compare to the end of known good space, followed by a move to esp.
4 Replies
The compiled code has no way of knowing if the stack space is committed. The only way to reliably give a stack overflow error if the stack does overflow is to do repeated checks on pages.
Have you considered declaring RealArray in the subroutine as assumed size (*)? Or you can make it "adjustable" with the bound passed as an argument, or assumed-shape (:). All of these would avoid the need for a local copy.
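As a minimal sketch of the assumed-size suggestion (not the poster's actual code), the dummy argument below is declared as RealArray(*), so the function makes no claim about the extent and both the long and the short array can be passed without a local copy. The extra argument n, the module name, and the summing body are assumptions made only for illustration.

```fortran
module mysub_sketch
  implicit none
contains
  ! Assumed-size dummy: the callee does not fix the extent, so a
  ! 100000-element array and a 3-element array are both acceptable.
  ! n is a hypothetical extra argument giving the usable length.
  real function MyFunc(RealArray, n)
    real, intent(inout) :: RealArray(*)
    integer, intent(in) :: n
    integer :: i
    MyFunc = 0.0
    do i = 1, n            ! stand-in work: sum the first n elements
      MyFunc = MyFunc + RealArray(i)
    end do
  end function MyFunc
end module mysub_sketch

program demo
  use mysub_sketch
  implicit none
  real, allocatable :: LongData(:)
  real :: ShortData(3)
  real :: x

  allocate(LongData(100000))
  LongData = 1.0
  ShortData = [1.0, 2.0, 3.0]

  x = MyFunc(LongData, size(LongData))    ! the first case
  print *, x
  x = MyFunc(ShortData, size(ShortData))  ! the second case
  print *, x
end program demo
```

The other two forms would change only the declaration: RealArray(n) for an adjustable array, or RealArray(:) for assumed shape. Assumed shape needs an explicit interface, which placing the function in a module provides, and then size(RealArray) could replace n.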
"The compiled code has no way of knowing if the stack space is committed." There must be a Windows API to determine this.
However, you could just assume the stack allocation was valid. Then, if it turned out not to be, you would get the stack overflow error when you tried to use the "allocated" stack space. Granted, this wouldn't be as convenient, but a clever error handler would keep track of which stack memory was untested.
I will consider your other suggestions for the next time I have this situation. For now I just added another argument to the subroutine call.
No, you wouldn't get a stack error. You'd get either an access violation or perhaps other odd behavior if you were now in some dynamically allocated address space. There are some guard pages at the bottom of the reserved stack area but in order to detect stack overflow, you can't blindly do the subtract and hope for the best. Instead, you have to repeatedly subtract a bit (less than the size of the guard area), test, repeat. There is no other reliable way to detect this problem.
Another solution for you in the next update is to make the local array ALLOCATABLE and allocate it to the desired size. But avoiding the copy seems a better approach to me.
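A rough sketch of the ALLOCATABLE idea follows. The work array LongData2 is allocated dynamically instead of being a fixed-size local, so its storage does not have to come from the stack. The SAVE attribute, which keeps the allocation alive across the millions of calls, is an assumption added here and is not part of the suggestion above; it also makes the routine unsuitable for recursive or threaded use. The module name and the sum that stands in for the call to MyFunc are illustrative only.

```fortran
module mysub_alloc_sketch
  implicit none
contains
  subroutine MySub(ShortData, x)
    real, intent(in)  :: ShortData(3)
    real, intent(out) :: x
    ! ALLOCATABLE work array in place of a fixed-size local array.
    ! SAVE (an assumption of this sketch) keeps it allocated between calls.
    real, allocatable, save :: LongData2(:)

    if (.not. allocated(LongData2)) allocate(LongData2(100000))

    LongData2(1:3) = ShortData     ! copy the short inputs to the front
    x = sum(LongData2(1:3))        ! stand-in for x = MyFunc(LongData2)
  end subroutine MySub
end module mysub_alloc_sketch

program demo
  use mysub_alloc_sketch
  implicit none
  real :: x
  integer :: i

  do i = 1, 3                      ! repeated calls reuse the same allocation
    call MySub([1.0, 2.0, 3.0], x)
  end do
  print *, x
end program demo
```

Without SAVE, the array would be allocated and automatically deallocated on every call, which avoids the stack but still costs an allocation per call.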
I agree that you'd get an AV. However, your handler could see that the AV was in reading/writing space which the process thought was valid stack, and give the same error as would have occurred from _chkstk failure.
If the stack were the lowest virtual address (like PDP-11), and you did the subtract, checking the final address only would suffice, since lower addresses would not be part of the process's address space. In this method, you would also have to check for wraparound, but this is just looking at the carry bit after subtraction. No guard area needed.
Another approach is, at the start of the Fortran runtime, to determine the limits of committed stack by using _chkstk (slightly modified). Then you know that this limit is legal, and the stack pointer can be moved to that value without checking. If some dynamic stack allocation were done, you might have to redo the limit check.