While this forum has numerous entries on the topic of stack overflow, none seem to match the situation I am encountering. I am developing a multi-language Windows application (32-bit) consisting of a C# GUI and several DLLs of Fortran procedures that provide a variety of features under the umbrella of the GUI. The development environment consists of Visual Studio 2015 and the IVF 18.0 compiler. The operating system is Windows 10 on a 64-bit computer.
One of the features initiates an iterative process that takes an unknown amount of time to complete. The feature is initiated with a call from the GUI to a subroutine in one of the DLLs. Because of the time involved, it is run in its own thread using System.Threading. Once started, the subroutine runs to completion, periodically writing files of information that are used to update the display of the GUI. The called subroutine was originally the main program of a standalone application with 22 additional modules. The main program was converted to a subroutine contained within its own module for purposes of creating the DLL. Upon execution, the thread terminates with a Stack Overflow exception at the call to a subroutine that is contained in one of the other modules. In the debugger, the exception is raised immediately upon stepping into the subroutine, suggesting that the problem is with the subroutine arguments. The call to the subroutine appears as follows:
call IntervalHams(OptHam, P, .FALSE.)
where P is a double precision scalar and OptHam is an object of a derived type that consists of several double precision scalar variables, followed by several KIND=4 integer and logical scalar variables, and ending with an explicit array of dimension 1860 of another derived type record of scalars totaling 64 bytes. OptHam is of fixed length of 119,264 bytes.
The numerous entries on this forum regarding stack overflow invariably point to instances of automatic or allocated arrays and suggest increasing the stack reserve size or specifying a heap size to force automatic and allocated arrays to be allocated on the heap rather than the stack. I have increased the stack size to ten times the size of the OptHam record, which had no effect. Also, explicitly specifying the heap size has no effect, which was expected since there are no automatic or allocated arrays. Does anyone know of any other possible reason for the exception in this case?
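As a cross-check on the sizes quoted in this post, here is a minimal sketch in Python (used purely for the arithmetic). The variable names are illustrative and do not come from the Fortran source; the figures are the ones stated above: 1860 array elements of 64 bytes each, a record of 119,264 bytes, and a stack reserve of ten times the record.

```python
# Sizes taken from the post; the names are illustrative only.
ELEMENT_BYTES = 64        # one element of the inner derived type
ELEMENT_COUNT = 1860      # dimension of the explicit array
RECORD_BYTES = 119_264    # stated size of OptHam

array_bytes = ELEMENT_BYTES * ELEMENT_COUNT    # bytes taken by the array
scalar_bytes = RECORD_BYTES - array_bytes      # scalars (may include padding)
stack_reserve = 10 * RECORD_BYTES              # "ten times the size of the record"

print(array_bytes, scalar_bytes, stack_reserve)
```

The array accounts for 119,040 bytes, leaving 224 bytes for the scalar components, and ten times the record is 1,192,640 bytes (roughly 1.14 MiB).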
In which build are you increasing the stack size? It has to be in the executable build, not the DLL.
If setting Optimization > Heap Arrays to 0 doesn't help, it could be that you have infinite recursion, or the thread stack is just too small.
Thanks for the comments, Steve. Of course, the executable is the C# program. According to the VS documentation, you can change the stack size in the declaration of the thread, but they recommend not doing so, saying that the size has been optimized and the problem is likely caused by a programming error, such as infinite recursion. However, I do not use recursion and the error occurs on the very first call to the subroutine. I think the problem is something else.
"optimized" - heh.
I would analyze this by stepping into the routine by instruction, though doing this from managed code might be tricky. Can you set a breakpoint on the routine and reach it without the overflow?
I switched the Fortran application from a standalone executable to a dll so that I could transition in the VS debug mode from the C# code to the Fortran code. By stepping instruction by instruction in the Fortran code to the call to the routine, I can normally step directly into that subroutine. When I try that in this case, the exception is raised immediately so that I am unable to get to the first executable line in the subroutine.
This suggests to me an error in how you have the Fortran procedure declared in C#. I am not familiar with C# so I can't advise you on that. You might start with calling a dummy routine with no arguments, and if that works, gradually add arguments from your real routine to see which one might be the problem.
The problem with this suggestion is that I am not calling the offending routine from C#. C# calls a main subroutine and, from that, other subroutines are called in a chain of events. The offending call is far down on this list. I have been checking further and find the offending call is actually the second call to the subroutine. The first is successfully done in an initializing routine that is located in the same module as the offending routine. The problem occurs next at a call from a routine in a different module. Does that suggest a possible solution?
When you have a symptom like that, I worry about STDCALL vs. C calling convention mismatches (you said it was 32-bit). It may not be this routine that is the issue, that's just where it shows up. Stack corruption is also a distinct possibility - again, something that happened earlier in the program.
What I tended to do when presented with something like this is selectively disable parts of the code leading up to the problem call, and see if the behavior changes. This can be a tedious process but I usually had success with it.
If this is largely Fortran code, try turning on the /check:stack option.
As suggested, Steve, I have continued the search for the cause of the Stack Overflow exception. The only indicator that a problem exists is the call to subroutine IntervalHams that results in the exception occurring upon entrance. Because a first call to the subroutine does not raise the event, I started adding calls to the subroutine at various points in the code subsequent to the first call to find the explicit point where the exception is first raised. I found that point to be the entrance to another subroutine named Step. Calling IntervalHams immediately before the call to Step produces no error; however, if IntervalHams is called as the first executable statement inside Step, the exception is raised at the call. The argument list of Step consists of a double precision scalar variable and a logical scalar variable. Subroutine Step is located in a module Propagate that is different from that of the routine making the call to Step. This appears to be the first time any entity in Propagate is accessed. The declarative portion of Propagate consists only of a fixed length double precision array and two double precision scalar variables. Nothing in all this seems particularly remarkable and I am out of ideas for how to proceed further. The check:stack run-time attribute has been set since the project was created. Any more ideas?
Have you verified that you USE the module where Step is contained in the procedures that call Step?
IOW, if you are .NOT. using IMPLICIT NONE, then Step may appear as an array.
Jim, I include an IMPLICIT NONE statement immediately following the use statements in all modules. Also, the procedure that calls Step includes in its declarative part a USE statement for Propagate.
On the call to IntervalHams, can you, in the debugger, inspect the entire OptHam object?
In particular, I am asking if the OptHam object was constructed by C# and passed in by reference.
The C# program acts to collect input data for the Fortran code. When the input data collection is complete, the data are written in a text file of fixed name and path in Namelist format. The Fortran code is then launched with a call to the appropriate routine in the DLL. The Fortran program gets all its information from the Namelist input file. The only information passed as an argument is the pathname of the Fortran working directory. Keep in mind, this code has worked for years as a standalone executable program, first compiled with the LF95 compiler, which doesn't work on Windows 10, and then on the PGI compiler, which generates only 64-bit code. I just moved to the Intel compiler because I need to provide a 32-bit application to customers. I found the transition to IVF to be somewhat onerous in that numerous errors were flagged by IVF while compiling successfully with the other compilers. It's entirely possible that a change made to the code to resolve the compiler errors created the situation that now exists.
To prove that the exception event has nothing to do with the interface between C# and the Fortran code, I re-cast the DLL application as a standalone executable program and launched it outside the C# environment. The Stack Overflow exception again occurred at the same location as with the DLL formulation.
Some things to look at:
Your main program is not a Fortran program. Therefore, you may have overlooked calling for_rtl_init at start (stick this in the DLL Load event handler), and for_rtl_finish (stick this in the DLL Unload event handler). This assumes all the Fortran code is in the DLL. You can alternatively make the calls from the C# program. These subroutines are documented in the IVF user manual.
Do not USE DFPORT from an older runtime library, USE the Intel supplied portability modules (IFPORT, KERNEL32, IFWINTY, IFWIN, IFCORE, etc...)
And use INTEGER(HANDLE) for Windows handle variables (defined in IFWINTY, and USE'd by IFWIN).
With this post, I attempt to summarize what has been learned about the Stack Overflow exception described in previous posts above:
1. The error is triggered in an internal (in the compiler) assembly language routine named chkstk.asm, which seems to be called as part of the initialization process when the subroutine IntervalHams is called.
2. The exception event has nothing to do with the C#-Fortran interface. The error occurs whether or not one launches the Fortran application from the C# user interface.
3. The event occurs whether the Fortran application is created as a DLL or a standalone application using the IVF compiler.
4. Using the same source code that compiled without error with the IVF compiler, the run-time error is not raised in a standalone application compiled with the PGI Fortran compiler.
5. Collectively, these facts lead me to conclude that there is a serious problem with the IVF compiler. At a minimum, improved diagnostics are needed to identify more specifically what it thinks is wrong and where it initially manifests itself.
>>The error is triggered in an internal (in the compiler) assembly language routine named chkstk.asm, which seems to be called as part of the initialization process when the subroutine IntervalHams is called.
In this case, place a breakpoint on the call to the subroutine. When execution stops there, open a Disassembly window and step into the subroutine at the instruction level. At the point of the call to chkstk, note the size of the requested stack allocation (the value in eax), then step into chkstk and determine the stack limit.
Once you have these two numbers, either:
a) The stack allocation request is absurdly large (not what you wanted)
b) The available stack space is much less than what you expected.
I suspect b), the C# thread that calls the Fortran DLL has too small of a stack.
Jim, as I said, the exception is raised when the Fortran is run as a standalone application. That is, it is not run as a thread in the C# program. It is a Fortran executable program entirely separate from a C# environment. I do not know assembly language programming, but I include below the disassembled code of the chkstk procedure. I suspect you will understand what the code is doing.
Upon entry to chkstk, instructions are executed in sequence up to the label cs10. At the second statement after this label, execution jumps to label cs20 and runs the three statements that follow, the last of which jumps back to cs10. This loop executes successfully 40 times. On the 41st pass, the StackOverflow exception is raised on the second statement after the label cs20 (i.e., the test statement). I think this means that the register eax is reduced by the page size (4096 bytes) forty times (a total of 163,840 bytes) before the exception occurs. The stack reserve and commit sizes in the linker property pages were left with the default values of zero. Does this provide you with useful information?
        page    ,132
        title   chkstk - C stack checking routine
;***
;chkstk.asm - C stack checking routine
;
;       Copyright (c) Microsoft Corporation. All rights reserved.
;
;Purpose:
;       Provides support for automatic stack checking in C procedures
;       when stack checking is enabled.
;
;*******************************************************************************

.xlist
        include vcruntime.inc
.list

; size of a page of memory

_PAGESIZE_      equ     1000h

        CODESEG

page
;***
;_chkstk - check stack upon procedure entry
;
;Purpose:
;       Provide stack checking on procedure entry. Method is to simply probe
;       each page of memory required for the stack in descending order. This
;       causes the necessary pages of memory to be allocated via the guard
;       page scheme, if possible. In the event of failure, the OS raises the
;       _XCPT_UNABLE_TO_GROW_STACK exception.
;
;       NOTE: Currently, the (EAX < _PAGESIZE_) code path falls through
;       to the "lastpage" label of the (EAX >= _PAGESIZE_) code path. This
;       is small; a minor speed optimization would be to special case
;       this up top. This would avoid the painful save/restore of
;       ecx and would shorten the code path by 4-6 instructions.
;
;Entry:
;       EAX = size of local frame
;
;Exit:
;       ESP = new stackframe, if successful
;
;Uses:
;       EAX
;
;Exceptions:
;       _XCPT_GUARD_PAGE_VIOLATION - May be raised on a page probe. NEVER TRAP
;                                    THIS!!!! It is used by the OS to grow the
;                                    stack on demand.
;       _XCPT_UNABLE_TO_GROW_STACK - The stack cannot be grown. More precisely,
;                                    the attempt by the OS memory manager to
;                                    allocate another guard page in response
;                                    to a _XCPT_GUARD_PAGE_VIOLATION has
;                                    failed.
;
;*******************************************************************************

public  _alloca_probe

_chkstk proc

_alloca_probe = _chkstk

        push    ecx

; Calculate new TOS.

        lea     ecx, [esp] + 8 - 4      ; TOS before entering function + size for ret value
        sub     ecx, eax                ; new TOS

; Handle allocation size that results in wraparound.
; Wraparound will result in StackOverflow exception.

        sbb     eax, eax                ; 0 if CF==0, ~0 if CF==1
        not     eax                     ; ~0 if TOS did not wrapped around, 0 otherwise
        and     ecx, eax                ; set to 0 if wraparound

        mov     eax, esp                ; current TOS
        and     eax, not ( _PAGESIZE_ - 1) ; Round down to current page boundary

cs10:
        cmp     ecx, eax                ; Is new TOS
        bnd jb  short cs20              ; in probed page?
        mov     eax, ecx                ; yes.
        pop     ecx
        xchg    esp, eax                ; update esp
        mov     eax, dword ptr [eax]    ; get return address
        mov     dword ptr [esp], eax    ; and put it at new TOS
        bnd ret

; Find next lower page and probe
cs20:
        sub     eax, _PAGESIZE_         ; decrease by PAGESIZE
        test    dword ptr [eax],eax     ; probe page.
        jmp     short cs10

_chkstk endp

        end
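For readers who do not read assembly, the cs10/cs20 loop in the listing can be modeled roughly as follows. This is a simplified sketch in Python, not the actual runtime: the function name probe_stack, the lowest_page parameter and all addresses are hypothetical, the small offset applied by the lea instruction is ignored, and the guard-page mechanism is reduced to a single "lowest page that may be touched" limit.

```python
PAGE_SIZE = 0x1000  # _PAGESIZE_ in chkstk.asm


def probe_stack(esp, frame_size, lowest_page):
    """Simplified model of the cs10/cs20 loop.

    Returns (ok, probes): ok is False when a probe would land below
    lowest_page, which stands in for the OS raising a stack overflow.
    """
    new_tos = max(esp - frame_size, 0)   # wraparound is clamped to 0, as in the listing
    page = esp & ~(PAGE_SIZE - 1)        # round down to the current page boundary
    probes = 0
    while new_tos < page:                # cs10: is the new TOS in an already probed page?
        page -= PAGE_SIZE                # cs20: step down one page
        if page < lowest_page:           # the probe (test [eax],eax) would fault here
            return False, probes
        probes += 1
    return True, probes


# Hypothetical addresses: 40 whole pages are available below the current page.
LOWEST_PAGE = 0x0010_0000
ESP = LOWEST_PAGE + 40 * PAGE_SIZE + 0x800

print(probe_stack(ESP, 119_264, LOWEST_PAGE))  # a frame the size of OptHam fits
print(probe_stack(ESP, 200_000, LOWEST_PAGE))  # a larger frame runs out of pages
```

With these made-up numbers the second call completes 40 probes and fails on the 41st, which is the pattern described in the post. The model says nothing about why the frame is that large or why so little stack remains; it only shows that the loop's behavior depends on both the requested size (eax on entry) and the space left below esp.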
The chkstk routine is actually part of the MSVC library and is called by the (Fortran or C) compiler-generated code when it wants to allocate an object on the stack. So it is getting into the routine, but some aspect of your code (that you have not shown) is causing a stack temp to be created on entry. An array with the Fortran VALUE attribute (not ATTRIBUTES VALUE), when the routine does not have BIND(C), could do this, as could an "automatic array" (local array whose bounds are based in part on a dummy argument, COMMON or module variable).
Nearly all the code of this application was developed on a Fortran compiler that pre-dated the Fortran standards subsequent to 1995. Any changes made with the modern compilers were necessitated because of compiler specific requirements. No 2003 or later features have been used in developing the code, so there are no VALUE attributes used. Again, the code is 100% Fortran - no mixed language programming is used in the source code. So, of the reasons you cite as a possible cause for calling chkstk, we are left only with automatic arrays. There are no automatic arrays declared or used in IntervalHams, in the subroutine that calls IntervalHams, or even in the module containing the calling procedure. Is it reasonable to think there is a need to call the chkstk routine because of a declaration more remote than this? Is it possible the call is being made unnecessarily or in error?
At this point I am unwilling to speculate further without seeing the code. At a minimum I need the source to the routine where the error occurs and any INCLUDE or module files it needs to compile. If you'd like, I can send you a Private Message and you can make the source available to me that way. Alternatively you can open a ticket with Intel and give them the sources necessary to reproduce the issue.
>>The stack reserve and commit sizes in the linker property pages were left with the default values of zero. Does this provide you with useful information?
The default stack reserve is 1MB when the executable is linked with the Microsoft linker and no reserve size is specified. At the point of the call you had:
(currently allocated stack ??)
(failure at the 41st page probe: 41 * 4,096 = 167,936 additional bytes)
IOW if up to the call statement you had already consumed roughly 860KB of a 1MB stack, you will StackOverflow.
Note that because you did not record the stack size requested at entry (eax), we do not know the size requested, i.e. how many more 4,096-byte pages it would require.
Link the main portion of your program with sufficient stack space (Stack Reserve size in the Linker System property page).
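The stack budget can be worked through with a short Python sketch. It assumes only the observation reported earlier in the thread (40 successful page probes, failure on the 41st); the reserve sizes tried and the helper name used_before_call are hypothetical, and the result is a rough estimate because the partial page at the top is ignored.

```python
PAGE = 4096
successful_probes = 40                        # observed in the debugger

probed_ok = successful_probes * PAGE          # bytes probed without error
failed_at = (successful_probes + 1) * PAGE    # offset of the failing probe


def used_before_call(reserve_bytes):
    """Rough estimate of the stack already in use when the call was made."""
    return reserve_bytes - failed_at


for reserve_mib in (1, 2):                    # hypothetical reserve sizes
    reserve = reserve_mib * 1024 * 1024
    print(reserve_mib, "MiB reserve:", used_before_call(reserve) // 1024, "KiB already in use")
```

Either way, the probe count alone implies that most of the reserve was already consumed before IntervalHams was entered, so comparing the value of esp at program start with its value at the call would show where the stack went.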