Re: stack overflow

jd_weeks · 05-03-2006

I have successfully compiled a static library from some Fortran source and linked it with my C/C++ application, all in Visual Studio .NET. The code even works!

But I tried a somewhat larger problem on the code and it failed with a stack overflow upon calling into the Fortran code. I presume this is because one of the first things it does is to copy some arrays into local arrays (allocated on the stack?). The code stops in an assembly-language routine chkstk.asm.

I tried to find some information about this in the Intel Fortran documentation. I found something that said I should increase the stack space and refer to the compiler release notes to find out how. The Intel Fortran 9 release notes have nothing on the topic that I can find.

Is that, in fact, what I need to do?

How do I do it?

I am using an evaluation copy of Intel Visual Fortran, 9.0.2945.2003.

In Visual Studio .NET, under the Fortran properties, External Procedures section, Calling Convention is set to C, Reference; Name Case Interpretation is Lower Case; Append Underscore is set to Yes; Generate Interface Blocks is set to No.

Thanks!!!

Steven_L_Intel1 · 05-03-2006

In Visual Studio, set the project property Linker..System..Stack Reserve.

See also this article.

jd_weeks · 05-04-2006

Thanks, Steve. I set the stack space to 4 MB and now my test case works. This is a very large and complex application (my company's main product, Igor Pro; see www.wavemetrics.com if you're interested). Having to raise the stack allocation for a test case that is really rather modest brings up other questions for me:

Is there some documentation somewhere of what gets allocated on the stack, and how much space this takes?

The failure in my case occurred as I was calling a subroutine that I didn't write from a subroutine that I did write (the purpose of which is to provide a nice C-compatible interface for a subroutine that uses assumed-size arrays).

The arrays passed through the interface subroutine may be large, but they are passed by reference. I would have thought that would involve allocating just a pointer-sized piece of memory on the stack for each array.

Oh, and about your link: I'm not sure what I'm supposed to get from an article about ShellExecute.

Again, thank you for your help!!!

Steven_L_Intel1 · 05-04-2006

The link is to a specific article further down the page. I suspect you did not let the page finish loading.

What goes on the stack? By default: scalar local variables; any variable declared AUTOMATIC; and, in RECURSIVE routines, all local variables that are not initialized, not ALLOCATABLE or POINTER, and not SAVEd. Also, whenever the compiler decides it needs a temporary copy of an array, it puts that copy on the stack; the latter is what usually causes problems.
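As an illustration of the "temporary copy" case, here is a minimal sketch (module and routine names are hypothetical). Fortran semantics require the whole right-hand side of an array assignment to be evaluated before anything is stored, so for an overlapping assignment like `x(2:n) = x(1:n-1)` a compiler may build a hidden temporary; whether a particular compiler actually does so here is an assumption. The backward DO loop gives the same result without needing one.

```fortran
! Hypothetical module: two ways to shift x right by one element.
module shift_demo
  implicit none
  integer, parameter :: r8 = kind(1.0d0)
contains
  ! Array syntax: the whole right-hand side is evaluated before any
  ! element is stored, so a compiler may copy x(1:n-1) into a temporary.
  subroutine shift_with_array_syntax(x)
    real(r8), intent(inout) :: x(:)
    integer :: n
    n = size(x)
    x(2:n) = x(1:n-1)
  end subroutine shift_with_array_syntax

  ! Same result with an explicit loop: running backwards reads each
  ! element before it is overwritten, so no temporary is needed.
  subroutine shift_with_loop(x)
    real(r8), intent(inout) :: x(:)
    integer :: i
    do i = size(x), 2, -1
      x(i) = x(i-1)
    end do
  end subroutine shift_with_loop
end module shift_demo
```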

We have work underway to allow the compiler to allocate temporary arrays on the heap.

jd_weeks · 05-04-2006

Ah, you're right; the article about the stack is further down. I think we were having some problems with our internet connection yesterday.

The article says, "Replace automatic arrays with allocatable arrays and ALLOCATE them to the desired size at the start of the routine (they will be automatically deallocated on routine exit unless marked SAVE.)"

That means that a statement like this:

REAL (KIND=R8) LDELTA(:,:),LLOWER(NP),LWE(N,NQ,NQ),LWD(N,M,M)

will allocate an array LWE of N*NQ*NQ*8 bytes and an array LWD of N*M*M*8 bytes, both on the stack?

And does the compiler generate the allocations so that chkstk raises the stack overflow exception on entry to the routine where this declaration occurs? N, M, NP, and NQ are dummy arguments passed into the subroutine; the declaration shown here declares local variables.

Steven_L_Intel1 · 05-04-2006

Yes, LWE and LWD would be allocated on the stack as they are "automatic arrays". You could write instead:

REAL(KIND=R8), ALLOCATABLE :: LWD(:,:,:), LWE(:,:,:)

...

ALLOCATE (LWE(N,NQ,NQ),LWD(N,M,M))
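A compilable sketch of the ALLOCATABLE approach, assuming `R8` is `KIND(1.0D0)`; the routine name `FILL_WORK` and the `STAT=` error handling are illustrative additions, not part of the code under discussion. The commented-out declaration shows the automatic-array form it replaces.

```fortran
module odr_work_demo
  implicit none
  integer, parameter :: r8 = kind(1.0d0)
contains
  ! Hypothetical routine; n, m, nq play the role of the dummy
  ! arguments in the thread.
  subroutine fill_work(n, m, nq, total, ierr)
    integer, intent(in) :: n, m, nq
    real(r8), intent(out) :: total
    integer, intent(out) :: ierr
    ! Automatic-array form (stack):  real(r8) :: lwe(n,nq,nq), lwd(n,m,m)
    real(r8), allocatable :: lwe(:,:,:), lwd(:,:,:)

    ! Heap allocation; stat= reports failure instead of aborting.
    allocate (lwe(n,nq,nq), lwd(n,m,m), stat=ierr)
    if (ierr /= 0) then
      total = 0.0_r8
      return
    end if
    lwe = 1.0_r8
    lwd = 2.0_r8
    total = sum(lwe) + sum(lwd)
    ! No DEALLOCATE needed: local allocatables without SAVE are
    ! deallocated automatically when the routine returns.
  end subroutine fill_work
end module odr_work_demo
```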

I think the compiler does try to check the stack on entry.

Message Edited by Steve_Lionel on 05-04-2006 02:14 PM

jd_weeks · 05-04-2006

OK, I'm making some real progress. Thanks, Steve.

I apologize; my questions are getting longer as I learn more :)

I replaced the automatic local arrays with ALLOCATABLE arrays.

I read somewhere in your writings that matrix assignments, especially ones involving RESHAPE, may cause the compiler to generate temporary arrays, and that the temporaries may be allocated on the stack. So I have replaced a number of matrix assignments with explicit DO loops. In particular, I replaced a couple of instances of lines like

LWORK(1:N*M) = RESHAPE(DELTA(1:N,1:M),(/N*M/))

with nested DO-loops in which the 1-dimensional index is calculated within the loop.
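One way the DO-loop replacement might look, as a sketch (the routine name `FLATTEN` is hypothetical). RESHAPE takes its source elements in array element order, first subscript varying fastest, so the loop computes the 1-D index as `i + (j-1)*n` with `i` in the inner loop.

```fortran
module flatten_demo
  implicit none
  integer, parameter :: r8 = kind(1.0d0)
contains
  ! Copies delta(1:n,1:m) into lwork(1:n*m) in column-major order,
  ! the same order RESHAPE uses, so this replaces
  !   lwork(1:n*m) = reshape(delta(1:n,1:m), (/n*m/))
  ! without an intermediate array. delta may have a leading
  ! dimension larger than n.
  subroutine flatten(n, m, delta, lwork)
    integer, intent(in) :: n, m
    real(r8), intent(in) :: delta(:,:)
    real(r8), intent(inout) :: lwork(:)
    integer :: i, j
    do j = 1, m
      do i = 1, n
        lwork(i + (j-1)*n) = delta(i, j)
      end do
    end do
  end subroutine flatten
end module flatten_demo
```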

I can now run problems that are about 10 times larger. I still have some things that I'm not sure about. There are calls to subroutines that look like

CALL DODCNT
&(FCN,
&N,M,NP,NQ,
&BETA(1:NP),
&Y(1:N,1:NQ),N,X(1:N,1:M),N,
&LWE(1:LDWE,1:LD2WE,1:NQ),LDWE,LD2WE,
&LWD(1:LDWD,1:LD2WD,1:M),LDWD,LD2WD,
&LIFIXB,LIFIXX(1:LDIFX,1:M),LDIFX,
&LJOB,LNDIGIT,LTAUFAC,
&LSSTOL,LPARTOL,LMAXIT,
&LIPRINT,LLUNERR,LLUNRPT,
&LSTPB,LSTPD(1:LDSTPD,1:M),LDSTPD,
&LSCLB,LSCLD(1:LDSCLD,1:M),LDSCLD,
&LWORK,LENWORK,LIWORK,LENIWORK,
&LINFO,
&LLOWER,LUPPER)

Do array arguments like Y(1:N,1:NQ) incur the overhead of a temporary? If so, I should be able to save some more stack space by ALLOCATEing my own temporaries and copying (which I guess is what the compiler does when it creates a temporary?).

How about this:

CALL DWGHT(N,NQ,
&RESHAPE(WORK(WE1I:WE1I+LDWE*LD2WE*NQ-1),(/LDWE,LD2WE,NQ/)),
&LDWE,LD2WE,
&RESHAPE(WORK(FI:FI+N*NQ-1),(/N,NQ/)),
&TEMPRET(1:N,1:NQ))

I guess I could replace the RESHAPE calls by ALLOCATEing arrays and copying the data into them before the call?
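A sketch of that idea, with hypothetical names: `SCALE3` stands in for a callee with an assumed-shape 3-D dummy, and `CALL_WITH_HEAP_COPY` copies the slice of `WORK` into an ALLOCATABLE array, makes the call, and copies the results back (which a RESHAPE'd actual argument cannot do).

```fortran
module copy_demo
  implicit none
  integer, parameter :: r8 = kind(1.0d0)
contains
  ! Stand-in for a callee with an assumed-shape 3-D dummy;
  ! it scales its argument in place.
  subroutine scale3(a, factor)
    real(r8), intent(inout) :: a(:,:,:)
    real(r8), intent(in) :: factor
    a = factor * a
  end subroutine scale3

  ! Copies work(first:first+d1*d2*d3-1) into a heap array, calls
  ! scale3, then copies the results back into work.
  subroutine call_with_heap_copy(work, first, d1, d2, d3, factor)
    real(r8), intent(inout) :: work(:)
    integer, intent(in) :: first, d1, d2, d3
    real(r8), intent(in) :: factor
    real(r8), allocatable :: tmp(:,:,:)
    integer :: i, j, k, p

    allocate (tmp(d1, d2, d3))
    p = first
    do k = 1, d3
      do j = 1, d2
        do i = 1, d1
          tmp(i, j, k) = work(p)
          p = p + 1
        end do
      end do
    end do

    call scale3(tmp, factor)

    ! Copy back; a RESHAPE'd actual argument would discard these updates.
    p = first
    do k = 1, d3
      do j = 1, d2
        do i = 1, d1
          work(p) = tmp(i, j, k)
          p = p + 1
        end do
      end do
    end do
    ! tmp is deallocated automatically on return.
  end subroutine call_with_heap_copy
end module copy_demo
```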

Steven_L_Intel1 · 05-05-2006

You can ask the compiler to tell you whether it creates a temporary for a passed argument by enabling /check:arg_temp_created (in Visual Studio, this is under Run-Time). The compiler's pretty good about creating the temp in such cases only when the argument is not contiguous and it is being passed to a routine that doesn't accept an assumed-shape array.
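A minimal sketch of the situation Steve describes (names are hypothetical): `ADD_ONE` has an explicit-shape dummy, so passing the strided section `a(1:n:2)` requires the compiler to copy the elements into a contiguous temporary and copy them back after the call. This is the kind of call that /check:arg_temp_created is meant to report at run time.

```fortran
module argtemp_demo
  implicit none
  integer, parameter :: r8 = kind(1.0d0)
contains
  ! Explicit-shape dummy: the callee assumes n contiguous elements.
  subroutine add_one(n, x)
    integer, intent(in) :: n
    real(r8), intent(inout) :: x(n)
    x = x + 1.0_r8
  end subroutine add_one

  ! a(1:n:2) is not contiguous, so the compiler copies it into a
  ! contiguous temporary before the call (copy-in) and back
  ! afterwards (copy-out).
  subroutine bump_odd_elements(a)
    real(r8), intent(inout) :: a(:)
    integer :: n
    n = size(a)
    call add_one((n + 1) / 2, a(1:n:2))
  end subroutine bump_odd_elements
end module argtemp_demo
```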

jd_weeks · 05-05-2006

I actually found that setting and turned it on. Where do the messages go? I'm using the Fortran compiler to create a static library that's linked into a C/C++ application. Does that cause a problem? Is there a call I can make to get that information, like one of the xxxQQ functions?

The fact that I have been able to reduce the stack overflow problem by explicitly copying arrays suggests that temporaries on the stack are the problem. I wish there were a compiler setting to make it allocate temporaries from the heap. My application is used by scientists who tend to push everything to the limit, so reducing the problem isn't enough: if I reduce it, someone will simply feed it a bigger problem.

In fact, it seems like Fortran is the language of choice for lots of scientific applications that process large arrays, so this should be a frequent problem. What do others do? The explicit copying I've been writing is a pain in the neck!

Steven_L_Intel1 · 05-05-2006

The messages are displayed at run-time the way other messages are.

I expect that there will be an option in the future to do heap allocation of temporaries. But you can usually find a way to write your code to avoid the copies. If I could see a complete example, I could give you more advice.

jd_weeks · 05-06-2006

The package I'm compiling has quite a bit of code. What sort of example would you like? And in what format? I could send you a copy of the file...

I don't want to be a pest, but this is an important issue for me. I appreciate all the help you've given me so far.

Steven_L_Intel1 · 05-06-2006

A routine that shows the call and includes all declarations of variables used in the call, as well as any explicit interface (if present) of the routine being called. Doesn't have to be a complete routine, but I do need to see the declarations of all variables used in the call and any interface.

jd_weeks · 05-08-2006

I have attached a file containing some examples. I have included some comments; let me know if you need more.

jd_weeks · 05-08-2006

I wrote a really short message there, because when I tried to submit a longer message, I got a message to the effect that something timed out!

In my problems, the variable N is large (on the order of 100,000). For our customers that's not a large problem.

LDWD is usually equal to N.

Other variables like NQ and NP are small.

Let me know if you need more information.

jd_weeks · 05-08-2006

I meant to ask: does this problem afflict the compiler for Mac OS X as well?

Steven_L_Intel1 · 05-08-2006

It's the RESHAPE that is causing your problem; it frequently creates a temporary array. Our compiler for Mac OS would have a similar issue, but the stack size can be set for the session in which you run the program. I think Mac OS behaves like UNIX/Linux, where the stack auto-expands up to the process stack size limit.

Your DWGHT routine is properly declared with an explicit interface and assumed-shape arrays, which means that temps won't be created for that, but any function call in the argument list may create a temp.
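A sketch contrasting the two cases (names hypothetical, `R8` assumed to be `KIND(1.0D0)`): a module procedure with an assumed-shape dummy can receive a non-contiguous section through its descriptor, while an actual argument that is a function result, such as RESHAPE, has to be stored somewhere first, typically in a temporary.

```fortran
module assumed_shape_demo
  implicit none
  integer, parameter :: r8 = kind(1.0d0)
contains
  ! Assumed-shape dummy in a module procedure (explicit interface):
  ! the caller passes a descriptor with bounds and strides, so a
  ! non-contiguous section can be used in place.
  subroutine add_one_as(x)
    real(r8), intent(inout) :: x(:,:)
    x = x + 1.0_r8
  end subroutine add_one_as

  pure function total(x) result(s)
    real(r8), intent(in) :: x(:,:)
    real(r8) :: s
    s = sum(x)
  end function total

  ! Hypothetical caller showing both cases.
  subroutine demo(a, v, s)
    real(r8), intent(inout) :: a(:,:)
    real(r8), intent(in) :: v(:)
    real(r8), intent(out) :: s
    ! Non-contiguous section (rows 1-2 of every column): should not
    ! need a copy.
    call add_one_as(a(1:2, :))
    ! Function result as actual argument: RESHAPE's result must be
    ! materialized before TOTAL runs, typically in a temporary.
    s = total(reshape(v, (/ 2, size(v) / 2 /)))
  end subroutine demo
end module assumed_shape_demo
```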

jd_weeks · 05-08-2006

OK, what is the best solution? Is it possible to pass a section of a 1D array without using RESHAPE? Can I use EQUIVALENCE?

It would be better if the temporaries simply were allocated on the heap, but you know that already :)

jd_weeks · 05-08-2006

Oh, and the calls that include things like WE(1:N,1:NQ) don't create temporaries?

Steven_L_Intel1 · 05-08-2006

If you can come up with an array slice (such as your second example) that passes a contiguous section of memory, no copy will be made.
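For the case where the data live in a 1-D work array, as in the DWGHT call, one alternative not discussed in the thread is Fortran 2003 pointer bounds remapping, which gives a 2-D view of a 1-D section without RESHAPE and without copying. It may not be supported by the 9.0 compiler discussed here. The sketch below assumes the work array has the TARGET attribute; `SCALE2` and `SCALE_SLICE` are hypothetical names.

```fortran
module remap_demo
  implicit none
  integer, parameter :: r8 = kind(1.0d0)
contains
  ! Stand-in for a callee with an assumed-shape 2-D dummy.
  subroutine scale2(x, factor)
    real(r8), intent(inout) :: x(:,:)
    real(r8), intent(in) :: factor
    x = factor * x
  end subroutine scale2

  ! Treats work(first:first+n*nq-1) as an n-by-nq array and passes
  ! it to scale2 without RESHAPE and without copying the data.
  subroutine scale_slice(work, first, n, nq, factor)
    real(r8), intent(inout), target :: work(:)
    integer, intent(in) :: first, n, nq
    real(r8), intent(in) :: factor
    real(r8), pointer :: f(:,:)
    ! Fortran 2003 bounds remapping: f aliases the 1-D section.
    f(1:n, 1:nq) => work(first:first + n*nq - 1)
    call scale2(f, factor)
    nullify (f)
  end subroutine scale_slice
end module remap_demo
```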

jimdempseyatthecove · 05-09-2006

Hello jd, here are some of my comments:

When you call some of your subroutines, you are including RESHAPE in the argument list. By doing so, the reshaped array is created as a temporary on the stack. This consumes not only stack space but also computation time, since the data must be block-moved into the temporary as it is passed into the subroutine. Also, if the subroutine were intending to return values in the reshaped arrays, the updated data would be lost.

I would suggest instead passing the entire array to the subroutine by reference, and additionally passing in the indexes into that array that produce the slice of interest.
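A minimal sketch of that suggestion (names hypothetical): the callee receives the whole 1-D work array plus the offset and extents of its block and computes the 2-D index itself, so nothing is copied and updates land directly in `WORK`.

```fortran
module offset_demo
  implicit none
  integer, parameter :: r8 = kind(1.0d0)
contains
  ! Hypothetical callee: treats work(first:first+n*nq-1) as an
  ! n-by-nq block stored column by column and scales it in place.
  subroutine scale_block(work, first, n, nq, factor)
    real(r8), intent(inout) :: work(:)
    integer, intent(in) :: first, n, nq
    real(r8), intent(in) :: factor
    integer :: i, j, p
    do j = 1, nq
      do i = 1, n
        ! Position of element (i, j) of the block, column-major order.
        p = first + (i - 1) + (j - 1) * n
        work(p) = factor * work(p)
      end do
    end do
  end subroutine scale_block
end module offset_demo
```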

Perhaps, though, you are calling a C++ routine from the F90 code and think that you need a RESHAPE. If your data is laid out properly in the F90 code, then the slices of interest will be represented by contiguous memory. If you can construct the F90 code in this manner, then you simply need to pass a pointer to the first cell of the slice of interest.

An additional coding style, one which I prefer, is to create a user-defined type that contains the argument list for the subroutine. Fill in the members with inline code prior to the call, and then simply pass a reference to the user-defined type. This way, if it is necessary to use a RESHAPE, you have more control, because you can place the data in allocatable memory rather than on the stack.
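A sketch of the argument-bundle style, with hypothetical names. It uses ALLOCATABLE components of a derived type, a Fortran 2003 feature (also available as an extension in many Fortran 95 compilers), so any reshaped copy lives on the heap.

```fortran
module argpack_demo
  implicit none
  integer, parameter :: r8 = kind(1.0d0)

  ! Hypothetical argument bundle; array members are heap-allocated.
  type :: dwght_args
    integer :: n = 0, nq = 0
    real(r8), allocatable :: we(:,:)
    real(r8), allocatable :: f(:,:)
  end type dwght_args
contains
  ! Fills args%f from work(first:first+n*nq-1), column by column.
  subroutine load_f(args, work, first, n, nq)
    type(dwght_args), intent(inout) :: args
    real(r8), intent(in) :: work(:)
    integer, intent(in) :: first, n, nq
    integer :: i, j, p
    args%n = n
    args%nq = nq
    if (allocated(args%f)) deallocate (args%f)
    allocate (args%f(n, nq))
    p = first
    do j = 1, nq
      do i = 1, n
        args%f(i, j) = work(p)
        p = p + 1
      end do
    end do
  end subroutine load_f

  ! Hypothetical callee: takes the whole bundle by reference.
  subroutine weighted_sum(args, total)
    type(dwght_args), intent(in) :: args
    real(r8), intent(out) :: total
    integer :: i, j
    total = 0.0_r8
    do j = 1, args%nq
      do i = 1, args%n
        total = total + args%we(i, j) * args%f(i, j)
      end do
    end do
  end subroutine weighted_sum
end module argpack_demo
```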

An additional advantage is that it is easier to use OpenMP on multi-core systems when you code with what amounts to a C++ this pointer.

Jim Dempsey

(another jd)

jd_weeks · 05-09-2006

Thanks, Jim.

If the temporaries weren't created on the stack, I wouldn't have this problem.

All that you say is true, and I would take your advice if it were my code. The code in question is the ODRPACK95 package to do orthogonal distance regression. It is well-designed code for the most part and I don't really want to mess with code that works.

The code in question was written for Fortran 77 and "updated" to use some modern Fortran features. Now I need to back out some of the updates without breaking code that I don't know well.

Again, thanks for the observations. Shortly I will start working on a seek-and-destroy mission against the RESHAPE calls.