Intel® Fortran Compiler

What does "Floating Point Stack Check" mean

ferrad
New User
1,611 Views
My code was merrily running along for a few minutes, then suddenly stopped in the debugger with a "Floating Point Stack Check", and took me to the offending line, which is a calculation:
Code:
          new_flux(isol) =
     .        d1_curr*h*(d0*(temp_res-temperature(istage,ifluid))-sume)/
     .           (d0*d1_curr + h*(d1_curr*sqrt(delt_loc) + d0*delt_loc))


Is the line too complex? Why only now after happily executing it hundreds of times before?
Adrian
12 Replies
ferrad
New User
I tried breaking this calculation up into two lines, but then it just stopped again later, on another line, with the same error.
Adrian
TimP
Honored Contributor III
If you are compiling to x87 code (e.g. 32-bit ifort with no SSE option such as -QxW, or perhaps with -Op or -fltconsistency), it looks like the compiler has overflowed the x87 register stack, which holds only 8 entries. That would most likely be a compiler bug, since the compiler should be tracking the stack depth. If so, reducing the optimization level might well get past it. This used to be a common bug in gcc when using mathinline functions. If you are using a current compiler, you might file a bug report.
Better alternatives to -Op and -fltconsistency are coming.
If my guesses above are off the mark, please give some of the missing information.
Steven_L_Intel1
Employee
The usual cause of this problem in Fortran code is calling a function that returns a real when the caller thinks it returns an integer, or vice versa. The effect of this error may not be apparent at the point of the call; it may show up later.

Check all your function declarations carefully. Try building with /gen_interfaces /warn:interfaces and see if the compiler alerts you to a mismatch.
ferrad
New User

Steve,

I added /gen_interfaces /warn:interfaces to the compilation flags. It compiles a few files, then gets stuck on one file. Checking Task Manager, I see fortcom.exe taking up 99% of the CPU time.

Adrian

Steven_L_Intel1
Employee
Yeah, I was afraid of that. Though we've fixed some similar problems with that feature, I still see that happen in some cases. Please submit a test case of that to Intel Premier Support.

So you're back to checking the declarations manually. You could always compile with /warn:declarations, which forces you to declare everything explicitly.
ferrad
New User

The Fortran DLL it is stopping in is being called from a new C DLL. Could the cause be in the C DLL, or do I need to look in the Fortran DLL? I ask because the C DLL is new, and the Fortran code has been running fine without it.

Adrian

Steven_L_Intel1
Employee
Unlikely - it's harder to make that mistake in C.

Can you rebuild the C DLL and link against the C debug libraries? They have a feature for detecting this as it happens.
ferrad
New User
Yes, I have built the C DLL with full debug (/Z7). Can I increase the size of this "Floating-Point Stack", if that's what the problem is?
Adrian
Steven_L_Intel1
Employee
No, the problem is not a stack overflow in the traditional sense.

The x87 floating-point unit has an internal stack of eight registers; values get pushed on and then popped off as operations are done. As long as the pushes and pops match, you're fine. If you try to pop something when the stack is empty, or push something when it is full, you get an FP stack check.

When you call a function that returns a floating-point value, the return value is left on the FP stack when the function returns. The caller is supposed to pop it off. If the caller thinks the function is REAL but the function itself is INTEGER (hence nothing on the FP stack), the caller's pop fails and you get the error. In the opposite case, where the caller never pops a value the function left on the FP stack, the error can show up later, once enough values are left hanging there.

Resolving the type inconsistencies is the only fix.
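
To make the push/pop bookkeeping concrete, here is a minimal sketch in C of a toy model of the FP stack. It is not real FPU state or compiler output; the names (fpu_model, fpu_push, fpu_pop and the caller/callee helpers) are hypothetical and exist only for this illustration. It assumes the simplest case, where nothing else touches the stack between calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define X87_SLOTS 8   /* the x87 register stack has eight entries */

/* Toy model of the FP stack (hypothetical, for illustration only). */
typedef struct {
    double slot[X87_SLOTS];
    int    depth;        /* number of occupied entries */
    bool   stack_fault;  /* set when a push or pop cannot be satisfied */
} fpu_model;

static void fpu_init(fpu_model *f) {
    assert(f != NULL);
    f->depth = 0;
    f->stack_fault = false;
}

/* Pushing onto a full stack is a stack fault (overflow). */
static void fpu_push(fpu_model *f, double v) {
    if (f->depth == X87_SLOTS) { f->stack_fault = true; return; }
    f->slot[f->depth++] = v;
}

/* Popping an empty stack is a stack fault (underflow). */
static double fpu_pop(fpu_model *f) {
    if (f->depth == 0) { f->stack_fault = true; return 0.0; }
    return f->slot[--f->depth];
}

/* A callee returning a real leaves its result on the FP stack. */
static void callee_returns_real(fpu_model *f) { fpu_push(f, 1.5); }

/* A callee returning an integer leaves nothing on the FP stack. */
static void callee_returns_int(fpu_model *f) { (void)f; }

/* A caller expecting a real pops the result; one expecting an integer does not. */
static void caller_expects_real(fpu_model *f, void (*callee)(fpu_model *)) {
    callee(f);
    (void)fpu_pop(f);
}
static void caller_expects_int(fpu_model *f, void (*callee)(fpu_model *)) {
    callee(f);
}

/* Caller thinks INTEGER, callee is REAL: every call leaks one entry.
   Returns how many calls complete before a push faults. */
static int calls_until_overflow(void) {
    fpu_model f;
    int completed = 0;
    fpu_init(&f);
    for (;;) {
        caller_expects_int(&f, callee_returns_real);
        if (f.stack_fault) break;
        completed++;
    }
    return completed;
}

/* Caller thinks REAL, callee is INTEGER: the very first pop faults. */
static bool mismatch_underflows(void) {
    fpu_model f;
    fpu_init(&f);
    caller_expects_real(&f, callee_returns_int);
    return f.stack_fault;
}
```

In this model the leaking mismatch survives eight calls and faults on the ninth, while the opposite mismatch faults at once. A real program pushes and pops other values in between, so the actual call count at which the check fires will differ, and the fault can surface on an unrelated line.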
ferrad
New User

For the record, I fixed this eventually. As I thought, it was in the C code which calls the Fortran DLL. I found that if I called a certain function in C, the Fortran code failed after the 11th call to that C function. If I didn't call the function, the Fortran code never failed.

On diagnosing that C function, I tracked it down to its use of the sqrt() function. That should be innocuous, but math.h was not included. Why the link of the C DLL didn't fail with an unresolved external I have no idea, but I guess it thought it had a sqrt function somewhere it could use. Well, the one it used (who knows where it found one) I guess pushed and popped the floating-point stack just badly enough to fail after the 11th occasion.

I included math.h, and all works just fine.

Adrian

Steven_L_Intel1
Employee
You must have disabled the usual warnings about undeclared functions in C. I don't know what C assumes for such things, but it certainly wasn't that it returns a float. The linker would have found the normal libm version of sqrt, which returned a floating-point value that never got popped off (nor would the return value seen by the caller be correct).
Glad to hear that you found it.
Dishaw__Jim
Beginner
By default, C (before C99) assumes that an undeclared function returns an int, which matches your analysis of the problem. Likewise, if a function has no return statement, the caller still reads whatever happens to be where the return value would have been, and nothing guarantees that it is meaningful. This can lead to all sorts of nasty bugs.



Not including math.h would not cause the linker to fail. #include is a preprocessor directive, so it has no effect on linking. The math.h header declares data structures and function prototypes for the math routines (sqrt, sin, etc.). Without that header, the compiler would have assumed that sqrt returns an int and takes an unspecified number of arguments.

Message Edited by jamesd42 on 02-15-2006 06:05 PM
