Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
29253 Discussions

What does "Floating Point Stack Check" mean

ferrad
New User
3,479 Views
My code was merrily running along for a few minutes, then suddenly stopped in the debugger with a "Floating Point Stack Check", and took me to the offending line, which is a calculation:
Code:
          new_flux(isol) =
     .        d1_curr*h*(d0*(temp_res-temperature(istage,ifluid))-sume)/
     .           (d0*d1_curr + h*(d1_curr*sqrt(delt_loc) + d0*delt_loc))


Is the line too complex? Why only now after happily executing it hundreds of times before?
Adrian
0 Kudos
12 Replies
ferrad
New User
I tried breaking this calc up onto two lines, then it just stopped again later on another line with the same error.
Adrian
TimP
Honored Contributor III
If you are using a compilation for x87 code (e.g. ifort 32-bit with no SSE options, such as -QxW, or maybe when using -Op or -fltconsistency), it looks like the compiler has exceeded the stack register size (8 items on stack). This would be likely to be a compiler bug, as the compiler should be tracking stack size. If so, reducing optimization level might well get past it. It used to be a common bug in gcc, when using mathinline functions. If you are using a current compiler, you might file a bug report.
Better alternatives to -Op and -fltconsistency are coming.
If my guesses above are off the mark, please give some of the missing information.
Steven_L_Intel1
The usual cause of this problem in Fortran code is calling a function which returns a real but the caller thinks it returns an integer, or vice-versa. The effect of this error may not be apparent at the point of the call; it may occur later.

Check all your function declarations carefully. Try building with /gen_interfaces /warn:interfaces and see if the compiler alerts you to a mismatch.
ferrad
New User

Steve,

I added /gen_interfaces /warn:interfaces to the compilation flags. It compiles a few files, then gets stuck on one file. On checking Task Manager, I see fortcom.exe is taking up 99% of the CPU time.

Adrian

Steven_L_Intel1
Yeah, I was afraid of that. Though we've fixed some similar problems with that feature, I still see that happen in some cases. Please submit a test case of that to Intel Premier Support.

So you're back to checking the declarations manually. You could always compile with /warn:declarations, which forces you to explicitly declare everything.
ferrad
New User

The Fortran DLL it is stopping in is being called from a new C DLL. Is it possible the cause could be in the C DLL, or do I need to look in the Fortran DLL? The reason I ask is that the C DLL is new, and the Fortran code has been running fine without it.

Adrian

Steven_L_Intel1
Unlikely - it's harder to make that mistake in C.

Can you rebuild the C DLL and link against the C debug libraries? They have a feature for detecting this as it happens.
ferrad
New User
Yes, I have built the C DLL with full debug (/Z7). Can I increase the size of this "Floating-Point Stack", if that's what the problem is?
Adrian
Steven_L_Intel1
No, the problem is not a stack overflow in the traditional sense.

The x87 floating-point unit has an internal stack of eight registers where values get pushed on and then popped off as operations are done. As long as the pushes and pops match, you're fine. If you try to pop something when the stack is empty, or push something when it is full, you get an FP stack check.

When you call a function that returns a float, the return value is left on the stack when the function returns. The caller is supposed to pop it off. If you call a function where the caller thinks the function is REAL but the function itself is INTEGER (hence nothing on the FP stack), error. You can also get an error later if a value is left hanging on the FP stack.

Resolving the type inconsistencies is the only fix.
ferrad
New User

For the record, I fixed this eventually. As I thought, it was in the C code which calls the Fortran DLL. I found that if I called a certain function in C, it failed in the Fortran code after the 11th time calling the C function. And if I didn't call the function, it never failed in the Fortran code.

On diagnosing that C function, I tracked it down to the use of the sqrt() function. It should be innocuous, but math.h was not included. Now why it didn't fail with an unresolved external on the link of the C DLL I've no idea, but I guess it thought it had a sqrt function somewhere it could use. Well, the one it used (who knows where it found one) I guess pushed and popped the floating point stack just badly enough to fail after the 11th occasion.

I included math.h, and all works just fine.

Adrian

Steven_L_Intel1
You must have disabled the usual warnings about undeclared functions in C. I don't know what C assumes for such things, but it certainly wasn't that it returns a floating-point value. You would have found the normal libm version of sqrt, which returned a floating-point value that never got popped off (nor would the return value be correct).
Glad to hear that you found it.
Dishaw__Jim
Beginner
By default, C (under the C89 rules) assumes that an undeclared function returns an int, which would match your analysis of the problem. If a function has no return statement, the value the caller sees is undefined; it is typically whatever happens to be left in the return register. This can lead to all sorts of nasty bugs.



Not including math.h would not cause the linker to fail. #include is a preprocessor directive, and the math.h header declares the data structures and function prototypes for the math routines (sqrt, sin, etc.). Without that header, the compiler would have assumed that sqrt returns an int and takes an unspecified number of arguments.

Message Edited by jamesd42 on 02-15-2006 06:05 PM
