Call stack on uninitialized local variable run-time crash

Nick2 · ‎06-30-2009

This was a mind-boggling thing to debug...code went something like

IF (A) THEN
  C1 = 9
ELSE
  ! C1 is never assigned on this path
ENDIF
D1 = C1*2

Compiled with COMPAQ/CONSOLE, it would run to completion. Compiled with COMPAQ/DLL, it would crash and give me a stack trace of UNKNOWN/KERNEL32.DLL.

Compiled with IVF/CONSOLE, it crashed. The console output gave me a usable stack trace, but the debugger's own stack trace was messed up. (It stopped in a subroutine completely unrelated to where it had crashed). Even trying to step through (dis)assembly code, I couldn't get an indication that the error was an uninitialized local variable. The console stack trace said "floating overflow".

Is it possible to get a better run-time crash for uninitialized local variables? If not, what are some recommendations on detecting these types of issues before shipping off the code?

Nick2 · ‎06-30-2009

I guess even more generally: how do you debug a crash whose only stack trace looks like this (CVF)?

KERNEL32! 7c812aeb()
MSVBVM60! 734f0c29()
MSVBVM60! 734ee082()

Steven_L_Intel1 · ‎06-30-2009

You won't get an "uninitialized local variable" error unless you have asked for uninitialized variable checking, and even that won't catch all cases. It's not unusual for the point of error to be several call levels below your own code, if the uninitialized variable is used in a call to the run-time library.

Can you show us a real (but short if possible) program that demonstrates the behavior you saw?
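
One manual way to make this kind of bug surface sooner is to seed the variable with NaN before the branch, so that a path which never assigns it produces NaN downstream instead of whatever happened to be in memory. The sketch below is in Python rather than Fortran purely for illustration; `compute_d1` is a hypothetical name, `a`, `c1` and `d1` mirror the snippet in the first post, and the NaN seeding is an assumed technique, not a compiler option mentioned in this thread.

```python
import math

def compute_d1(a: bool) -> float:
    # Seed c1 with NaN so that a path which never assigns it is detectable.
    c1 = math.nan
    if a:
        c1 = 9.0
    # else: c1 is deliberately left unassigned, as in the original snippet
    d1 = c1 * 2
    return d1

for flag in (True, False):
    d1 = compute_d1(flag)
    status = "c1 was never assigned" if math.isnan(d1) else "ok"
    print(flag, d1, status)
```

The check only helps if something actually tests for NaN (or traps on it) before the value spreads further through the computation.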

Nick2 · ‎07-01-2009

This turned out to be different than what I thought. (At least, after some debugging with CVF).

The part of code that is responsible for the crash is executed probably about 100,000 times in a given day, and I haven't seen this failure before.

The Fortran code goes like this; TIMD is a double-precision variable, and its value gets assigned to TIM. IVF compiles it so:

TIM=TIMD
007AFAAC fld qword ptr [CTIME2 (11FD9E0h)]
007AFAB2 fstp dword ptr [ebp-68h]
007AFAB5 movss xmm0,dword ptr [ebp-68h]
007AFABA movss dword ptr [TIM (0F3A5A8h)],xmm0

CVF compiles it so:

673: TIM=TIMD
025E11A8 fld qword ptr [_TIME2 (02974bb0)]
025E11AE wait
025E11AF fstp dword ptr [ebp-1E8h]
025E11B5 wait
025E11B6 mov edx,dword ptr [ebp-1E8h]
025E11BC mov dword ptr [_TIMING+28h (02a308c8)],edx

Now, I have a VB6 executable that iteratively calls the Fortran DLL and updates the text box with the current time step, TIM.

Before the call that results in Fortran setting TIM=27927.20 (or some other value, sequence-dependent), the VB6 WATCH window will show:

Expression   Value      Type
TIMENOW      27926.77   Single

After the call to the Fortran DLL (in the VB6 debugger), if I hit "go", I get the crash. If I step through each line, the VB6 WATCH window shows:

Expression   Value      Type
TIMENOW                 Integer

Stepping through lines such as Text1.text = Str(TIMENOW / TIME_CONV) ensures that I do not crash in the debugger, and TIMENOW is re-set to a type Single.

If I set breakpoints at the top and the bottom of the Fortran code, I get a crash after the "bottom" breakpoint, but before the "top" breakpoint, proving to me that it's crashing in the VB6 executable.

Steven_L_Intel1 · ‎07-01-2009

Just checking - you HAVE declared the Fortran routine to be STDCALL in the IVF version, right? IVF is using SSE code for some of the floating point, which CVF doesn't know how to do. You might try compiling with /arch:ia32 (IVF) to see if the behavior is closer to what you expect.

Nick2 · ‎07-08-2009

Steve,

All good ideas...but here goes the "why".

Basically the few lines of code that cause trouble look like this:

program main
real a,b,c
a=1e-40
b=1e+10
! the real code also tests some other conditions here
if (a /= 0.0) then
  c=b/a
  print *,c
end if
end program main

We compile our code with

Underflow gives 0.0; Abort on other IEEE exceptions (/fpe:0)

So, sure enough, the Console version re-sets the variable a to 0 because it's a de-normal, and we never end up performing the c=b/a line.

But the VB driven DLL version ignores the /fpe:0 specification...so it ends up running the c=b/a line, and then c becomes Infinity, and it's fed into various sqrt and log functions...
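
The two outcomes described above can be imitated in a few lines. This is a rough sketch in Python (not Fortran), assuming single precision is emulated by rounding through `struct`; `to_single`, `flush_denormal` and `divide` are hypothetical helpers, and `flush_denormal` only stands in for the flush-to-zero effect attributed to /fpe:0, it does not reproduce what the compiler or run-time library actually does.

```python
import math
import struct

def to_single(x: float) -> float:
    """Round a Python float (double) to IEEE single precision."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # Too large for single precision: the hardware result would be +/-Inf.
        return math.copysign(math.inf, x)

FLT_MIN = 2.0 ** -126  # smallest normalized single-precision value

def flush_denormal(x: float) -> float:
    """Hypothetical stand-in for the flush-to-zero effect described in the post."""
    return 0.0 if 0.0 < abs(x) < FLT_MIN else x

def divide(a: float, b: float, flush: bool):
    a = to_single(a)
    if flush:
        a = flush_denormal(a)
    if a != 0.0:
        return to_single(b / a)
    return None  # the guarded c=b/a line is skipped

print(divide(1e-40, 1e10, flush=True))   # console-like path: division skipped
c = divide(1e-40, 1e10, flush=False)     # DLL-like path: c overflows single precision
print(c, math.sqrt(c), math.log(c))      # the overflowed value propagates
```

With flushing, `a` becomes 0 and the division never runs; without it, `b/a` is far outside the single-precision range and the result propagates through `sqrt` and `log`.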

Is there a way to force the vb.net or vb6 executable that's driving the Fortran DLL to respect the /fpe:0 specification, and actually flush underflow to 0, and crash on NaN and Inf values?

Steven_L_Intel1 · ‎07-08-2009

There's a new option in 11.1, /fpe-all, which may do what you want.

Nick2 · ‎07-09-2009

Quoting - Steve Lionel (Intel)

There's a new option in 11.1, /fpe-all, which may do what you want.

Just to make sure - with /fpe-all:0, I need to set /Qftz in the project as well to flush under-flow values to 0? (In other words, is it a feature, or a bug?)

Terminology is a bit confusing, does "flush denormals to 0" mean "flush underflow floating values to 0?"

Steven_L_Intel1 · ‎07-09-2009

/fpe-all:0 should imply /Qftz - at least that's what the documentation says. But the documentation also says that /Qftz does not guarantee that all denormals get flushed to zero. The thing about underflows is that if you have a computation that would generate a denormal, it gets changed to zero. I think that a run-time exception handler is needed for this and I'm not sure if that's available in a DLL.

jimdempseyatthecove · ‎07-09-2009

Quoting - intel@karancevic.com

Just to make sure - with /fpe-all:0, I need to set /Qftz in the project as well to flush under-flow values to 0? (In other words, is it a feature, or a bug?)

Terminology is a bit confusing, does "flush denormals to 0" mean "flush underflow floating values to 0?"

A denormal number is a number with all 0's in the exponent field (and a nonzero fraction). It represents numeric values of

0.fraction x 2**(-bias + 1)

for single precision the fraction is 23 bits of the mantissa and bias is 127

normalized numbers are

1.fraction x 2**(exponent-bias)

for single precision the exponent field is 8 bits (0:255); normalized numbers use 1:254, since 0 is reserved for zero and denormals and 255 for Inf and NaN (together with the bias this gives + and - exponents)

The denormals can represent numbers smaller than the smallest normalized number but yet numbers that have not yet produced an underflow.

underflows are results where none of the fraction bits can be encoded into the storage format (single or double as the case may be).

flush denormals to 0 does not mean flush underflows to 0.

you might consider it as flush near underflows to 0.

Jim Dempsey
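
The two formulas above can be checked by pulling the bit fields out of a single-precision value and rebuilding it. A minimal sketch in Python (chosen only for illustration), assuming IEEE single precision as packed by `struct`; `fields` and `value` are hypothetical helper names, and Inf/NaN (exponent field 255) are not handled.

```python
import struct

BIAS = 127

def fields(x: float):
    """Return (sign, exponent field, fraction field) of x rounded to single."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def value(sign: int, exponent: int, fraction: int) -> float:
    """Rebuild the value from the two formulas (exponent field 255 not handled)."""
    frac = fraction / 2.0 ** 23
    if exponent == 0:
        # denormal: 0.fraction x 2**(-bias + 1)
        magnitude = frac * 2.0 ** (1 - BIAS)
    else:
        # normalized: 1.fraction x 2**(exponent-bias)
        magnitude = (1.0 + frac) * 2.0 ** (exponent - BIAS)
    return -magnitude if sign else magnitude

for x in (1e-40, 1.0, 1e10):
    s, e, f = fields(x)
    kind = "denormal" if e == 0 and f != 0 else "normalized"
    print(x, (s, e, f), kind, value(s, e, f))
```

The value 1e-40 from the earlier example comes out with an exponent field of 0 and a nonzero fraction, which is the denormal case.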

Steven_L_Intel1 · ‎07-09-2009

Jim, in this case, underflows mean out of the normalized range. When FTZ is in effect, computations should not create denormalized values - they should go to zero instead.

jimdempseyatthecove · ‎07-09-2009

Quoting - Steve Lionel (Intel)

Jim, in this case, underflows mean out of the normalized range. When FTZ is in effect, computations should not create denormalized values - they should go to zero instead.

This is correct, I said nothing to contradict this statement.

When an internal computation (in the FPU or SSE unit) generates an underflow, the resulting number can be represented as neither a normalized nor a denormalized value. When the result can be represented as a denormalized but not a normalized value and FTZ is in effect, the result is truncated to 0 instead of being returned as a denormal. The actual computation did not create an underflow, but the user chose not to see denormalized results.

Whether this counts as an underflow becomes a question of semantics: it depends on the position from which you observe the calculation.

Should you have a function that returns 3 digits of precision then you could claim underflow when non-zero internal result is forced to return 0. i.e. viewpoint from outside function is underflow, view from inside function is non-underflow condition.

Jim
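
The distinction drawn in this post can be made concrete by sorting a few magnitudes into the three cases. This is a sketch in Python under the assumption that "underflow" means the value rounds to zero in single precision even with denormals allowed; `to_single` and `regime` are hypothetical helpers, not anything from the compiler or its run-time library.

```python
import struct

SMALLEST_NORMAL = 2.0 ** -126  # smallest normalized single-precision value

def to_single(x: float) -> float:
    """Round a small Python float (double) to IEEE single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def regime(x: float) -> str:
    if x == 0.0:
        return "zero"
    y = to_single(x)
    if y == 0.0:
        return "underflow"   # not representable even as a denormal
    if abs(y) < SMALLEST_NORMAL:
        return "denormal"    # representable, but FTZ would hand back 0
    return "normalized"

for x in (1e-30, 1e-40, 1e-46):
    print(x, regime(x))
```

From outside the computation, the "denormal" and "underflow" cases look the same once FTZ is in effect: both come back as 0.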