Re: abrupt underflow

TimP · 07-14-2001

We note that VF programs running on the P4 may spend a great deal of time processing de-normalized numbers. Believing that setting abrupt underflow (also known as flush to zero) would be part of the solution, I'm trying to understand whether SETCONTROLFPQQ() provides for this. Many of the de-normalized numbers are generated by storing tables of direction cosines in single precision.
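A minimal C sketch of how a single-precision table can pick up denormals (C rather than Fortran, so it runs without CVF's DFLIB). It assumes only that values are computed in double precision and then stored as REAL*4: any result whose magnitude falls below FLT_MIN (about 1.18e-38) but above roughly 1.4e-45 becomes a de-normalized float on the store. The function name is illustrative, not from any library.

```c
#include <assert.h>
#include <float.h>
#include <math.h>   /* fpclassify, FP_SUBNORMAL (C99 macros) */

/* Returns 1 if storing the double d into single precision
   produces a de-normalized (subnormal) float, 0 otherwise. */
int narrows_to_subnormal(double d)
{
    volatile float f = (float)d;   /* volatile: force a real float store */
    return fpclassify(f) == FP_SUBNORMAL;
}
```

For example, narrows_to_subnormal(1e-40) returns 1, while 1e-30 (still a normal float) and 1e-50 (which rounds to zero) both return 0.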

The FPCW$UNDERFLOW bit seems to control only whether underflow generates an exception, which I don't want.

I've tried to use FOR_SET_FPE() but saw no effect, and I haven't found any examples showing whether it is meant to be used this way.

The VF subroutines are running under a main() built with CL, and my experiments indicate that /fpe:0 has no effect in this case. Anyway, the application does bad things elsewhere: it uses double precision arrays as a general-purpose means of copying mixed-type data, and all of these uses must be found and fixed before /fpe:0 would work. In the absence of a portable 64-bit integer in Fortran, those copies are handled by calling back to something in C. Needless to say, a partial solution is wanted in short order.
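The mixed-type copying described above is also a plausible source of denormals, which is why trapping them could locate the offending code. A C sketch, assuming IEEE double and a 64-bit integer sharing the same byte order (true on x86): the raw bits of a small positive integer, viewed as a double, have an all-zero exponent field and therefore read as a denormal. The helper names are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <math.h>    /* fpclassify, FP_SUBNORMAL */
#include <stdint.h>
#include <string.h>

/* Copy the raw bytes of a 64-bit integer into a double, the way a
   double precision array used as a generic buffer would hold it. */
double bits_as_double(int64_t n)
{
    double d;
    memcpy(&d, &n, sizeof d);   /* reinterpretation, no conversion */
    return d;
}

/* 1 if d is a de-normalized (subnormal) double. */
int is_subnormal(double d)
{
    return fpclassify(d) == FP_SUBNORMAL;
}
```

So bits_as_double(42) is a denormal (42 times the smallest denormal), whereas the converted value 42.0 is an ordinary normalized number.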

I would prefer a scheme which allows simply changing the underflow behavior to abrupt without affecting exceptions. With that in effect, trapping of denormals might be turned on to catch any remaining use of mis-typed data.

Steven_L_Intel1 · 07-14-2001

Tim,

I suggest sending this sort of question to us at vf-support@compaq.com - this is the place to write if you want a response from a Compaq engineer. One or two of us read this forum, but not our expert in this area.

Steve

TimP · 07-15-2001

Apologies for the typos; I shouldn't have clicked on the spell check. It changed all the compiler names (protecting the innocent?). The compiler for main() is CL (MSVC6). There have been so many presentations on how the P4 was designed with the expectation of running in abrupt underflow mode that I thought this would be an FAQ, barely worthy of the forum audience's time. I certainly hope that Intel understands that the reputation of CVF was built on a higher level of service than most Windows compiler customers can expect.

Steven_L_Intel1 · 07-15-2001

I certainly hope that Intel understands that the reputation of CVF was built on a higher level of service than most Windows compiler customers can expect.

From what they have been telling us, they do.

Steve

TimP · 07-16-2001

Comparing the for_get_fpe() value of 0 set by /fpe:3 with the z'11000F' set by /fpe:0, I find that using for_set_fpe() to set this mask to z'110001' or z'10001' appears to give the behavior I'm looking for. If that continues to work, the major remaining task will be to explain why. As far as I can tell, DFLIB.F90 documents the latter value as the setting that invokes abrupt underflow together with the underflow trap.
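The hex values above can be compared bit by bit. A small C sketch (macro and function names are hypothetical; the values are the ones quoted in the post) shows that z'110001' is the /fpe:0 value with the three bits z'E' removed, and that z'10001' further drops z'100000'. What each of those bits controls is defined in DFLIB.F90 and is not assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* for_get_fpe() values quoted in the post. */
#define FPE_VALUE_FPE3  0x000000u   /* reported under /fpe:3 */
#define FPE_VALUE_FPE0  0x11000Fu   /* reported under /fpe:0 */

/* Bits set in `from` that are clear in `to`. */
uint32_t bits_dropped(uint32_t from, uint32_t to)
{
    return from & ~to;
}

/* Bits clear in `from` that are set in `to`. */
uint32_t bits_added(uint32_t from, uint32_t to)
{
    return to & ~from;
}

/* Population count: how many flag bits a mask holds. */
int count_bits(uint32_t mask)
{
    int n = 0;
    for (; mask != 0; mask &= mask - 1)   /* clear lowest set bit */
        n++;
    return n;
}
```

Going from z'11000F' to z'110001' only removes bits (three of them) and adds none, so the experiment amounts to switching off three flags that /fpe:0 sets.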

durisinm · 07-16-2001

What's a denormalized number? The next-to-last CVF newsletter article on floating point numbers also mentioned them.

Mike

Intel_C_Intel · 07-16-2001

The /fpe:n option appears to operate differently depending on the project type (console, Windows app, DLL, etc.).

You might like to take a peek at J. Demmel, "Underflow and the reliability of numerical software," SIAM J. Sci. Stat. Comput., 5 (1984), pp. 887-919.

If that fails to allay your fears, consider the following: on Intel, denormals, underflows, and inexacts are the lowest in the x87 exception precedence. A masked denormal can be accompanied by a lower-priority exception (underflow or inexact); underflows can also trigger inexacts; and only the lowest-ranked inexacts can occur in isolation. In short, if any of these show up, you'll have a hard time determining their origin: just clear all FPE status bits in both hardware and memory and carry on.

Call GETCONTROLFPQQ(control), clear it with control = control .AND. #0000, redefine the control word to your liking (e.g. the VC++ x86 default, with all FP traps disabled), and then call SETCONTROLFPQQ(control):

    USE DFLIB
    INTEGER(2) control
    CALL GETCONTROLFPQQ(control)
    control = control .AND. #0000
    control = FPCW$NEAR + FPCW$53 + FPCW$INVALID + FPCW$ZERODIVIDE &
            + FPCW$OVERFLOW + FPCW$UNDERFLOW + FPCW$DENORMAL + FPCW$INEXACT
    CALL SETCONTROLFPQQ(control)

To set an FPE mask, define FPE_MASK = FPE_M_TRAP_UND, for example, and pass it to FOR_SET_FPE(FPE_MASK).
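As a cross-check on the FPCW$ sum above, here is a C sketch that builds the same x87 control word from the hardware bit positions in Intel's manuals. The X87_* names are hypothetical, and the assumption that DFLIB's FPCW$ constants map directly onto these bits should be checked against DFLIB.F90. Note what is missing: the x87 control word has fields for the six exception masks, precision, and rounding, but no flush-to-zero field, which would explain why no FPCW$ setting for abrupt underflow turns up.

```c
#include <assert.h>
#include <stdint.h>

/* x87 FPU control word fields (hardware bit positions). */
#define X87_MASK_INVALID    0x0001u
#define X87_MASK_DENORMAL   0x0002u
#define X87_MASK_ZERODIV    0x0004u
#define X87_MASK_OVERFLOW   0x0008u
#define X87_MASK_UNDERFLOW  0x0010u
#define X87_MASK_INEXACT    0x0020u
#define X87_PC_53           0x0200u   /* bits 8-9 = 10b: 53-bit precision */
#define X87_RC_NEAR         0x0000u   /* bits 10-11 = 00b: round to nearest */

/* All exceptions masked (no traps), 53-bit precision, round to nearest. */
uint16_t x87_all_masked_53_near(void)
{
    return (uint16_t)(X87_RC_NEAR | X87_PC_53 |
                      X87_MASK_INVALID | X87_MASK_DENORMAL |
                      X87_MASK_ZERODIV | X87_MASK_OVERFLOW |
                      X87_MASK_UNDERFLOW | X87_MASK_INEXACT);
}

/* Precision-control (bits 8-9) and rounding-control (bits 10-11) fields. */
unsigned x87_pc(uint16_t cw) { return (cw >> 8) & 3u; }
unsigned x87_rc(uint16_t cw) { return (cw >> 10) & 3u; }
```

The result is 0x023F: the precision field reads 2 (53-bit) and the rounding field reads 0 (nearest).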

Good Luck,
Gerry T.

TimP · 07-16-2001

The IEEE 754 and 854 standards (754 nowadays formally goes by the international designation IEC 60559) mandate gradual underflow. Numbers smaller in magnitude than the underflow threshold TINY(x) = 2**(MINEXPONENT(x)-1) (2.22....e-308 for double precision) are stored with the minimum exponent, MINEXPONENT(x)-1, but without the hidden leading 1 bit that is suppressed for normalized numbers; the high-order bits of the significand are stored explicitly. As the numbers get smaller, more leading 0's come in, until the smallest non-zero number, EPSILON(x)*TINY(x), has only 1 significant bit. These numbers, EPSILON(x)*TINY(x) <= |x| < TINY(x), are called de-normalized. i386 through P-III compatible CPUs included on-chip support for fld and fst to convert de-normalized numbers between memory and register format, but NetBurst chips (in effect) use trap handlers to perform "x87 assists." This treatment has been the one used by typical RISC chips in Unix boxes. I'm sure I've done violence to the technical facts.
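The shrinking precision of denormals can be made concrete. A C sketch, assuming IEEE double, where C's DBL_MIN plays the role of Fortran's TINY(x) = 2**(MINEXPONENT(x)-1) and DBL_EPSILON that of EPSILON(x): it decodes the exponent field and counts significant bits. Normalized numbers always have 53; DBL_MIN/2 has 52; the smallest denormal, DBL_EPSILON*DBL_MIN = 2**(-1074), has 1. The function names are illustrative.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Split an IEEE double into its 11-bit biased exponent field
   and its 52-bit stored fraction. */
void decode_double(double x, unsigned *exp_field, uint64_t *fraction)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    *exp_field = (unsigned)((bits >> 52) & 0x7FFu);
    *fraction = bits & 0x000FFFFFFFFFFFFFull;
}

/* Significant bits in a finite, non-zero double: 53 for normalized
   numbers (52 stored plus the hidden leading 1); for denormals
   (exponent field 0) only the bit length of the stored fraction. */
int significant_bits(double x)
{
    unsigned e;
    uint64_t f;
    decode_double(x, &e, &f);
    assert(e != 0x7FFu && (e != 0 || f != 0));   /* finite, non-zero */
    if (e != 0)
        return 53;
    int n = 0;
    while (f != 0) {   /* leading zeros don't count */
        n++;
        f >>= 1;
    }
    return n;
}
```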

Intel P4 optimization guides recommend using "flush-to-zero" (FTZ) settings, where all de-normals are set to 0 when generated, and "denormals-are-zero" (DAZ) settings, where any de-normalized input data is treated as zero. The Intel ICL compiler reference describes a /Qftz compile option, but it may not be implemented. MSVC and gcc programmers are expected to use _asm constructs, without adequate guidance.
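The FTZ and DAZ settings live in the SSE control/status register MXCSR, not in the x87 control word, so they affect SSE/SSE2 instructions only. A C sketch of the bit arithmetic, using the MXCSR layout from Intel's manuals (macro and function names are hypothetical): FTZ is bit 15, DAZ bit 6, and the six exception masks are bits 7-12, so the flush settings can be turned on without touching the masks. On real hardware the value would be read and written with _mm_getcsr()/_mm_setcsr() from <xmmintrin.h>; that step is left out so the sketch runs anywhere. Intel documents FTZ as the masked response to underflow, so it only takes effect while underflow stays masked, and its manuals advise checking for DAZ support before setting that bit.

```c
#include <assert.h>
#include <stdint.h>

/* MXCSR bit layout (Intel 64 and IA-32 architecture manuals). */
#define MXCSR_DAZ          (1u << 6)      /* denormals-are-zero: denormal inputs read as 0 */
#define MXCSR_DM           (1u << 8)      /* denormal-operand exception mask */
#define MXCSR_UM           (1u << 11)     /* underflow exception mask */
#define MXCSR_EXCEPT_MASKS (0x3Fu << 7)   /* IM DM ZM OM UM PM, bits 7-12 */
#define MXCSR_FTZ          (1u << 15)     /* flush-to-zero for underflowing results */
#define MXCSR_DEFAULT      0x1F80u        /* reset value: all exceptions masked */

/* Set FTZ, leaving exception masks, rounding, and status flags unchanged. */
uint32_t mxcsr_with_ftz(uint32_t mxcsr)
{
    return mxcsr | MXCSR_FTZ;
}

/* Set DAZ as well; only meaningful on CPUs that support it. */
uint32_t mxcsr_with_ftz_daz(uint32_t mxcsr)
{
    return mxcsr_with_ftz(mxcsr) | MXCSR_DAZ;
}
```

Starting from a value with the denormal-operand mask cleared (MXCSR_DEFAULT & ~MXCSR_DM), mxcsr_with_ftz() leaves that trap enabled, which is the kind of combination the original question asks for.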

CVF's /fpe:0 option should flush generated de-normals to zero, but it also traps in several other situations. One reason for trapping might be to catch situations such as the one I described, involving mis-typed data.

The reasons for the IEEE standard specifying gradual underflow include preservation of accuracy in calculations involving values |x| < 2**MINEXPONENT(x)/EPSILON(x), which was a serious problem in older formats such as VAX D-float. One must recognize that a flush-to-zero option may reduce accuracy of single precision results of magnitude < .5**31, but the hardware designers have told us we can't have both accuracy and speed there.
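One concrete case of the accuracy argument above: with gradual underflow, x - y == 0 holds only when x == y, but under flush-to-zero two distinct numbers just above the underflow threshold can have a difference of zero. A C sketch, assuming IEEE double and a build that does not already enable flush-to-zero (e.g. no fast-math options); flush_to_zero() is a hypothetical software stand-in for the hardware mode, not a library function.

```c
#include <assert.h>
#include <float.h>
#include <math.h>   /* fpclassify, FP_SUBNORMAL */

/* Software stand-in for flush-to-zero: a denormal result becomes a zero
   of the same sign; everything else passes through unchanged. */
double flush_to_zero(double r)
{
    if (fpclassify(r) == FP_SUBNORMAL)
        return r < 0 ? -0.0 : 0.0;
    return r;
}

/* x - y as computed with gradual underflow (the IEEE default). */
double diff_gradual(double x, double y)
{
    volatile double d = x - y;   /* volatile: compute at run time */
    return d;
}

/* x - y as it would come out with flush-to-zero. */
double diff_ftz(double x, double y)
{
    return flush_to_zero(diff_gradual(x, y));
}
```

With x = 1.5*DBL_MIN and y = DBL_MIN, diff_gradual() returns the exact denormal DBL_MIN/2, while diff_ftz() returns 0 even though x != y.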

Intel_C_Intel · 07-16-2001

A VNI implementation of the F2K proposal for IEEE exception handling via F90 modules suggests that CVF 6.5a for x86 does indeed have problems setting IEEE rounding. I first noticed this in DVF 5.x and reported the problem to VF support. Perhaps it'll get fixed when CVF transitions to IVF.

BTW, has Palmer been with Compaq since they acquired Digital, and if so, is he going back to Intel?

--
Gerry T.