Re: 1/-Infinity = NaN when using SSE - Page 2

mkatsev · ‎03-26-2008

Code:

program test
integer i
real r(100),t

t = 0
do i=1,100
r(i) = 1/log(t)
enddo

print *,r(1)
endprogram

When I compile with options "/libs:static /threads /c" result is " 0.0000000E+00" (as expected)
When I add "/QaxK" result is NaN
If I modify the cycle so that to prevent vectorization (add "print *,'.'" for example) the result is back " 0.0000000E+00"

Compiler: Intel fortran 10.0 for win

Any ideas why this happens?

g_f_thomas · ‎04-04-2008

Brian,

The following code snippetswhich depend on some IVF extensions might be useful to you:

***********code*********

subroutine Getfpucw (fpcw)

!Get the current fpu control word

use dflib

implicit none

character(len=48), intent(out) :: fpcw(8) !Array of exceptions, precision, and rounding mode

!Local
integer(2) :: control = 0 !Current cw

call GETCONTROLFPQQ(control)
call fpcontrl(control, fpcw)

return

end subroutine Getfpucw

subroutine fpcontrl(control, fpcw)

!Describe the contents of the floating point control word.
!Interpret each sub field of the word individually.

use dflib

implicit none

integer(2), intent(in) :: control!Two byte control word from GETCONTROLFPQQ
character(len=48), intent(out) :: fpcw(8)!Array of exceptions, precision, and rounding mode

!Local
integer(2) tcontrol!Two byte temporary control word

!Interpret the exception trap field. Retain only the exception bits
tcontrol = IAND(control, FPCW$MCW_EM)

!Test each exception and print only the ones that are set
if(tcontrol /= 0) then
if(iand(tcontrol, FPCW$INVALID) /= 0) then
fpcw(1) = 'Invalid operation trap disabled'
else
fpcw(1) = 'Invalid operation trap enabled'
endif
if(iand(tcontrol, FPCW$DENORMAL) /= 0) then
fpcw(2) = 'Denormalized operand trap disabled'
else
fpcw(2) = 'Denormalized operand trap enabled'
endif
if(iand(tcontrol, FPCW$ZERODIVIDE) /= 0) then
fpcw(3) = 'Zero divide trap disabled'
else
fpcw(3) = 'Zero divide trap enabled'
endif
if(iand(tcontrol, FPCW$OVERFLOW) /= 0) then
fpcw(4) = 'Overflow trap disabled'
else
fpcw(4) = 'Overflow trap enabled'
endif
if(iand(tcontrol, FPCW$UNDERFLOW) /= 0) then
fpcw(5) = 'Underflow trap disabled'
else
fpcw(5) = 'Underflow trap enabled'
endif
if(iand(tcontrol, FPCW$INEXACT) /= 0) then
fpcw(6) = 'Inexact (precision) trap disabled'
else
fpcw(6) = 'Inexact (precision) trap enabled'
endif
else
fpcw(1) = 'All exception traps enabled'
endif

!Interpret the precision control field. Retain only the precision bits
tcontrol = IAND(control, FPCW$MCW_PC)

!Test precision and print the one that is set
Select Case (tcontrol)
Case (FPCW$24)
fpcw(7) = '24 bit precision enabl ed'
Case (FPCW$53)
fpcw(7) = '53 bit precision enabled'
Case (FPCW$64)
fpcw(7) = '64 bit precision enabled'
Case Default
fpcw(7) = 'Invalid precision control'
End Select

!Interpret the rounding control field. Retain only the rounding bits
tcontrol = IAND(control, FPCW$MCW_RC)

!Test rounding and print the one that is set
Select Case (tcontrol)
Case (FPCW$NEAR)
fpcw(8) = 'Round to nearest (or even) enabled'
Case (FPCW$DOWN)
fpcw(8) = 'Round down (to -INF) enabled'
Case (FPCW$UP)
fpcw(8) = 'Round up (to +INF) enabled'
Case (FPCW$CHOP)
fpcw(8) = 'Chop (truncate to 0) enabled'
Case Default
fpcw(8) = 'Invalid round control'
End Select

return

end subroutine fpcontrl

subroutine Setfpucw

!Set the fpu control word

use dflib

implicit none

!Local
integer(2) :: control = 0!Current cw
integer(4) :: FPE_STS = 0, FPE_MASK = 0!fpe masks

!An exception is dis/enabled if its flag is set/cleared to 1/0

call GETCONTROLFPQQ(control)
!Clear all cw flags
control = control .AND. #0000
!Set cw to VC++ default: all fpe traps disabled or masked
control = FPCW$NEAR+&
FPCW$53+&
FPCW$INVALID+&
FPCW$ZERODIVIDE+&
FPCW$OVERFLOW+&
FPCW$UNDERFLOW+&
FPCW$DENORMAL+&
FPCW$INEXACT
call SETCONTROLFPQQ(control)
!The foregoing is equivalent to /fpe:3,
!ie, 53-bit precision, round to nearest, disable or mask
!the denormal, underflow. inexact precision, overflow,
!division by zero, and invalid traps.

!Unmask fp overflow, division by zero, and invalid
!FPE_MASK = FPE_M_TRAP_OVF+&
! FPE_M_TRAP_DIV0+&
! FPE_M_TRAP_INV
!To unmask fpunderflow, include FPE_M_TRAP_UND or via the Runtime Error Checking 'Underflow' option
!FPE_STS = FOR_SET_FPE(FPE_MASK)
!The foregoing is equivalent to /fpe:0,
!ie, 53-bit precision, round to nearest, disable the denormal,
!underflow. and inexact precision exceptions and enable the
!overflow, division by zero, and invalid traps.

return

end subroutine Setfpucw

subroutine Getfpusw (fpsw)

!Get the current fpu status word

use dflib

implicit none

character(len=48), intent(o ut) :: fpsw(6) !Array of low exception bits

!Local
integer(2) :: status = 0 !Current sw

call GETSTATUSFPQQ(status)
call fpstatus(status, fpsw)

return

end subroutine Getfpusw

subroutine fpstatus(status, fpsw)

!Describe the contents of the floating point status word.
!Interpret only the low six exception bits.

use dflib

implicit none

integer(2), intent(in) :: status!Two byte status word from GETSTATUSFPQQ
character(len=48), intent(out) :: fpsw(6) !Character array of exceptions currently set

!Local
integer(2) tstatus ! Two byte temporary status word

!Retain only exception bits
!
tstatus = iand(status,FPSW$MSW_EM)

!Test each exception and print only the one(s) that are set
!
if(tstatus == 0) then
fpsw(1) = 'No exceptions'
endif
if(tstatus /= 0) then
if(iand(tstatus, FPSW$INVALID) /= 0) then
fpsw(1) = 'Invalid operation exception'
endif
if(iand(tstatus, FPSW$DENORMAL) /= 0) then
fpsw(2) = 'Denormalized operand exception'
endif
if(iand(tstatus, FPSW$ZERODIVIDE) /= 0) then
fpsw(3) = 'Zero divide exception'
endif
if(iand(tstatus, FPSW$OVERFLOW) /= 0) then
fpsw(4) = 'Overflow exception'
endif
if(iand(tstatus, FPSW$UNDERFLOW) /= 0) then
fpsw(5) = 'Underflow exception'
endif
if(iand(tstatus, FPSW$INEXACT) /= 0) then
fpsw(6) = 'Inexact (precision) exception'
endif
endif

return

end subroutine fpstatus
*******************end*******************

BTW, re -0, the option is described as "Enable IEEE Minus Zero Support" in the Project/Properties/Floating Point page.

Gerry

brianlamm · ‎04-04-2008

Tim, Thanks Much, and yes I was writing "tie" meaning when two closest neighbors are equidistant from target. I'm trying to wrap my little head around "nearest even" you wrote the IEEE specifies: does that mean nearest and then in case of tie even, in order of precedence? Or is some subtle interpretation lurking below that face-value interpretation of mine?

Gerry, Great Stuff! I will Certainly implement what you have graciously provided until IVF support for IEEE intrinsics becomes available.

-Brian

Steven_L_Intel1 · ‎04-04-2008

Brian,

Take a look at Dave Eklund's "The Perils of Real Numbers" articles in the Newsletter thread at the top of this forum. It explains the rounding. But it is simple - round to the nearest representable value, and if the value is exactly halfway between two representable values, round to the one with a LSB of zero (round to even).

brianlamm · ‎04-04-2008

Thanks Steve. I know what round to nearest is, and I know what round to even is, I just wondered if there was something I was missing with the term round to "nearest even", which seems to me to imply try nearest first, even for ties, just like it reads, and I just wanted to make sure I wasn't missing something that might be more subtly implied by "nearest even". I'll just take away "nearest even" means nearest first, even for tie.

-Brian

Steven_L_Intel1 · ‎04-04-2008

Perhaps I'm not understanding you, but in the case of a tie, there are two "nearest". When this happens, you pick the one with a zero LSB. Sometimes this will be up, sometimes down.

brianlamm · ‎04-04-2008

Steve, Tim wrote:

tim18:
...
Incidentally, the IEEE standards specify that round to nearest even must be the default, and the mode chosen when rounding mode control is not implemented.

"nearest even" was to what I was referring in Tim's post regarding what IEEE default behavior is. That's what started me thinking I might be missing something subtle. I believe Tim meant either IEEE literally reads "nearest even", or it was shorthand for round to nearest and in case of a tie round to even. That's what I'm going to take away from Tim's post.

-Brian

brianlamm · ‎04-06-2008

MAD cprince:
/fp: in ifort doesn't have the options on intermediate precision which are supported by icc. So, there isn't full support for anything but source precision.

Unelss I'm reading the IVF doc incorrectly, there is, currently anyway, an option "on" intermediate precision which is not source precision: it is /fltconsistency or /Op and the seventh bullet in the IVF doc description reads:

Whenever an expression is spilled, it is spilled as 80 bits (extended precision), not 64 bits (DOUBLE PRECISION). When assignments to type REAL and DOUBLE PRECISION are made, the precision is rounded from 80 bits down to 32 bits (REAL) or 64 bits (DOUBLE PRECISION). When you do not specify /Op, the extra bits of precision are not always rounded away before the variable is reused.

I believe, however, Intel has deprecated this option, which I assume means it will be deleted in the (near?) future. If so, any idea of when it will be deleted.

Am I interpreting IVF doc correctly? That is, none of the /fp:(precise, fast (-1,-2), strict, source) options will enable 80-bit intermediate precision?

-Brian

Steven_L_Intel1 · ‎04-06-2008

There is no actual support for 80-bit precision. Sometimes, when you're using x87 code, 80-bit intermediate values get created. You can't control this directly, and if you use /QxW or "higher", x87 code tends not to be used at all.

/fp:source is intended to always use declared precision. You are right that /fltconsistency, /Op, etc. are deprecated and you should not use them, as /fp will produce better and more consistent code. If you're not using /QxW (at least for now), you'll get x87 code and /fp:fast will still allow 80-bit intermediates.

TimP · ‎04-06-2008

At /Od, the 32-bit ifort uses x87 code. /Op brings about the use of some x87 code (not in vectorized code). In order to use all the bits of the x87 format, you would have to over-ride the default setting of 53-bit precision.
The compiler doc seems inconsistent in implying that -pc80 sets 64-bit precision, and further identifying that as a default, even for the Intel64 compiler which has no x87 code option.
In the 64-bit ifort, /Op uses SSE2 for extra precision intermediates, so there is never promotion beyond double precision.
Certain compilers, such as gfortran, do support a real*10 data type. Evidently, there is no vectorization of real*10. It relies on the glibc run-time support for C long double, which is not present on Windows.

TimP · ‎04-06-2008

I tested the paranoia program spara.f from http://www.netlib.org/paranoia/ (with my own bug fixes). With 32-bit ifort Windows, it diagnoses 29 bits extra precision (double precision intermediates) with ifort -assume:protect_parens, and 40 bits (x87 64-bit precision intermediates), when -Qpc80 is added to the options. So, it appears the pc80 option does set 64-bit precision mode.
-QxB, or any option which enables vectorization, removes extra precision, but adding -Op restores it (but only for non-vectorized loops).
-QxB or one of the recommended SSE options appears to be needed for accurate formatted read/write with ifort 10.1.021.
Paranoia has to be built with inter-procedural optimization disabled, as well as other precautions, including protect_parens.