Re: 1/-Infinity = NaN when using SSE

mkatsev · ‎03-26-2008

Code:

program test
integer i
real r(100),t

t = 0
do i=1,100
r(i) = 1/log(t)
enddo

print *,r(1)
endprogram

When I compile with options "/libs:static /threads /c" result is " 0.0000000E+00" (as expected)
When I add "/QaxK" result is NaN
If I modify the cycle so that to prevent vectorization (add "print *,'.'" for example) the result is back " 0.0000000E+00"

Compiler: Intel fortran 10.0 for win

Any ideas why this happens?

TimP · ‎03-26-2008

It appears that you want the -Qprec-div option. When inversion is performed by the Newton's iteration scheme chosen by default for single precision SSE, the NaN is produced. prec-div is needed to produce IEEE results.

mkatsev · ‎03-26-2008

Yeah, thanks

By the way, shouldn't the result be -0 but not +0?

TimP · ‎03-26-2008

That's an interesting question. By my reading of the textbooks on Fortran
http://www.oup.com/uk/catalogue/?ci=9780198526933
the only way Fortran is required to distinguish the sign of 0. is with the sign() intrinsic, so I try
print *,r(1), sign(1.,r(1)) < 0
and I see what appears to be an incorrect result. Do you want to file a bug report?

mkatsev · ‎03-27-2008

Turned out that there is another option :)
/assume:minus0 - enables negative zero
However thanks for your help

Strange that all this options are off by default and almost impossible to find if not knowing what to look for

TimP · ‎03-27-2008

Yes, there is a tendency in commercial compilers to set optimizations which break the standard by default, regardless of how unlikely they are to be useful and how likely they are to cause problems. Some of them are held over from past processors which had a large performance penalty for following the standard (don't know if that is the case with minus0). It looks like the minus0 option may give better performance for random mixtures of positive and negative operands, as it generates branchless code.
In this case, my own opinion is that minus0 (thanks for pointing it out) should be included in -fp:precise. I guess I'll adopt -O3 -assume:protect_parens, minus0,buffered_io -Qprec-div -Qprec-sqrt as the most aggressive options I would use. prec-div and prec-sqrt don't give away nearly as much performance on the Penryn processors as on other Intel models.

Steven_L_Intel1 · ‎03-27-2008

/assume:minus0 exists because making it the default would 1) break some existing programs, 2) be slower in many cases. Most applications don't care. The Fortran standard has changed over time with how it treats -0: in Fortran 2003, the definition of CSQRT changed in an incompatible way compared to Fortran 95 in this regard.

TimP · ‎03-27-2008

ifort is requiring /assume:minus0 to make sign() compatible with Fortran 95:
http://developers.sun.com/solaris/articles/sign.html
The IVF doc quotes the f95 standard, but the see also foot-link leads to the documentation that -minus0 is required to comply with f95. Unfortunately, minus0 currently prevents vectorization of sign intrinsic. That is expected to be fixed in a future release.
The f95 treatment is the only one available in gfortran.

brianlamm · ‎04-01-2008

Tim wrote "I guess I'll adopt -O3 -assume:protect_parens, minus0,buffered_io -Qprec-div -Qprec-sqrt as the most aggressive options I would use. prec-div and prec-sqrt don't give away nearly as much performance on the Penryn processors as on other Intel models."

It is vitally important to me to get IEEE 754 arithmetic, except for -0 support. After reading some of the documentation (which sorely needs an overhaul) using /fp:precise automatically observes what using /assume:protect_parens does, I believe. The IVF doc reads:

When this option [/fp:precise] is specified, the compiler applies the following semantics:

Additions are performed in program order, taking into account any parentheses
Intermediate expressions use the precision specified in the source code
The constant addition may be pre-computed, assuming the default rounding mode

What I would contemplate using though is

/fp:strict /fp:except- /Qprec-sqrt /Qprec-div /Qprec /fpe:3

(fpe:3 is default, I just wanted to show it). A lot of other settings are not shown as I want to focus on getting IEEE arithmetic.

Now, here's where I'm going to show some (quite a bit?) of ignorance (but I appeal to the fact the documentation does not nearly keep up with the compiler capabilities): I contemplate using /fp:strict + /fp:except- so exceptions are disabled (so I won't get a run-time abend) so I can investigate IEEE flags and act accordingly. From what I can gather from the documentation is that combination also "allows modification of the floating point environment", whereas /fp:precise does not. And here's my ignorance: HOW does one modify "the floating point environment".

Now, on to the REAL can-o-worms question: Exactly what minimal set of options to the compiler will Guarantee IEEE 754 compliance, except for -0 support?

Please, all you IVF-use experts, please chime in on your assessment!

-Brian

TimP · ‎04-01-2008

I'll answer just part of this:
/fp:precise includes /assume:protect_parens /Qprec-div /Qprec-sqrt /Qftz- but not /assume:minus0. Some of what /fp:precise does is not required by IEEE or Fortran standards.

brianlamm · ‎04-01-2008

Now, that's just what I need (as a beginning anyway)!

I had NO IDEA /fp:precise included all those other options. I cannot find that in the IVF doc! But I am glad to see /Qftz- is enabled to avoid flush to zero. I should have also added I wish to use /O3 optimization while fully complying with IEEE except for -0 support.

I think I need /Qprec to avoid "performing optimizations that change NaN comparison semantics and causes all values to be truncated to declared precision before they are used in comparisons."

Tim, I know this question is a hotbed, especially for people "on the inside", since it clearly calls into question whether IEEE 754 conformance can be had while maintaining the highest possible optimizations, so thank you very much for your reply.

Keep the assessments coming, please!

And, I would still like to know how/why I would want to change the "floating point environment". The IVF doc gives precious little information on the precise nature of it. One place in IVF doc reads "There are several methods available if you want to modify the default floating point environment. For example, you can use inline assembly, compiler built-in functions, and library functions." But I Need All the examples IVF is capable of letting me change that environment. It's got to be spread all over the IVF doc, and I'd bet some is Undocumented, alas. Anyway, I just want to make sure I have that capability, and using /fp:strict + /fp:except- I believe allows ability to change floating point environment while still allowing exceptions to be raised but not cause application abend. Correct? Incorrect?

-Brian

brianlamm · ‎04-03-2008

Thanks Tim. You seem to be the only insider at Intel with the pineapples necessary to, at least partially, answer my question about which (minimal) set of ifort compiler options are necessary to get full IEEE compliance, except -0 support.

So, it's apparent no one inside Intel is willing to put it out there whether or not it is even possible to get IEEE 754 conformance from IVF, leading to two possible obvious conclusions:

No one inside or outside of Intel even knows whether or not it is possible and are not willing to communicate such, or it is not possible. If no one inside Intel even knows, that's certainly a sad state of affairs (much like the IVF documentation).

-Brian

Steven_L_Intel1 · ‎04-03-2008

Brian,

I don't think such comments are appropriate.

/fp:strict is the one option that would ensure, to the best of the compiler's ability, all IEEE FP semantics. /assume:minus0 is a Fortran language option not related to IEEE FP behavior. It controls how the SIGN intrinsic and formatted I/O behave in the presence of a minus zero - it does NOT control whether or not such a value can be produced.

brianlamm · ‎04-03-2008

MADsblionel:
Brian,

I don't think such comments are appropriate.

I tried deleting it but was not allowed since replies were made. Apologies to all of you.

And thanks much for the replies subsequent to my inappropriate post.

-Brian

g_f_thomas · ‎04-03-2008

If anyone knows about IEEE fp compliance then surely it's Intel. It was Intel who commissioned Kahan to come up with the standard in the first instance. I believe DEC had an alternative but it wasn't widely adopted.

As Steve and tim18 pointed out /fp:precise ensures compliance although rounding mode has an impact and is useful in interval arithmetic. Isn't -0 in the same league as 0**0 and the like and is beyond the scope of the IEEE fp standard?

Gerry

brianlamm · ‎04-03-2008

My fault I did not make it clear even though I was not intent on -0 support, I did want IEEE compliance. Reading my previous posts, I see how it could be interpreted I thought -0 support was in IEEE 754. -0 supportis in F2003.

As far as I can tell, the difference between [/fp:precise] and[/fp:strict /fp:except-] is in the latter contractions are disabled and access to modifying the floating point environment is enabled.What I cannot gather from doc is whether or not the latter enables full precision rounding of intermediate results, or whether both have the same effect as rounding to precision based on source.

Anyone know the answer to that?

-Brian

Steven_L_Intel1 · ‎04-03-2008

Just to make it clear - the change to /QxW as the default will happen in a future new version, not a 10.1 update. The release notes suggest what to do if you want to continue using "generic" code.

While you're thinking of futures, also read the release notes for information about the new "compat" OpenMP libraries which will be the default in that same future version.

If performance is important to you, you should be using one of the options to get SSE instructions instead of x87.

brianlamm · ‎04-04-2008

If I use /fp:precise I have no access to modify floating point environment, maybe not a "bad" thing.

I have read the IVF doc regarding rounding:if I use /fp:precise then rounding mode cannot be modified and is "default". How am I to determine what that default is? IEEE default rounding is to nearest, and to even in case of a tie. Is that default for /fp:precise? If not, then I think I must specify /fp:strict /fp:except- so I can use IEEE intrinsics to modify rounding to nearest. IVF documentation mentions nothing about rounding mode in case of a tie.

Performance is always a goal, but not at the cost of correctness. For most of my purposes, I need IEEE conformance to get the utmost in accuracy "first", and performance "second".

Steven_L_Intel1 · ‎04-04-2008

The default for all values of /fp is IEEE default as you describe it, or what F2003 calls IEEE_NEAREST. This is a processor default, not specific to Fortran. If you want to be able to change the rounding mode, then indeed you'll want "strict".

Eventually we will support the F2003 IEEE intrinsic modules so you can get and set this information cleanly.

I assume that you have also performed a computational arithmetic analysis of your code so that you don't run into issues such as cancellation errors, etc.

brianlamm · ‎04-04-2008

Thanks much Steve, that's good to know, and I look forward to IVF support of IEEE intrinsic support ...

I have done considerable floating point arithmetic analyses on some of my codes where it's particularly important, yet so many factors are involved it's difficult at times.

Do you know what the processor default rounding is in case of a tie? I cannot find that in IVF doc, and perhaps that is not the appropriate place to find the answer to that question.

-Brian

TimP · ‎04-04-2008

In case you are asking about rounding for half-way values:
All CPUs designed since IEEE-754 follow that standard, often called "banker's rounding," or "round to nearest even." Fortran over-rules the hardware rounding mode only for the intrinsic functions NINT(), ANINT(),... and ifort follows the Fortran standard. So, NINT(1.5) produces 2, and NINT(2.5) produces 3, while
(1.5 + 1/EPSILON(1.5)) - 1/EPSILON(1.5)
and
(2.5 + 1/EPSILON(2.5)) - 1/EPSILON(2.5)
would both produce 2.0 (in SSE code, under /assume:protect_parens).
Incidentally, the IEEE standards specify that round to nearest even must be the default, and the mode chosen when rounding mode control is not implemented.
There have been compilers which supported a fast_nint option which would speed up the NINT() and ANINT() intrinsics by following the IEEE rather than the Fortran standard (prior to f2003 ieee_arithmetic). This would not be as disruptive as the ifort option /Qrcd which changes INT and AINT intrinsics (explicit or implicit) to IEEE banker's rounding, making them practical equivalents of NINT and ANINT.