Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Smith's algorithm unexpectedly accurate...

baudinm
Beginner
Hi,

I am analyzing how various Scilab versions, built with Intel Fortran, handle
a particular complex division. Scilab uses Smith's algorithm. I observe that,
depending on whether the /arch:IA32 option is used, I get different results.
* In debug or release mode, without the /arch:IA32 option,
I get an approximated result, as expected from Smith's method.
* In release mode, with the /arch:IA32 option, I get
an exact result, which is both more accurate and unexpected
from the algorithm.
I would like to understand these changes in Scilab results.
Fortunately, I was able to reproduce the problem with a much simpler
Fortran 77 source code (see attachment).

Consider the following complex division:
(1.e307 + 1.e-307 * i) / (1.e205 + 1.e-205 * i)
whose result is
1.000E+102 - 1.000E-308 * i
which is exact to the displayed number of digits.
For this particular input, Smith's algorithm fails
with double precision floating point numbers
and produces the result
1.000E+102 + 0 * i
that is, the imaginary part underflows to zero.
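For readers without the attached Fortran source, the underflow is easy to reproduce in any IEEE double-precision environment. Below is a minimal sketch of Smith's algorithm in Python (the function name `smith_div` is mine, not from the attachment); run on the inputs above, the scaling ratio r = d/c = 1e-410 is below the smallest double subnormal and flushes to zero, which in turn wipes out the imaginary part:

```python
def smith_div(a, b, c, d):
    """Divide (a + b*i) / (c + d*i) with Smith's algorithm in IEEE doubles."""
    if abs(c) >= abs(d):
        r = d / c                      # here 1e-205 / 1e205 = 1e-410 -> 0.0 (underflow)
        den = c + d * r
        return (a + b * r) / den, (b - a * r) / den
    else:
        r = c / d
        den = c * r + d
        return (a * r + b) / den, (b * r - a) / den

re_part, im_part = smith_div(1e307, 1e-307, 1e205, 1e-205)
print(re_part, im_part)                # real part about 1e102, imaginary part exactly 0.0
```

The second division (b - a*r)/den also underflows: with r rounded to zero it becomes 1e-307 / 1e205 = 1e-512, again below the subnormal range, hence the exactly zero imaginary part.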

When I compile the code in debug mode and execute
it, I get:
Smith: 0.1000000000E+103 +I* 0.0000000000E+000
which is the expected result.
If I compile the code in release mode and execute it,
I get the same result.
If I compile the code in release mode, add the /arch:IA32
option and execute it, I get:
Smith: 0.1000000000E+103 +I* -0.1000000000E-307

My guess is that the optimized code keeps intermediate values in
80-bit extended precision floating point registers instead of the
regular 64 bits. Am I right?
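This guess can be checked without x87 hardware by redoing Smith's intermediate steps in exact rational arithmetic, which, like the 15-bit x87 extended exponent, does not underflow at 1e-410, and rounding only the final value back to a double. This is my own sketch of the idea, not the compiler's actual code path:

```python
from fractions import Fraction as F

a, b = F(10) ** 307, F(10) ** -307   # numerator   1e307 + 1e-307 * i
c, d = F(10) ** 205, F(10) ** -205   # denominator 1e205 + 1e-205 * i

r = d / c                            # 1e-410: zero in double, representable in x87 extended
den = c + d * r
im = float((b - a * r) / den)        # round only the final result to double precision
print(im)                            # about -1e-308, the "exact" imaginary part
```

When the intermediate r survives, (b - a*r)/den evaluates to roughly -1e-103 / 1e205 = -1e-308, which is a representable (subnormal) double. That matches the more accurate result observed with /arch:IA32.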

The problem is that /arch:IA32 is not meant to add
extra precision, but only to ensure compatibility
with older machines. Is this a bug in the compiler options
I chose? Should I add another option?

I use Intel Fortran Compiler Package ID: w_cprof_p_11.0.061,
11.0.3451.2008.

Best regards,

Michal Baudin

PS
This is bug #5739 in the Scilab bug database:
http://bugzilla.scilab.org/show_bug.cgi?id=5739
TimP
Honored Contributor III
/arch:IA32 inherently carries the "80-bit" extended exponent range across the evaluation of expressions. If you want the range always truncated to double precision, as with /Od, without using SSE2 code, you must use /Od. If you want to work reliably close to the overflow limit, you may have to consider setting /fp:source, not that it directly addresses your criticism.
baudinm
Beginner
Quoting - tim18
/arch:IA32 inherently extends "80-bit" extended range further across the evaluation of expressions.
Thank you for that information.
In the "Intel Visual Fortran Compiler User and Reference Guides", the description of the "arch" option reads:
"IA32
Generates code that will run on any Pentium or later processor. Disables any default extended instruction settings, and any previously set extended instruction settings. This value is only available on Linux and Windows systems using IA-32 architecture."
There is no mention of an extended range of values for floating point numbers in the evaluation of expressions in this.
In what Intel document can I find more information on this topic ?

TimP
Honored Contributor III
The relevant points aren't concisely documented, and the wording you quote is somewhat unfortunate. At
http://www.intel.com/products/processor/manuals/ you can read all the details about the x87 (and SSE) instructions.
The "extended instruction" settings referred to are the generation of SSE and later instructions. Disabling them leaves only the x87 instructions, which have a 15-bit extended exponent. Nice contradiction.
In part, the behavior is determined by Microsoft rather than Intel. As Visual Studio requires the 53-bit precision setting by default, only 68 bits of the 80-bit register are actually in use (/Qpc64 mode). It is called pc64 rather than pc68 because the extended precision is suppressed whenever a temporary (or permanent) store occurs, which may happen more often in debug mode.
If you set the option /Qpc80, 64-bit precision is selected and the entire 80-bit format is in use. To the extent that this is documented, it is in the Fortran help file. It is contradictory to the Microsoft library design.
As you can see, there is a degree of expectation that you know some history when you use /arch:IA32. The primary reason for preserving the option is to support CPUs without the full SSE2 instruction set, none of which has been produced in about 7 years. Ever since the introduction of the Pentium 4, Intel and Microsoft have been moving away from keeping x87 as the primary compilation mode, due in part to its numerical quirks not being entirely on the positive side. Most C and C++ compilers, other than those for Visual Studio, support the 80-bit format directly through the long double data type, but few Fortran compilers ever supported such a type, e.g. real(10).