Intel® Fortran Compiler

Numerical differences between versions 10 and 12

van_der_merwe__ben
New Contributor II
When we upgraded from Intel Fortran 9 to 10, we saw no numerical differences at all. But when we upgraded from 10 to 12, we saw some numerical differences. We compile pretty much with the default options, Windows 32-bit, commercial client.

In our code we have this expression: XY(I,J)=XY(I,J)-Q*S(J)
and the variable values are (copied from the Visual Studio 6 and 2010 debuggers; both show the same values):

I=3
J=3
XY(I,J) 6698.5491675901200000000000000000 REAL(8)
S(J) 141.7567533613250000000000000000 REAL(8)
Q 47.2522511204418000000000000000 REAL(8)
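
For reference, a minimal stand-alone sketch of that update (scalars stand in for the array elements, and the decimal literals are the debugger values; note that they get rounded to the nearest representable double, so this may not reproduce the exact in-memory bit patterns of a real run):

    program repro
      implicit none
      real(8) :: xy, s, q
      ! Values as shown by the debugger, rounded to the nearest double
      xy = 6698.5491675901200000000000000000d0
      s  = 141.7567533613250000000000000000d0
      q  = 47.2522511204418000000000000000d0
      xy = xy - q*s
      print '(f34.28)', xy
    end program repro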

Visual Studio 2005 / Intel 10 gives the new value as:

0.2234597422302610000000000000

Visual Studio 2010 / Intel 12 gives the new value as:

0.2234597422293520000000000000

The input values shown above are exactly the same in the two debuggers. The new values only agree to about the tenth digit (I would expect more). Now, in most cases the two versions give the exact same results, but for some numbers, just once in a while, they do not.

Just FYI: Visual Studio 2010 / Intel 10 gives the exact same numbers as Visual Studio 2005 / Intel 10.

The Intel 10 code appears to use x87 (IA-32 FPU) instructions (fld, fld, fmulp, fld, fsubrp, fstp).

The Intel 12 code appears to use SSE2 instructions (movsd xmm1, mulsd, movsd, subsd, movsd). Since the debugger does not show the SSE registers as floating point, they are hard to follow and compare.

Using the /arch:IA32 option with VS2010 / Intel 12 still gives the exact same result as without this option (even though the generated assembly does change somewhat, becoming closer to what Intel 10 produced).

Why the difference? And it only happens in some cases! Other evidence leads us to suspect that some FPU flag, or some leftover state from past calculations, is somehow affecting the results. Is it possible that the FPU or SSE registers are wider than the 8-byte reals being loaded, are not fully cleared on a load, and some leftover from a previous result can have an impact in some situations?

How do the compile options (e.g. /Op) impact the above? Are there any compile options that can help with this?
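
For concreteness, this is the sort of command line we would try (mysource.f90 is a placeholder, and the exact option spellings should be checked against the version 12 documentation; my understanding is that the newer /fp: options supersede /Op):

    ifort /Op mysource.f90
    ifort /fp:source mysource.f90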

The exact result, computed from the decimal values above, should ideally be 0.223459742258870430966615. Interestingly, the Visual Studio debugger's fly-by (hover) tooltip shows a single-precision result.
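
That reference value is just what you get by doing the arithmetic on the printed decimals at higher precision; a quad-precision sketch (this assumes the compiler offers a real kind with at least 30 decimal digits, i.e. selected_real_kind(30) does not return -1):

    program refval
      implicit none
      integer, parameter :: qp = selected_real_kind(30)
      real(qp) :: xy, s, q
      xy = 6698.5491675901200000000000000000_qp
      s  = 141.7567533613250000000000000000_qp
      q  = 47.2522511204418000000000000000_qp
      ! Should print a value agreeing with the 0.22345974225887043... quoted above
      print '(f40.30)', xy - q*s
    end program refval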

Now I know that the last few digits, around 15-16, are subject to roundoff and so on, and yes, a well-formulated algorithm and code should never be notably affected by such differences. But sadly we have tons of old legacy code, and we have to hunt down some larger differences which seem to originate from the smaller differences shown above.

And there is a clear indication that past numerical calculations affect this, as if some flag or some leftover value in the CPU/FPU somehow influences future calculations.

Any thoughts, comments, or suggestions are most welcome.
Steven_L_Intel1
Employee
We would need to see an actual test case in order to comment further. I will say that optimization may change the order of operations and this can cause small differences. It may also be that you have some single-precision constants or variables in the mix. Just seeing a code snippet or paraphrase doesn't help us help you.
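
(To illustrate the single-precision-constant point with a made-up example, not taken from the poster's code:)

    program mixed
      implicit none
      real(8) :: a, b
      a = 0.1      ! default (single-precision) literal, rounded to REAL(4) first
      b = 0.1d0    ! double-precision literal
      print *, a   ! prints about 0.100000001490116
      print *, b   ! prints about 0.100000000000000
    end program mixed

A stray single-precision constant like that is enough to perturb a double-precision result in its eighth significant digit.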
van_der_merwe__ben
New Contributor II
I shall work with them towards making something small that reproduces it, though it may tell us the issue is elsewhere.

So how does possibly mixing single-precision constants or variables with double precision affect this? Is that a bad thing? Why?
mecej4
Honored Contributor III
The new values only agree to about the tenth digit (I would expect more)

You seem to have overlooked a rather simple and significant explanation: subtraction of nearly equal numbers causes loss of precision.

The product of Q and S(J) (use any calculator) is 6698.3257...; if you subtract this from XY(I,J), which has the value 6698.549..., you lose four decimal digits of precision (about 13 bits of the significand) immediately. Since the variables are loaded from memory (8 bytes for DOUBLE PRECISION, 4 bytes for REAL), whether the compiler uses 80-bit x87 intermediates or 64-bit SSE2 instructions, you cannot avoid the effect of having lost those 4 to 5 decimal digits, so agreement to only about 11 or 12 digits in the difference is what you should expect when the two instruction sets round the product differently.
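
A tiny illustration of that cancellation (the product below is just the posted decimals multiplied out and rounded to double precision):

    program cancel
      implicit none
      real(8) :: xy, prod
      xy   = 6698.549167590120d0     ! XY(I,J): about 16 significant decimal digits
      prod = 6698.325707847861d0     ! approximately Q*S(J)
      ! The leading digits 6698.xxx cancel in the subtraction, so the
      ! difference retains only about 12 significant decimal digits.
      print '(f22.16)', xy - prod
    end program cancel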
van_der_merwe__ben
New Contributor II
I sat down with the person hunting this down, and we set up a separate project with the relevant code (so we could send it to Intel if there were differences), passing in the input floating-point values we saw in the debugger. Intel 10 and 12 gave the exact same answers, so the IA-32 (x87) and SSE instruction sequences (same Fortran code; both compilers generated the same assembly as before) give the exact same numbers.

So what gives?

I then used the assembler view to look at the numbers being passed in, the actual arrays, viewing them as character arrays (cast to unsigned char, 24 bytes for 3 doubles). One of the numbers has one byte that is different. However, in both cases the debuggers show the exact same input real values (despite the fact that they are clearly not exactly identical). Changing that one byte also confirms this, as the results then match.
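
(For anyone who wants to do the same comparison from Fortran rather than from the assembler/character-array view, a sketch that prints the raw 64-bit pattern of a double via TRANSFER; the literal is just one of the values from the original post:)

    program dumpbits
      implicit none
      real(8)    :: x
      integer(8) :: bits
      x = 6698.5491675901200000000000000000d0
      ! Reinterpret the 8 bytes of the double as a 64-bit integer and
      ! print it in hex, so two runs can be compared bit for bit.
      bits = transfer(x, bits)
      print '(z16.16)', bits
    end program dumpbits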

So one of the input numbers is slightly different, even though both debuggers show the exact same input real values (when viewed as reals), and due to the sensitivity of the calculations (exponentials and high-order polynomials), that minor input difference then results in an observable output difference.

So we are now hunting down why that one byte in one number is different. *grins* Mostly because we want to know and because we are concerned about what in our code would be causing this.