When we upgraded from Intel Fortran 9 to 10, we saw no numerical differences at all. But when we upgraded from 10 to 12, we saw some numerical differences. We compile pretty much with the default options, Windows 32-bit, commercial client.
In our code we have this expression: XY(I,J)=XY(I,J)-Q*S(J)
and the variable values are (copied from the Visual Studio 6 and 2010 debugger, both show the same values):
I = 3
J = 3

| XY(I,J) | 6698.5491675901200000000000000000 | REAL(8) |
| S(J) | 141.7567533613250000000000000000 | REAL(8) |
| Q | 47.2522511204418000000000000000 | REAL(8) |
Visual Studio 2005 / Intel 10 gives the new value as:
0.2234597422302610000000000000
Visual Studio 2010 / Intel 12 gives the new value as:
0.2234597422293520000000000000
The input values shown in the first table are exactly the same in the two debuggers. The new values only agree to about the tenth digit (I would expect more). Now in most cases the two versions give exactly the same results, but for some numbers, just once in a while, they do not.
Just FYI: Visual Studio 2010 / Intel 10 gives the exact same numbers as Visual Studio 2005 / Intel 10.
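For reference, a minimal sketch of that single statement with the values above typed in as literals (a standalone illustration, not our production code; the decimal strings shown by the debugger may not capture every last bit of the actual binary values):

```fortran
! Minimal standalone sketch of the statement in question, using the
! debugger-displayed values as double-precision literals.
program repro
  implicit none
  real(8) :: xy, s, q
  xy = 6698.54916759012d0
  s  = 141.756753361325d0
  q  = 47.2522511204418d0
  xy = xy - q*s
  write(*,'(es25.17)') xy   ! prints roughly 2.234597422...E-01
end program repro
```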
The Intel 10 code appears to use x87 IA-32 instructions (fld, fld, fmulp, fld, fsubrp, fstp).
The Intel 12 code appears to use SSE instructions (movsd xmm1, mulsd, movsd, subsd, movsd). Since the debugger does not show the SSE registers as floating point, it is hard to follow and compare.
Using the /arch:IA32 option with VS2010 / Intel 12 still gives the exact same result as without this option (even though the assembly instructions do change somewhat to be closer to what Intel 10 gave).
Why the difference? And it only happens in some cases! Other evidence leads us to suspect that some FPU flag, or some leftover state from earlier calculations, is somehow affecting the results. Is it possible that the FPU or SSE instructions work on registers wider than 8 bytes and do not fully clear them when loading 8-byte reals, so that a leftover fragment of a previous result can affect the outcome in some situations?
How do the compile options (e.g. /Op) impact the above? Are there any compile options that can help with this?
The exact number should ideally be 0.223459742258870430966615. Interestingly, Visual Studio shows a single-precision result in the debugger's fly-by (hover) display.
Now I know that the digits around the 15th-16th place are subject to round-off, etc., and yes, a well-formulated algorithm and code should never be noticeably affected by such differences. But sadly we have tons of old legacy code in which we have to hunt down some larger differences that seem to originate from the smaller differences shown above.
And there is a clear indication that past numerical calculations influence this, as if some flag or leftover value in the CPU/FPU somehow affects future calculations.
Any thoughts, comments, or suggestions are most welcome.
4 Replies
We would need to see an actual test case in order to comment further. I will say that optimization may change the order of operations and this can cause small differences. It may also be that you have some single-precision constants or variables in the mix. Just seeing a code snippet or paraphrase doesn't help us help you.
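As a quick illustration of the single-precision point (a generic example, not taken from the code in question): a literal without a D exponent is a default REAL, so it is rounded to single precision before the double-precision arithmetic ever sees it.

```fortran
program mixed_kinds
  implicit none
  real(8) :: a, b
  a = 0.1       ! single-precision literal widened to REAL(8): 0.10000000149011612
  b = 0.1d0     ! true double-precision literal:               0.10000000000000001
  write(*,'(2es25.17)') a, b
end program mixed_kinds
```

The same thing happens when a REAL(4) variable appears in an otherwise REAL(8) expression; the damage is done before the promotion.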
I shall work with them towards making something small that reproduces it, though it may tell us the issue is elsewhere.
So how does possibly mixing single-precision constants or variables with double precision impact this? Is that a bad thing? Why?
"The new values only agree to about the tenth digit (I would expect more)"
You seem to have overlooked a rather simple and significant explanation: subtraction of nearly equal numbers causes loss of precision.
The product of Q and S(J) (use any calculator) is 6698.3257...; if you subtract this from XY(I,J), which has the value 6698.549..., you immediately lose about four decimal digits of precision (roughly 13 bits of the significand). The variables are loaded from memory as 8-byte values for DOUBLE (4-byte for REAL), so whether you use 80-bit x87 intermediate results or 64-bit SSE2 instructions, you cannot avoid the effect of having lost those 4 to 5 decimal digits.
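A small sketch of that effect using the posted values (an illustration only, not the original code; the extended kind here merely mimics a wider x87-style intermediate, and its availability and width are compiler-dependent):

```fortran
program cancellation
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! A kind with at least 18 decimal digits, to mimic a wider intermediate;
  ! it may map to 80-bit or 128-bit reals, or not exist, depending on compiler.
  integer, parameter :: ep = selected_real_kind(18)
  real(dp) :: xy, s, q, r_sse
  real(ep) :: r_x87
  xy = 6698.54916759012_dp
  s  = 141.756753361325_dp
  q  = 47.2522511204418_dp
  ! SSE2-style: the product q*s is rounded to 64 bits before the subtraction.
  r_sse = xy - q*s
  ! x87-style: product and subtraction carried in the wider format,
  ! rounded to 64 bits only when stored.
  r_x87 = real(xy,ep) - real(q,ep)*real(s,ep)
  write(*,'(a,es25.17)') ' 64-bit intermediate: ', r_sse
  write(*,'(a,es25.17)') ' wide intermediate:   ', real(r_x87,dp)
end program cancellation
```

With a product near 6698, one unit in its last place is about 9E-13, and after the cancellation that matches the size of the disagreement between the two results reported above.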
I sat down with the person hunting this down, and we set up a separate project containing the relevant code (so we could send it to Intel if there were differences), passing in the input floating-point values we saw in the debugger. Intel 10 and 12 gave exactly the same answers: with the same Fortran code, both compilers generated the same assembly as before (IA-32 and SSE respectively), and produced exactly the same numbers.
So what gives?
I then used the assembler view to look at the numbers being passed in, the actual arrays, examining them as character arrays (cast to unsigned char, 24 bytes for 3 doubles). One of the numbers has one byte that is different. However, both debuggers show exactly the same input real values (despite the fact that they are clearly not exactly identical). Changing that one byte also confirms this, as the results then match.
So one of the input numbers is slightly different, even though both debuggers show exactly the same input real values (when observed as reals); because of the sensitivity of the calculations (exponentials and high-order polynomials), that minor input difference then produces an observable output difference.
So we are now hunting down why that one byte in one number is different. *grins* Mostly because we want to know, and because we are concerned about what in our code could be causing this.
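For anyone who wants to do the same byte-level comparison from within Fortran rather than from the debugger or assembler view, something along these lines should work (a sketch only; INTEGER(1) is a common but compiler-specific kind, and the value is just one of the inputs re-typed as a literal):

```fortran
program dump_bytes
  implicit none
  real(8)    :: x
  integer(1) :: bytes(8)
  x = 141.756753361325d0        ! one of the suspect inputs, re-typed here
  bytes = transfer(x, bytes)    ! raw 8-byte image of the REAL(8) value
  write(*,'(es25.17)') x
  write(*,'(8(z2.2,1x))') bytes ! little-endian byte order on x86
end program dump_bytes
```

Two REAL(8) values that a debugger prints identically to 15 digits can still differ in their last bits; the raw bytes make that visible.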
