Hi there,
Some time ago I converted a large numerical codebase over to the Intel Visual Fortran compiler. So far we have been unable to make the switch to IVF, as we are seeing an unacceptably large discrepancy between the new simulation results and those produced by our old compiler. After some work I have managed to identify the source of the problem. The deviation seems to be seeded by an error in the last digit of a floating-point calculation in the IVF build:
x = 1.0000000/8.5404000
the correct answer is:
x = 0.11709053
the IVF answer is:
x = 0.11709054
I suspect this is due to some kind of compiler optimisation - I have tried /fp:precise and /fp:strict with no success. I have attached a small project which replicates the issue. Could someone please tell me if/how I can resolve this issue? Any guidance would be appreciated...
Thanks
Build 20101201 Package ID: w_cprof_p_11.1.072
Looks like a rounding issue. Are those values, i.e. x and 1.0000000/8.5404000, declared as double precision?
Maybe the FP environment is set to its default mode, thus implying rounding to nearest. On the other hand, you have set /fp:strict.
Are you comparing results with implicit promotion to double vs. results without? You make it too much of a struggle to figure out what you are doing.
If you want ia32 mode (implying implicit promotion to double and no auto-vectorization) with win32 compilation, you would set /arch:IA32, but then you could not expect bit-wise agreement with results in the default or intel64 modes unless you made the promotion to double explicit, e.g.
1.0/(dble(HCO) + HRO)
(but you may need to read from the data file directly into double precision)
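A minimal sketch contrasting the two evaluation strategies described above. HCO and HRO are names from the original project; their declarations and values are not shown in the thread, so the default-real values below are only illustrative (chosen so the sum is near the 8.5404 of the first post):
[fortran]program promotion_demo
  implicit none
  real :: hco, hro, res_sp
  double precision :: res_dp
  hco = 3.0          ! illustrative values only
  hro = 5.5404
  ! pure default-real evaluation: the sum and the divide each round to single precision
  res_sp = 1.0 / (hco + hro)
  ! explicit promotion, as suggested above: the sum and the divide are carried out in double
  res_dp = 1.0d0 / (dble(hco) + hro)
  write (*,*) res_sp, res_dp
end program promotion_demo[/fortran]
Only the second form can be expected to agree, after a final rounding to single precision, with a build that promotes the whole expression to double.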
David Mccabe wrote:
I suspect this is due to some kind of compiler optimisation
No, it is something more basic at work. The difference between the "correct" and "incorrect" answers is too small to represent with default REAL precision. You can only expect about 7 decimal digits to agree. Try the following program, and before running it predict what it will print.
[fortran]program fptest
  real x, rx
  x = 8.5404000
  rx = 1.0/x
  if (abs(x*rx - 1.0) .lt. epsilon(x)) then
    write(*,*) 'Correct'
  else
    write(*,*) 'Error'
  endif
end program fptest[/fortran]
mecej4 wrote:
Quote:
David Mccabe wrote: I suspect this is due to some kind of compiler optimisation
No, it is something more basic at work. The difference between the "correct" and "incorrect" answers is too small to represent with default REAL precision. You can only expect about 7 decimal digits to agree.
Mecej4 is correct. Single precision (SP) has only around 6.92 decimal digits of precision. Double precision (DP) has 15. You are asking for at least 8 decimal digits of precision in your results. That is too much precision for SP.
Even if the variable that you are assigning the result to is defined to be DP, the Fortran standard requires that the arithmetic on the constants be performed using SP. That is because you are using default real constants, since you do not have any kind type parameter at the end of the constants. By rule, default real is SP.
This may be old, "dusty deck" source code, from the days of Fortran 77 or before. The Fortran 77 standard allowed a Fortran compiler to "promote" a constant from SP to DP if the result was being assigned to a DP variable. Starting with Fortran 90, published in June 1991, this practice of "promoting" SP constants to DP is prohibited. See Fortran 2008 standard, section 1.6.5, first bullet. By rule, all arithmetic is performed using the data type and kind of the operands. In this example, since both operands are SP, the arithmetic is SP and the result is SP.
It appears that you desire a fairly high degree of accuracy in your results. I strongly recommend that you use DP or greater for all of your floating-point work, both variables and constants. If you do, I believe that it is highly likely that you will get the results that you desire.
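A minimal sketch of that recommendation, assuming nothing about the original code beyond the constants quoted in the first post; the kind name dp is my own:
[fortran]program dp_constants
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! double-precision kind
  real(dp) :: x
  ! both operands carry the dp kind, so the division itself is done in DP
  x = 1.0_dp / 8.5404_dp
  write (*,'(f20.17)') x    ! roughly 15 significant digits survive
end program dp_constants[/fortran]
Leaving the _dp suffixes off the constants would put the code right back in the situation described above: a single-precision quotient stored into a double-precision variable.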
I think the Fortran standard issue to which Craig refers is that literal floating-point constants with no type specification are to be interpreted as default real. ifort has an option to change that (in conflict with the standard). Your comment misled us into raising that issue, even though it is not one your code sample actually demonstrates.
More than one of us also raised the point that you can't expect identical results on a target where promotion to double precision occurs (after translation of constants to default real) and on a target where pure default real is supported (as ifort has done by default for several years now). The Fortran standard doesn't prevent such differences, nor does it disallow differences due to differing orders of evaluation (in the absence of parentheses, for which ifort requires /assume:protect_parens or a /fp: option for standard compliance). Such differences, as others pointed out, easily account for changes on the order of the inherent precision of default real.
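A tiny illustration (with made-up values) of the order-of-evaluation point: single-precision addition is not associative, so a compiler that regroups an unparenthesised sum can legitimately change the last bit or more:
[fortran]program assoc_demo
  implicit none
  real :: a, b, c
  a = 1.0e8
  b = -1.0e8
  c = 1.0
  write (*,*) (a + b) + c   ! prints 1.0
  write (*,*) a + (b + c)   ! prints 0.0, because b + c rounds back to -1.0e8
end program assoc_demo[/fortran]
This is an extreme case, but the same mechanism produces last-bit differences like the one in the first post.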
The following might clarify the difference:
[fortran]      real*4 x4, x6
      real*8 x8, x
      x = 1.0000000/8.5404000
      x4 = 1.0000000/8.5404000
      x6 = 1.00000d0/8.54040d0
      x8 = 1.00000d0/8.54040d0
      ! the correct answer is: x = 0.11709053
      ! the IVF answer is: x = 0.11709054
      write (*,*) 'exp x = 0.11709053'
      write (*,*) 'IVF x = 0.11709054'
      write (*,1) 'x4 =',x4
      write (*,1) 'x6 =',x6
      write (*,2) 'x8 =',x8
      write (*,2) 'x =',x
    1 format (1x,a,f12.9)
    2 format (1x,a,f17.14)
      end[/fortran]
The output is:
exp x = 0.11709053
IVF x = 0.11709054
x4 = 0.117090538
x6 = 0.117090531
x8 = 0.11709053440120
x = 0.11709053814411
Does this say something about the stability or reliability of your simulation?
Thanks for the explanations... just to clarify: I have no intention of promoting anything to double precision; I am only considering SP.
It is now clear to me that the correct mathematical value of the expression cannot be represented exactly in the IEEE binary representation. Nevertheless, it would be interesting to know why the result varies between the two compilers.
Both functions are fed the same bit patterns:
HRO: 01000000101100010100101011110101
HCO: 01000000010000000000000000000000
Yet they return:
ifort: 00111101111011111100110100101010 (0.117090538144112)
ftn95: 00111101111011111100110100101001 (0.117090530693531)
I guess the cause of this is that the ifort compiler first calculates the denominator (HRO+HCO), which is stored in a 4-byte buffer, and then performs the division. In the case of ftn95 the whole calculation is performed at greater precision.
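A sketch of how those bit patterns can be dumped or reconstructed with TRANSFER; the hex constants are simply the binary strings above rewritten in hexadecimal, and whether the last bit of res comes out as the ifort or the ftn95 value depends on the code generation, which is the point of this thread:
[fortran]program bits_demo
  implicit none
  real :: hro, hco, res
  hro = transfer(int(z'40B14AF5'), 0.0)   ! 01000000101100010100101011110101
  hco = transfer(int(z'40400000'), 0.0)   ! 01000000010000000000000000000000 (= 3.0)
  res = 1.0 / (hco + hro)
  write (*,'(a,b32.32)') 'res bits = ', transfer(res, 0)
  write (*,'(a,f17.15)') 'res      = ', res
end program bits_demo[/fortran]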
I ran the example code in #8 with ftn95 and it produced an identical result to IVF. The only calculations that produce what you want are those where the constant 8.5404000 is treated as 8.54040d0. Did you run it with ftn77, as ftn95 is treating it as 8.54040e0?
The only way I can get ftn95 to produce what you had is to use /DReal or /Xreal
John
@John Campbell: thanks, ftn95 produces the same result as yours using single precision throughout - the reason being that there is no addition in your denominator. The discrepancy in my example is caused by ftn95 performing a "fused multiply–add" (or fused divide add in this case) whereas ifort rounds the result of the addition before performing the division. Could it be that I need a compiler switch which forces this behavior in the ifort compiler?
You are at the limits of the precision chosen. Why worry about it? It is just a function of rounding/precision in the low-level code. If this is something to worry about, you need to be doing DP maths.
David Mccabe wrote:
@John Campbell: thanks, ftn95 produces the same result as yours using single precision throughout - the reason being that there is no addition in your denominator. The discrepancy in my example is caused by ftn95 performing a "fused multiply–add" (or fused divide add in this case) whereas ifort rounds the result of the addition before performing the division. Could it be that I need a compiler switch which forces this behavior in the ifort compiler?
Your argument is getting more circular. You complain that ifort, when restricted to single precision, doesn't give the same result as you get with double precision evaluation, but say you don't want the promotion of expression evaluation to double. You tried options which imply /Qprec-div and now say you want some other option with similar effect? By default, ifort is allowed to use an iterative method for division, although this wouldn't apply to the examples you gave here (one with addition in the divisor, one without).
If you're reading about Itanium division expression evaluation methods, forget it, those are unique to Itanium.
David Mccabe wrote:
The discrepancy in my example is caused by ftn95 performing a "fused multiply–add" (or fused divide add in this case) whereas ifort rounds the result of the addition before performing the division. Could it be that I need a compiler switch which forces this behavior in the ifort compiler?
Mainstream Intel processors preceding Haswell/Broadwell did not have FMA instructions (it is a different story with PowerPC), so your inference is wrong, unless one were to interpret "fused" as "with no memory accesses for intermediate results".
The differences that you are seeing are caused by the differences between the 80x87 FPU (80-bit register stack) and the XMM (32/64/128-bit register) architecture. Nor do you need to compare two different compilers to observe such behavior. With the current Intel compiler for 32-bit targets, with the option /arch:ia32 the result for Res is 1.1709053E-01 (0x3DEFCD29 in hex float format), whereas with the default (SSE2) options the result is 1.1709054E-01 (0x3DEFCD2A in hex).
The difference is only in the least significant bit. I think that you should respect app4619's advice. Please read Tim Prince's comments, as well. He is an expert on these topics.
David Mccabe wrote:
we are seeing an unacceptably large discrepancy between the new simulation results and those produced by our old compiler.
the correct answer is:
x = 0.11709053
the IVF answer is:
x = 0.11709054
David,
I would suggest that if differences at the last decimal place in single precision are causing "large discrepancies" then the algorithm is suspect and/or should be changed to a higher precision.
The "correct" answer above is in fact only correct for a particular definition of correct and is truncated from (my calculator value of) 0.11709053440119900707226827783242
The compilers/processors would store this result as single precision to the nearest bit depending on various factors (hardware registers, rounding options etc.)
Les
Thanks all for the help, this has been a learning experience for me. @Tim Prince, mecej4: you are totally correct...
"FPU (x87) instructions provide higher precision by calculating intermediate results with 80 bits of precision, by default, to minimise roundoff error in numerically unstable algorithms (see IEEE 754 design rationale and references therein). However, the x87 FPU is a scalar unit only whereas SSE2 can process a small vector of operands in parallel."
You will need to revise your code should you desire to migrate from x87 to SSE/AVX. Sometimes this is harder than simply changing a compiler switch. Often, convergence code that was formerly tweaked to the limits of x87 will not work when using SIMD instructions. You may need a larger epsilon. Then results will differ, and if you have a requirement to match a "certified" set of results, you will not be able to migrate to the newer technology.
Jim Dempsey
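An illustrative sketch of Jim's point (the iteration and all names here are made up): express a convergence test in terms of epsilon() plus a modest safety factor, rather than a hard-coded tolerance that only ever passed on the x87 code path:
[fortran]program converge_demo
  implicit none
  real :: s
  s = newton_sqrt(8.5404)
  write (*,*) 's =', s, '  residual =', s*s - 8.5404
contains
  real function newton_sqrt(a)
    real, intent(in) :: a                  ! assumed a > 0
    real :: r, r_old
    real, parameter :: tol_factor = 4.0    ! a few ULPs of slack for SSE/AVX code paths
    r = a
    do
      r_old = r
      r = 0.5 * (r + a / r)                ! Newton step for sqrt(a)
      if (abs(r - r_old) <= tol_factor * epsilon(a) * abs(r)) exit
    end do
    newton_sqrt = r
  end function newton_sqrt
end program converge_demo[/fortran]
Whether a factor of 4 (or 10, or 100) is appropriate depends entirely on the algorithm; the point is that the tolerance scales with the working precision instead of being tuned to one instruction set.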
It would be useful if ifort provided real*10 to assist in replicating 80-bit computation. I presume this could be done with x87 instructions in ia32 mode. Would it be done as a software implementation on intel64, or is there access to x87 instructions?
The downside of providing this kind is that all those who use QP = selected_real_kind(P=16, R=308) for real(16)
would have to change to QP = selected_real_kind(P=19).
John
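A small sketch of how that kind selection behaves; the printed kind numbers are compiler-specific, and the qp line assumes the compiler provides a 128-bit real, as current ifort does:
[fortran]program kind_query
  implicit none
  integer, parameter :: dp = selected_real_kind(p=15, r=307)  ! 64-bit IEEE double
  integer, parameter :: qp = selected_real_kind(p=16, r=308)  ! today this resolves to the 128-bit kind
  integer, parameter :: xp = selected_real_kind(p=19)         ! 19 digits exceeds 80-bit extended, so this stays on the 128-bit kind
  write (*,*) 'dp kind =', dp, '  precision =', precision(1.0_dp)
  write (*,*) 'qp kind =', qp, '  precision =', precision(1.0_qp)
  write (*,*) 'xp kind =', xp
end program kind_query[/fortran]
If an 80-bit kind (about 18 decimal digits) were added, the P=16 request would start resolving to it instead, which is exactly the compatibility problem described above.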
John, we're not going to do that. We did seriously discuss it at one point and realized that it would create too many issues in the compiler for very little gain. We'd like to move away from x87 as much as possible.
