11.0.072 Compiler Showing Significan Numerical Precision Differences

mattintelnetfort · ‎03-11-2009

Hi All

We're validating the results of upgrading the fortran compiler from 10.1.021 to 11.0.072 against our codebase, and seeing significant numerical differences in the area of Real*4 roundoff. Attached is a simple code clip with output. When the code clip is compiled with the older compiler, the Real*4 result is very similar to using Real*8. When we use new compiler we're seeing significant variation. The code clip was using the default compiler settings for both old and new compilers. Can anyone explain?

Our example has all implicit variables declared as real * 4 (note commented out implicit statement). A number of arithmetic operations are performed, including a number of variable assignments. The target result value is 1984. With the old compiler we see 1984.000 (Real*4), and 1983.999999946 (Real*8); with new compiler we see 1983.711 (Real*4) and 1983.999999946 (Real*8). We realize there is roundoff to consider, but the REAL*4 result with the new compiler seems worrisome and is causing problems (bigger differences/behavior downstream). Any thoughts?

CODE:

C***********************************************************************
C

C-----------------------------------------------------------------------
c IMPLICIT REAL*8 (A-H,O-Z)
DIMENSION Q(2,2),H(2,2),S(2)
C-----------------------------------------------------------------------
C
NS=2
NP=2
IS=1
IP=1
IQW=0
QW=0.
SX=15700.
HSX=1000.0
HSS=HSX
Q(1,1)=1896.0
Q(1,2)=1984.0
Q(2,1)=2036.0
Q(2,2)=2280.0
H(1,1)=20610.0
H(1,2)=22540.0
H(2,1)=20608.0
H(2,2)=22538.0
S(1)=15000.0
S(2)=15700.0
C
C

AS=(SX-S(IS))/(S(IS+1)-S(IS))
Q4=Q(IP,IS)+AS*(Q(IP,IS+1)-Q(IP,IS))
H4=H(IP,IS)+AS*(H(IP,IS+1)-H(IP,IS))
Q3=Q(IP+1,IS)+AS*(Q(IP+1,IS+1)-Q(IP+1,IS))
H3=H(IP+1,IS)+AS*(H(IP+1,IS+1)-H(IP+1,IS))
C
C
ISS=IS
710 GM=(H(1,ISS+1)-H(1,ISS))/(Q(1,ISS+1)-Q(1,ISS))
SM=(H4-H3)/(Q4-Q3)
QA=(GM*Q(1,ISS)+SM*(QW-Q3)+H3-H(1,ISS))/(GM-SM)
HA=H(1,ISS)+GM*(QA-Q(1,ISS))
QB=Q3+(HA-H3)/SM
c *** note: qb should be 1984.0
C in the old compiler (10.1.021) it was 1984.002
c in the new compiler (11.0.072) it is 1983.711
write(6,*)'cctest QB',QB
END

OUTPUT:

output from 10.1.021:

$ ./test
cctest QB 1984.000

output from 11.0.072:
$ ./test
cctest QB 1983.711

output from 10.1.021 using implicit real*8:
$ ./test
cctest QB 1983.99999999946

output from 11.0.072 using implicit real*8:

$ ./test
cctest QB 1983.99999999946

Thanks

Steven_L_Intel1 · ‎03-11-2009

11.0 defaults to using SSE instructions for FP - this can cause small numerical differences when compared to older versions that defaulted to using the X87 instructions. Some applications are sensitive to this. Try compiling with /arch:ia32 to see if that helps. This is called out in the release notes.

I verified that compiling with /arch:ia32 gives the answer you want. You were getting more precision than you declared in version 10.1.

andrew_daniels · ‎10-20-2010

I have been using the /arch:ia32 compiler opition to provide better precision forour 32 bit codes. I have recently compiled the code with the 64 bit Fortran compiler (11.1.065). I was unable to find the ia32 option for the 64 bit compiler. Will you please let me know how to get the same precision arithmetic on the 64 bit compiler as I am getting with the 32 bit compiler? I am getting different answers between the 32 bit and the 64 bit compiled codes. The answers are different even if the 32 bit code was compiled without the /arch:ia32 option.

Thanks,

Andrew

andrew_daniels · ‎10-20-2010

I have another question about the precision of arithmetic.
Does Intel Fortran do data type promotion? Given two Real*4 numbers, A and B, and a Real*8 result, C, If I do the following calculations I get two different results. I would expect that if Fortran did data type promotion I should get the same answers for the following calculations:

C = A*1.D0/B
C = DBLE(A)/DBLE(B)

thanks,

Andrew Daniels

TimP · ‎10-20-2010

Due to the basic design of the X64 OS, there isn't a satisfactory means for getting automatic promotion to double by use of "ia32" (x87) instructions. The initial versions of Windows X64 forced the compilers to remove all means for generating x87.
If you can determine where in your code you depend on promotion to double, you should write in the promotion explicitly, so you don't depend on x87 characteristics. In case that ruins performance, or as a means for finding out where you depend on double, there are options such as /real-size:64.

Steven_L_Intel1 · ‎10-20-2010

The Fortran language has explicit rules for what happens in an expression when two operands are of different type of kind. Intel Fortran follows this. In the case of two REAL values of different kind, the lesser kind is promoted to the greater kind and then the operation performed. The type and kind of the variable being assigned to is not involved until the right side is completely evaluated - only then is any conversion done to the type and kind of the left side.

I would expect that the two assignments you show should give the same result. I could not comment further unless you showed an actual executable example, rather than a theoretical paraphrase.

In the 11.x compilers, x87 instructions are not used in either 32-bit or 64-bit mode, so that potential difference should be eliminated.

andrew_daniels · ‎10-21-2010

thank you for your response!

andrew_daniels · ‎10-21-2010

Ok,

Here is the specific code I used to show the problem with data type promotion:

subroutine precis()

real*4 r41,r42,r4arr(2)

real*8 r81, r82,r8sing,r8doubl,r8arry

data r41,r42/873.2746582031,836.3831176758/

data r4arr/873.2746582031,836.3831176758/

r8sing=2.D0/3.D0*(r41+r42-r41*r42/(r41+r42))

r8doubl=2.D0/3.D0*(dble(r41)+dble(r42)-dble(r41)*dble(r42)/

1 (dble(r41)+dble(r42)))

r8arry=2.D0/3.D0*(r4arr(1)+r4arr(2)-r4arr(1)*r4arr(2)/

1 (r4arr(1)+r4arr(2)))

OPEN(104,ACCESS='SEQUENTIAL',STATUS='UNKNOWN',FILE='C:\OUT.TXT')

WRITE(104,"('single precision',g20.13,

1 ' double precision',g20.13,

2 ' array single',g20.13)") r8sing,r8doubl,r8arry

CLOSE(104)

RETURN

END

Some of the formatting is incorrect because of the behavior of the copy and paste, but you should be able to indent as needed.

The code was compiled as 32 bit using the following options:

/nologo /Ob0 /Oy- /assume:buffered_io /I"Release/" /I"\ProductDevSource\StCommon\inc" /I"\ProductDevSource\StGCommon\EngineDLL\Inc" /Qauto /align:dcommons /align:sequence /assume:byterecl /fpconstant /iface:cvf /module:"Release/" /object:"\productdev\SynerGEE\Gas\Unicode\Release\Engine_swsclc/" /libs:dll /threads /libdir:nouser /c

The code was compiled as 64 bit using the following options:

/nologo /Ob0 /assume:buffered_io /I"Release/" /I"\ProductDevSource\StCommon\inc" /I"\ProductDevSource\StGCommon\EngineDLL\Inc" /Qauto /align:dcommons /align:sequence /assume:byterecl /fpconstant /iface:cvf /module:"Release/" /object:"C:\productdev\SynerGEE\Gas\Unicode\Release\x64\Engine_swsclc/" /libs:dll /threads /libdir:nouser /c

I am using the 11.1.065 compiler

The 32 bit outputis:

single precision 854.9615071615 double precision 854.9615641765 array single 854.9615071615

The 64 bit output is:

single precision 854.9615071615 double precision 854.9615641765 array single 854.9615071615

Notice that the numbers should all be the same but they are not.

If you can let me know what I am doing wrong I would appreciate it.

Thanks,

Andrew Daniels

TimP · ‎10-21-2010

I don't see the option -real-size:64 in your command line; anyway, it applies only to default real. If you change your real*4 to real, it does what you seem to be saying you intend.

Steven_L_Intel1 · ‎10-21-2010

I think he means x64, not making REAL 64-bit. I'll take a look at this as soon as I can.

TimP · ‎10-21-2010

My understanding is he wants all expressions evaluated in double precision (even though the example is based on single precision constants), an effect which is produced by the /arch:ia32 option in the 32-bit compiler.
I had already mentioned -real-size:64 as an option to produce that effect for intel64, if one chooses to do it without writing the desired promotions into the source code.

Steven_L_Intel1 · ‎10-21-2010

Tim, unless I am missing something, he explicitly uses double-precision form for all constants intended to be double precision. Do you spot a case where that is not true?

andrew_daniels · ‎10-21-2010

Yes, my original question was how to reproduce the real*8 arithmetic using real*4 constants.Thank you for theanswer to that question. However, I also would like to know why the following arithmetic is not equivalent since I believe that data types should be promoted to the highest data types in the expression:

r8 = some function of 1.d0 * r4 constants
r8 =the samefunction of dble(r4) constants

Please see my earlier post for more information.

Thanks,

Andrew

TimP · ‎10-21-2010

No, Fortran standard (since 40 years ago) specifies evaluation of individual parts of an expression according to the higher precision of the operands of each operation, so promotion doesn't occur until a higher precision is declared. The expectation of promotion of all single precision operations to double was never widespread enough to require ifort to have an option such as ICL /fp:double. Even in C, that capability is needed only for code written over 20 years ago, before there was an agreed standard.
Even though real*4 isn't covered by Fortran standard, it's taken to indicate that all operations between operands of that type are in that precision, even in the presence of options like -real-size:64.

shenkj · ‎10-21-2010

Hi. I am a co worker of Andrew's.

We are trying to understand why theassignment in the code snippet from the previous code post:

subroutine precis()

real*4 r41,r42,r4arr(2)

real*8 r81, r82,r8sing,r8doubl,r8arry

data r41,r42/873.2746582031,836.3831176758/

data r4arr/873.2746582031,836.3831176758/

r8sing=2.D0/3.D0*(r41+r42-r41*r42/(r41+r42)) ! <<<-------------assignment

Does not do the arithmetic in real*8. The variable is real*8 and the operands are a mix of real*8 and real*4.

Are you saying that the r4 variables in the assignment are computed in real*4 precision, the result is placed in real*8, then the 2/3 * the result of the real*4s is computed as real*8?

That would explain what we are seeing...

Steven_L_Intel1 · ‎10-21-2010

Yes - the addition of two R4s is done in R4. Conversion to R8 is done only when the multiply by 2.D0/3.D0 happens. (Actually, the multiply might happen before the divide.) The type promotion is done on an operation by operation basis, not globally through the whole expression.

andrew_daniels · ‎10-25-2010

Thanks for your help. That explains the behavior I have been seeing.