Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
29242 Discusiones

The different results: where are mistakes?

atru
Principiante
974 Vistas

The different results: where are mistakes?

I tried to use a program and run it under DOS (Microsoft Fortran Compiler) and Windows XP (Intel Fortran Compiler). In both cases the program code is the same, but the results are different. I would like to say that the first result is believed to be a true, the second result looks like as an incorrect. Can you please give me a hint or try to explain what was done wrong?

The program code:

SUBROUTINE TRANSCOE(FM,X,Y,Z,VX,VY,VZ,E)

REAL*8 FM,X,Y,Z,VX,VY,VZ,E,R,C1,C2,C3,C,CC,CF,F1,F2,F3,F,FMR

R=DSQRT(X**2+Y**2+Z**2)

C1=Y*VZ-Z*VY

C2=Z*VX-X*VZ

C3=X*VY-Y*VX

CF=C1**2+C2**2

C=DSQRT(CF+C3**2)

CC=DSQRT(CF)

FMR=FM/R

F1=C3*VY-C2*VZ-FMR*X

F2=C1*VZ-C3*VX-FMR*Y

F3=C2*VX-C1*VY-FMR*Z

F=DSQRT(F1**2+F2**2+F3**2)

E=F/FM

OPEN(unit=30,status='new',file='result.doc')

WRITE(30,'(D24.16)') X

WRITE(30,'(D24.16)') Y

WRITE(30,'(D24.16)') Z

WRITE(30,'(D24.16)') VX

WRITE(30,'(D24.16)') VY

WRITE(30,'(D24.16)') VZ

WRITE(30,'(D24.16)') C1

WRITE(30,'(D24.16)') C2

WRITE(30,'(D24.16)') C3

WRITE(30,'(D24.16)') CF

WRITE(30,'(D24.16)') F1

WRITE(30,'(D24.16)') F2

WRITE(30,'(D24.16)') F3

WRITE(30,'(D24.16)') F

WRITE(30,'(D24.16)') E

...

The results:

Microsoft Fortran v.5.0

Microsoft DOS v.6.22

1st test PC Pentium III 600

2nd test PC Intel 386

Intel Fortran Compiler 10.1 IA-32

Microsoft Windows XP 32-bit

3rd test PC Pentium III 1000

X

Y

Z

VX

VY

VZ

C1

C2

C3

CF

F1

F2

F3

F

E

.7269483740117034D+07

.2447154778078442D+08

.1782676603951549D+05

-.1595440725987366D+04

.4722973338563841D+03

.3582462403310649D+04

.8765998034140860D+11

-.2607109373899945D+11

.4247626174637883D+11

.8363974082203839D+22

-.4530666740771370D+11

-.2908824661924614D+12

-.8503673158452806D+11

.3064254378611226D+12

.7687533858746432D-03

0.7269483740117034D+07

0.2447154778078442D+08

0.1782676603951549D+05

-0.1595440725987366D+04

0.4722973338563841D+03

0.3582462403310649D+04

0.8765998034140858D+11

-0.2607109373899944D+11

0.4247626174637882D+11

0.8363974082203835D+22

-0.4530666740773438D+11

-0.2908824661926250D+12

-0.8503673158453357D+11

0.3064254378612825D+12

0.7687533858750443D-03

0 kudos
16 Respuestas
bmchenry
Nuevo Colaborador II
974 Vistas
how do you characterize 'wrong?'
different? yes
but how do you determine what is the correct answer? the one you used to get with the old compiler?
word size, sotrage of intermediate results, opearting system....all may cause slight differences.

folks will probably think i'm starting to sound like a broken record, but try:
/Qimf-arch-consistency:true

TimP
Colaborador Distinguido III
974 Vistas
I don't think arch-consistency was available in ifort 10.1. /assume:protect_parens /Qprec-div /Qprec-sqrt should have been available. There's really no way we can know when the old compiler may have used extended precision, or whether you want to use it as much as possible by generating x87 code with /Qpc80, if you chose the 32-bit compiler.
jimdempseyatthecove
Colaborador Distinguido III
974 Vistas
>> or whether you want to use it as much as possible by generating x87 code with /Qpc80

In a statement such as the following:

F=DSQRT(F1**2+F2**2+F3**2)

The code generated for use with the x87 co-processor (internally implimented in the CPU) would do the following pseudo code

fetch 64-bit dp F1, convert to 80-bit internal, push on stack
square 80-bit top of stack
fetch 64-bit dp F2, convert to 80-bit internal, push on stack
square 80-bit top of stack
add top two 80-bit stack entries
fetch 64-bit dp F3, convert to 80-bit internal, push on stack
square 80-bit top of stack
add top two 80-bit stack entries
80-bit square root of top of stack
convert 80-bit top of stack to 64-bit and store into 64-bit F

The newer code generation uses all 64-bit intermediaries using the SSE instruction set.
Note, F1**2 may produce lsb 2 bits of difference in the 64-bit temporary result (from those bit positions within the corrisponding 80-bit x87 register or approximately 11 lsbbits worth of uncertanty in the 80-bit register). The summation of these can result in 12 to 13 bits of uncertanty in the 80-bit number. The square root of this would result in 6 to 7 bits of uncertanty. So expect 6-7 bits of difference using all 64-bit calculations for dsqrt(f1**2+f2**2+f3**2) vs 80-bit. You observed more error due to the fact that f1, f2, f3 had additional uncertanties bult in from earlier calculations.

To continue using extended precision, create a function in a seperateMODULE which you compile with the /Qpc80 (and omit any SSE/SSE2/AVX/host options)

INTERFACE VECMAG

FUNCTION VECMAG3(A,B,C) RESULT(MAG)
IMPLICIT NONE
REAL(8) :: MAG, A, B, C
END FUNCTION VECMAG3

FUNCTION VECMAG1(A) RESULT(MAG)
IMPLICIT NONE
REAL(8) :: MAG, A(3)
END FUNCTION VECMAG1

END INTERFACE

CONTAINS

FUNCTION VECMAG3(A,B,C) RESULT(MAG)
IMPLICIT NONE
REAL(8) :: MAG, A, B, C
MAG = DSQRT(A**2 + B**2 + C**2)
END FUNCTION VECMAG3

FUNCTION VECMAG1(A) RESULT(MAG)
IMPLICIT NONE
REAL(8) :: MAG, A(3)
MAG = DSQRT(A(1)**2 + A(2)**2 + A(3)**2)
END FUNCTION VECMAG1

Note, you may need to assure that the source to this module is not available for Inter-Procedural Optimizations (IPO). As the inlineing and optimizations may omit the compiler option switch use to produce the .MOD and its corisponding .OBJ files (i.e. use those of the project calling VECMAG).

Jim Dempsey
atru
Principiante
974 Vistas
2bmchenry

Thank you for your answer, bmchenry! I should say that I was slightly emotional, because I did not expect to see the differences in calculations. But now I understand the main reason is a word size. If so, it is not sound good. A number of similar calculations with such differences will be able to cause the differences up to 10 kilometers on the surface. You are right. Its no wrong; its just different. But how explain it to a boss? Thats the question!
atru
Principiante
974 Vistas

2 TimP (Intel)

Thank you! I am trying to migrate a big number of Fortran codes from old PC tonew PC. And I would like to see that the new codes (programs) demonstrate the same results and coincide with the test examples of old programs. And then move it to a new generation of Fortran.

bmchenry
Nuevo Colaborador II
974 Vistas

It sounds like you are doing distance/coordinate calculations based on GPS type coordinate system or something like that?
(given the >>large numbers involved i wouldn't expect it's a local survey!!)

There should be a plethora of information on triangulation/use of multi point references/redundancyto get the best approximation of the needed coordinates/dimensions

I would also suggest you set up some "known points" for testing.

Look at your equations with consideration of the other items mentioned in this thread (compiler options, etc) to produce the most accurate results within the constraints of the coordinates/reference points/numbers/values you have available to you.

TimP
Colaborador Distinguido III
974 Vistas
If your're making an effort to vaildate your programs on more up to date software, it might be worth considering currently supported versions.
You may need to address whether you can accept the results for SSE2 and more recent architectures, since SSE2 was introduced over a decade ago.
atru
Principiante
974 Vistas

2bmchenry

Probably, your name is Mr. Clairvoyant!
For some reasons I can not clarify all my tasks. Its a long, long song. I have the test examples. And I should to see - mere coincidence. Frankly speaking I have no any idea about it. I will follow all advices. My teachers said me: - a big difference is not an error, a small difference the Evil is there. (Or something like that).

jimdempseyatthecove
Colaborador Distinguido III
974 Vistas
>>with such differences will be able to cause the differences up to 10 kilometers on the surface

Then you need to change your calculations such that they produce more accuracy in the calculations.

For example your distance to surface. Assuming your current distance (coordinate system) is in units of meters.

F1KM = DINT(F1 / 1000.0_8)
F1MREM = F1 - (F1KM * 1000.0_8)
F2KM = DINT(F2 / 1000.0_8)
F2MREM = F2 - (F2KM * 1000.0_8)
F3KM = DINT(F3 / 1000.0_8)
F3MREM = F3 - (F3KM * 1000.0_8)

F = DSQRT( &
& (F1KM**2 + F2KM**2 + F3KM**2) &
& + (2.0_8 * (F1KM*F1MREM + F2KM*F2MREM + F3KM*F3MREM)) &
& + (F1MREM**2 + F2MREM**2 + F1MREM**2 ))

And make sure that you have the option switch enabled that says protect the parens
The above will give you a few more bits of accuracy using all 64-bit intermediaries.

Jim Dempsey
atru
Principiante
974 Vistas

2 Jim Dempsey

Jim! Your advices are brilliant! But they need some my actions and thinking. So, in current episode I have 32-bit (I think. Am I wrong?). See the Table, column 2. 64-bit will be in the next episode.

TimP
Colaborador Distinguido III
974 Vistas
"real*8" is the same in 32- or 64-bit mode. The idea of a difference is a fairly common misconception. In fact, compilers for Windows X64 don't support x87 code, so you could expect plain "real*8" double precision.
atru
Principiante
974 Vistas

2 TimP (Intel)

You are right. But in fact I have not a 64-bit mode. Please, see the Table. I promise to use all advises concerning 64-bit mode in the future. It will be next long song. Just imagine one needs to transfer the DOS program to Windows XP 32-bit.

jimdempseyatthecove
Colaborador Distinguido III
974 Vistas
As pointed out, real*8 is the same binary format in DOS, in Windows 3.1, Windows XP x32, Windows XP x64, Windows 7 x32 or x64.

The difference you were observing was the x87 mode code emulates the FPU 8087 co-processor. This mode handles host formats of 32-bit IEEE 754 single precision floating point (real*4), 64-bit IEEE 754 double precision floating point (real*8), and an 80-bitprecision floating point (no IEEE 754 representation). All loads were promoted from 32/64/80 bits on host to 80 internally in theFPU. The compiler would generate code such that the intermediary results were kept inside the FPU (or FPU emulator) in 80-bit format. Keeping higher precision for chained calculations dsqrt(f1**2 + f2**2 +f3**2) would keep the intermediary results in 80-bit format (higher precision). Conversion back to host storage (32/64 bits would round the result).

When you look at the code for precision improvements, try to manipulate small numbers with small numbers and large with large. Sometimes this takes extra code (VECMAG example). Other times this may require some change in code.

I have experience in programming orbiting tether structures. In this model the Sun represents the center of the world, but I cannot use distances measured with the sun as origin. What is done instead is I place a moving sprite in proximity of the body about which the tether orbits (earth, mone, mars, ...) then use the sprite location and velocity as base values to compute the obital dynamics. What this does is permit most of the calculations able to perform using smaller numbers (on the order of 0-100,000km as opposed to on the order400,000,000 km (gaining ~4 digits additional precision).

Jim Dempsey
TimP
Colaborador Distinguido III
974 Vistas
Finite element applications often work with differences from a common point in each element. As mentioned earlier, the option /assume:protect_parens is likely to be required so as to persuade the compiler not to violate parentheses. As we are looking at elements on the order of 10^-4 of the size of the entire object (e.g. automobile), this also accounts for a useful gain in precision such as Jim quoted.
SergeyKostrov
Colaborador Valioso II
974 Vistas

>>...The different results: where are mistakes?..

There are NO any mistakes. Results are different because different rounding modes are used by compilers.

Take a look at last values from theresult sets:

[cpp]Microsoft Fortran v.5.0 Intel Fortran Compiler 10.1 IA-32 ... ... 0.7687533858746432D-03 0.7687533858750443D-03 ... ... [/cpp]


Absolute Error is | 0.0000000000004011 | and results are differentstarting from13th digit after the decimal.

Microsoft Fortran v.5.0 was released around 1989 and this is a legacy compiler. Intel Fortran Compiler 10.1 is
a modern compiler.

The same rounding problem will happenif youtry to compare results between Turbo C++compiler v3.0released in 1992 and
a modern Intel C++ compiler v12.0. Do I need to proof it?

Best regards,
Sergey

atru
Principiante
974 Vistas

2 Sergey Kostrov

Spacibo, Sergey! There is no any reason to proof it. All explanations given above are very good. Besides, gfortran under Debian (P4 and above) gives the results like in the column 2. All of you push me to 64-bit. Thank you once more!

Responder