Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Compiler dependent results

ngalamba
Beginner
630 Views
Hi

I am working with a Fortran 77 code in double precision. I cannot reproduce the results of that code across different compilers. Can anyone give me a hint about the main possible reasons for this to happen? The code is relatively long, and it is therefore difficult to trace the exact point where the results start diverging from compiler to compiler.

Thanks

NG
6 Replies
TimP
Honored Contributor III
So many possibilities, and you haven't said which compilers or architectures you are using.
Most of the possibilities involve source which isn't really Fortran 77, but happens to run with most f77 compilers.
Some situations where compilers have different default treatments and options:

uninitialized variables
variables assumed static but not so declared
dynamic variables re-initialized by DATA (not available in f90)
single precision constants in double precision expressions
extended x87 precision used by some compilers, including past ifc
optimizations involving reciprocals
optimizations involving re-association within expressions
non-standard ENTRY
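The single-precision-constant pitfall in that list can be sketched in a few lines (illustrative code, not from the poster's program):

```fortran
C     A constant without a D exponent is single precision,
C     so only about 7 digits of the assignment are accurate
C     even though X is DOUBLE PRECISION.
      PROGRAM CONST
      DOUBLE PRECISION X, Y
      X = 0.1
      Y = 0.1D0
      PRINT *, X
      PRINT *, Y
      END
```

The two printed values differ beyond the seventh significant digit, which is exactly the scale of difference that then gets amplified over many iterations.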
ngalamba
Beginner
Thanks for your fast reply

To limit the possibilities I have now switched entirely to Windows (XP) and started doing some runs on an Intel processor with two compilers: Compaq 6.6 and ifl (7.0).
I had the same problem in linux with PGI and Intel compilers.

The results are reproducible with each compiler individually; they just start seriously diverging from one another after a number of iterations (say 1000).

I followed some of the variables to see if there is any abrupt change, but the evolution is very smooth.

For example at start I have for step 0:

compaq: 0 3.180673278510795E-002
intel: 0 3.18067327851079D-002

After 500 iterations for the same variable:

compaq:500 -0.364960403526608
Intel: 500 -0.364960208039116

This difference then keeps increasing.

The code uses intrinsic functions (all Dble Prec) but I don't really think that is the cause although it probably helps propagation.

Thanks a lot for your help.

NG
ngalamba
Beginner
Thanks for your fast reply. I ended up replying to my own message and didn't notice. I give here some more details about the calculations, as you suggested.

To limit the possibilities I have now switched entirely to Windows (XP) and started doing some runs on an Intel processor (Pentium IV, 1.5 GHz) with three compilers: Compaq 6.6, ifl (7.0) and Lahey-Fujitsu 95.
I had the same problem on Linux with the PGI and Intel 8.0 (ifort) compilers.

The results are reproducible with each compiler individually (but different in all three); they just start seriously diverging from one another after a number of iterations (say 1000).

I followed some of the variables to see if there is any abrupt change, but the evolution turns out to be very smooth.

For example at start I have for step 0:

compaq: 0 3.180673278510795E-002
intel: 0 3.18067327851079D-002

After 500 iterations for the same variable:

compaq:500 -0.364960403526608
Intel: 500 -0.364960208039116

This difference then keeps increasing.

No compiler gives any warnings or errors. The Intel compiler warned about tab characters and intrinsic functions (e.g. DFLOAT) that are extensions to Fortran 90. Substituting these does not change the results, however.
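For reference, the standard-conforming substitution for DFLOAT looks like this (a minimal sketch; the variable names are made up):

```fortran
C     DFLOAT is a vendor extension; DBLE is the standard
C     intrinsic for converting an integer to double precision.
      DOUBLE PRECISION X
      INTEGER N
      N = 7
C     extension:      X = DFLOAT(N)
C     standard form:
      X = DBLE(N)
```

As the poster observes, this substitution changes portability but not the numerical results, since both convert exactly for integers of this size.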

The code exists in a single file (no INCLUDEs). It uses COMMON blocks and EQUIVALENCE statements to pass variables between SUBROUTINEs.

I have been cutting the program into smaller and smaller pieces to reduce it as much as possible to its core and see if I can spot the error, but this is tedious and the results don't seem to change. I even moved some of the SUBROUTINEs into the MAIN program unit.

Perhaps you can suggest another way of searching for the error. I also tried different compile options, without much success in understanding what could be wrong.

Thanks a lot for your help.

NG
Steven_L_Intel1
Employee
Why do you consider this an "error"? When you are dealing with fixed-precision floating point, rounding can change depending on order of operations or compiler choice as to whether to keep values in extended precision registers or not. The differences you are seeing are well within what I would consider normal for so many iterations. Is your input data so accurate that these seventh-digit differences matter?

I don't think there's anything wrong with the code or the compiler - it's just a normal state of affairs with computational arithmetic.
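The order-of-operations sensitivity mentioned above can be shown with a made-up example (not from anyone's code here): re-associating a double-precision sum can change the result completely when the magnitudes differ greatly.

```fortran
C     (A+B)+C and A+(B+C) round differently: when C is
C     added to B first it is absorbed, because 1.0 is far
C     below one ulp of 1.0D20.
      PROGRAM ASSOC
      DOUBLE PRECISION A, B, C
      A = 1.0D20
      B = -1.0D20
      C = 1.0D0
      PRINT *, (A+B)+C
      PRINT *, A+(B+C)
      END
```

The first PRINT gives 1.0 and the second gives 0.0, which is why an optimizer that re-associates expressions can legitimately produce different answers from one that does not.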
ngalamba
Beginner
Until recently I was inclined to fully agree with your explanation. However, I noticed that other codes doing similar calculations keep the results equal across different compilers to 8 figures. The thing is that this code is not meant to run for 500 iterations but for millions, after which the results are completely different. I don't feel very comfortable with that, since the question then arises of which results I should use.

e.g. for 50 000 iterations the final results for two variables are:

compaq final: 0.542708150408233E+02 0.418531824196778E+03

Intel final: 0.106917608168976E+04 0.156911394876724E+04

After a certain number of iterations the basic variables change too much and the two runs take completely different paths. Some error propagation drives the final results completely apart. In other codes of the same type this does not happen.

Given the results above I believe that perhaps something is wrong with the code.

Thanks a lot in advance.

NG
Steven_L_Intel1
Employee
Please use the "Reply" button to add a reply rather than posting a new thread. I have been moving your posts back here.

It could be that your code uses a single-precision variable at some point and this is causing unnecessary rounding. But another possibility is that the Intel compiler is keeping results in extended precision longer and that the results you are getting are actually better. What sort of validation do you do on the results?

If you are iterating 50,000 times, any last-bit rounding difference is going to be amplified a lot. Depending on your algorithm, you may be losing all significance in the result. You would have to do some numerical analysis of the algorithm to understand the error behavior.
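The amplification described here can be demonstrated with a toy iteration (a sketch, not the poster's algorithm): start two double-precision values one part in 10^16 apart and iterate a sensitive recurrence, and after a few dozen steps they no longer agree in any digit.

```fortran
C     Toy demonstration: the logistic map amplifies a
C     last-bit-scale perturbation until the two
C     trajectories are completely decorrelated.
      PROGRAM AMPLIF
      DOUBLE PRECISION X, Y
      INTEGER I
      X = 0.3D0
      Y = 0.3D0 + 1.0D-16
      DO 10 I = 1, 100
         X = 4.0D0*X*(1.0D0 - X)
         Y = 4.0D0*Y*(1.0D0 - Y)
   10 CONTINUE
      PRINT *, X, Y
      END
```

If the poster's algorithm has this kind of sensitivity, compiler-to-compiler divergence after thousands of iterations is expected behavior, not a bug in either compiler.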

Other codes may behave differently.