Different results with Intel Fortran on Linux compared with IBM AIX xlf90

lpeter09 · ‎08-18-2009

Dear sirs,
I use the Intel Fortran compiler, release 10, on a Linux AMD quad-core platform.
We benchmarked some Fortran 95 code and compared it with the same code running on IBM AIX (xlf compiler r12).
Surprisingly, the results are not the same. The difference is quite small almost everywhere but quite large for one output variable.

I did some checking, and the different results seem to come from the different behaviour of this simple line:

temp = ((stat(i_stat)%c_std(i,ii) - n_data * ((stat(i_stat)%c_mean(i,ii))**2) ) / (n_data-1) )

All the variables are real (I promote them using -r8 on AMD and -qrealsize=8 on AIX).

Of course, I tried almost all compiler flags on both architectures:
-O0, -O3 -qstrict, -O0 -qstrict -qfloat=nomaf on the AIX platform

-O0, -O3 -fpe0, -O3 -fp-model precise -fpe0, -O2 -fp-model strict -no-vec on the AMD platform

but the results are still different (the maximum difference is of order 10^-05, which seems very high to me).

Do you have any suggestions?

Thanks in advance

Piero

jimdempseyatthecove · ‎08-18-2009

Piero,

Try forcing your integer constants and expressions to DBLE

temp = ((stat(i_stat)%c_std(i,ii) - DBLE(n_data) * ((stat(i_stat)%c_mean(i,ii))**2) ) / DBLE(n_data-1) )

Jim Dempsey

TimP · ‎08-18-2009

This may be interesting to try, but it ought to make no difference with the quoted options.
Perhaps this also would make no difference, but you should note that the older Intel compilers for 32-bit mode defaulted to x87 code, so it seems you should have specified -xW or -xO if using the 32-bit compiler.
Differences such as you mention do make it appear that you have either serious accuracy cancellation (not totally surprising for a calculation like this) or somewhere you have not promoted to double precision. Remember that when you specify a KIND or the non-standard *4 notation, that prevents the automatic promotion to double.
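
To illustrate the cancellation mentioned above, here is a minimal sketch. It assumes that c_std holds a running sum of squares and c_mean the mean, which the thread does not confirm; the data, the program name, and all variable names are hypothetical. It compares the one-pass form of the quoted line with a two-pass form on data that have a large mean and a tiny spread.

```fortran
program cancellation_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000
  real(dp) :: x(n), sum_sq, mean, var_one_pass, var_two_pass
  integer :: i

  ! Hypothetical data: large offset (1e5) with a tiny spread (1e-5)
  do i = 1, n
     x(i) = 1.0e5_dp + 1.0e-5_dp * real(mod(i, 2), dp)
  end do

  mean   = sum(x) / real(n, dp)
  sum_sq = sum(x**2)

  ! Same shape as the quoted line: two huge, nearly equal numbers are subtracted
  var_one_pass = (sum_sq - real(n, dp) * mean**2) / real(n - 1, dp)

  ! Two-pass form: subtract the mean first, then square
  var_two_pass = sum((x - mean)**2) / real(n - 1, dp)

  print *, 'one-pass variance:', var_one_pass
  print *, 'two-pass variance:', var_two_pass
end program cancellation_demo
```

In the one-pass form the round-off in sum_sq can be much larger than the true variance, so the result may differ between compilers and can even come out negative. The two-pass form only sums squares, so it cannot go negative.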

lpeter09 · ‎08-18-2009

Thanks for all the suggestions, but at the moment the problem is not fixed.
I forgot to say that the architecture is 64-bit:

$ ifort -V
Intel Fortran Compiler for applications running on Intel 64, Version 10.1 Build 20081024 Package ID: l_fc_p_10.1.021
Copyright (C) 1985-2008 Intel Corporation. All rights reserved.
FOR NON-COMMERCIAL USE ONLY

So in the end, the problem is still not fixed :)

Regards

Piero

jimdempseyatthecove · ‎08-18-2009

Piero,

Then I suggest you place a breakpoint on the statement, open a Disassembly window when it breaks, and step through the code. This may expose the problem. Knowing the problem will help you produce a workaround.

Jim

lpeter09 · ‎08-18-2009

Jim, of course this is a correct statement, but the code is well known and is used on different machines and platforms. It would be proper to give a reference, but I have some difficulty doing that.

A bug is always possible, but in my opinion the code is robust and numerically excellent!

Piero

jimdempseyatthecove · ‎08-18-2009

The problem may be a compiler problem or programmer error. I've seen many compiler errors over the last 40 years; however, programmer errors (due to false assumptions) are several orders of magnitude more frequent.

Seeing the assembly code (and understanding what is happening) will remove all assumptions as to what is being done, why it is being done, and determine the direction in which to point your finger.

There may be issues (my assumption) if your original statement had a blend of integer, real(*4) and real(*8). In particular:

The assumption that REAL(*4) op REAL(*8) always promotes REAL(*4) to REAL(*8) prior to op is not always true.

The assumption that REAL(*8) op nn.mm (literal REAL) always promotes the literal to REAL(*8) is not (always) true.

The assumption that REAL(*8) op nn (literal INTEGER) always promotes the literal to REAL(*8) is not (always) true.

Examining the code will show you what is going on relating to these assumptions.

Now, when you find that a promotion is not performed when you expect it to be performed (e.g. other compiler did so for years or decades), you may find that the specification has not defined the behavior relating to the promotion. In which case, to quote Dirty Harry "You feeling lucky?" IOW your code was working by chance all these years.

Jim
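
One case of the literal-REAL point is easy to check without a disassembler. An unkinded literal such as 0.1 is a default-real constant, so its value is rounded to single precision before it is widened for a double-precision operation. The sketch below is only an illustration: the program and variable names are hypothetical. Compile it without -r8 or -qrealsize=8, since those options, as I understand them, also promote unkinded constants and would hide the effect.

```fortran
program literal_kind_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a, b

  a = 0.1        ! default-real literal: rounded to single precision, then widened
  b = 0.1_dp     ! double-precision literal: rounded once, to double precision

  print *, 'a     =', a
  print *, 'b     =', b
  print *, 'a - b =', a - b   ! nonzero when default real is single precision
end program literal_kind_demo
```

The conversion itself does happen in the mixed expression; what is lost is the precision of the constant, which was fixed when the literal was written. Writing the kind suffix (or d0) on every literal makes the result independent of the promotion flags.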

eliosh · ‎08-18-2009

I also think that one of the compilers uses x87 instructions, which give you some "extra precision for free".
The best way to check it is to look at the assembly code as jim suggests.

P.S.
Since you use the non-commercial version of the Intel compiler, I would suggest upgrading to the latest version and running your benchmarks again. If the results do not change, you can try quadruple precision (128 bit). My short experiments (not real data, just compilation and running time :) ) suggest that Intel supports this precision. The experts on this forum can correct me if I am wrong.

lpeter09 · ‎08-20-2009

First of all, thanks to everybody.
I did some debugging (using your useful suggestions) and I have partially fixed the problem!
The code takes the sqrt of the temp variable. The option
-prec-sqrt fixes this kind of problem.

Unfortunately, the error is still alive. More precisely, the difference arises when the temp variable is of the order of 10^-10: on the AIX machine temp is negative (still of the same order of magnitude), while on the Linux machine it is positive.

So on the IBM machine the final result is 0, while on the Linux machine the final result is of order 10^-5.

Any suggestions?
It is not possible to use quadruple precision: the code uses the HDF library for I/O and there is no interface for r16 precision variables.

Thanks again,

piero
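
A temp of order 10^-10 whose sign depends on the platform looks like round-off noise around a true value near zero, and sqrt(10^-10) is exactly the 10^-5 difference reported. One possible direction that stays in double precision is sketched below. It assumes the samples can be fed to a running update (Welford's method), which is a swapped-in technique and not what the original code does; the data and all names are hypothetical.

```fortran
program welford_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000
  real(dp) :: x, mean, m2, delta, variance
  integer :: i

  mean = 0.0_dp
  m2   = 0.0_dp
  do i = 1, n
     x = 1.0e5_dp + 1.0e-5_dp * real(mod(i, 2), dp)   ! hypothetical sample
     delta = x - mean
     mean  = mean + delta / real(i, dp)   ! running mean
     m2    = m2 + delta * (x - mean)      ! running sum of squared deviations
  end do

  variance = m2 / real(n - 1, dp)

  print *, 'variance =', variance
  ! The max() guard keeps round-off noise of either sign from reaching sqrt
  print *, 'std dev  =', sqrt(max(variance, 0.0_dp))
end program welford_demo
```

Because the update accumulates deviations from the running mean instead of subtracting two large totals, it is much less sensitive to the cancellation in the original line. If the accumulators cannot be changed, clamping temp with max(temp, 0) before the sqrt at least makes both platforms return 0 when the value is only noise.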