Different numerical results on AMD vs INTEL processors for the same binaries

rakkota · ‎11-17-2008

Hi,

I am facing a strange problem. I built a set of binaries from a mixed FORTRAN-C project, using the INTEL 10 compiler for FORTRAN and the MS compiler for C. When I run these binaries on two different processors (Pentium 4 vs. Athlon 64 3500), I get different numerical results.

I tried various compiler flags affecting floating-point operations, but I didn't find the magic combination that gives exactly the same results. I should mention that I don't have any parallel code and the OS is the same (WinXP 32). Below are the flags that I am currently using:

/intconstant /fpconstant /f77rtl /fp:source /real_size:32 /double_size:64 /Qprec-div /assume:buffered_io /O3 /check:none

Has anybody encountered such a problem? Any suggestion is highly appreciated.

Thanks,

Rak

Steven_L_Intel1 · ‎11-17-2008

The Intel math library always uses CPU dispatching internally and this can cause such differences if SSE is used on one processor and X87 on another.
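
A minimal sketch of why the choice of code path matters, written in C since the project is mixed FORTRAN-C. This is not the math library's dispatch code; the function names are made up for illustration. It assumes one path rounds every intermediate to single precision (as scalar SSE single-precision code does) while the other keeps intermediates in a wider format and rounds once at the end (as x87 code can).

```c
#include <stddef.h>

/* Accumulate in single precision: every partial sum is rounded to float. */
float sum_single(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return acc;
}

/* Accumulate in a wider type and round once at the end: a stand-in for
   code that keeps intermediates in a higher-precision register. */
float sum_wide(const float *x, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += x[i];
    return (float)acc;
}
```

For the input {1.0e8f, 1.0f, -1.0e8f}, the first function loses the 1.0f when it is added to 1.0e8f, while the second keeps it, so the same source data gives two different answers.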

rakkota · ‎11-18-2008

Thanks for replying. Is there a way to not rely on the processor's defaults but to set the floating-point computation environment (either by compiler flags or by FORTRAN instructions)? I tried GETCONTROLFPQQ and SETCONTROLFPQQ but it does not work on AMD :-(

Thanks,

Rak

jimdempseyatthecove · ‎11-18-2008

Rak,

What I do is compile the PROGRAM source with no processor-specific declarations, then compile all the other source files with the optimization levels .AND. using requires ... (e.g. uses SSE3 and requires SSE3).

The drawback is that if the application is run on a non-supported system, the error message will not be so friendly.

Jim Dempsey

rakkota · ‎11-18-2008

Hi Jim, thanks for replying, what you are saying sounds very interesting. Could you be more specific about the compiler flags that target a specific SSE technology and are portable to AMD (at a glance, only /QxO looks like a good candidate)? I'm not so much interested in optimization as I am in having carbon-copy numbers on both INTEL and AMD processors.

Thanks,

Rak

TimP · ‎11-18-2008

The 32-bit default no-SSE option, /QxW, and /QxO are all intended to work the same on AMD and Intel. /QxO is not suitable for your Pentium 4 (unless it is 64-bit capable); I don't know AMD numbers. /Qprec-div (which you quoted) and /Qprec-sqrt should be set so as to avoid instructions which give slightly different results between AMD and Intel. /fp:source (or precise) shuts off some (but not all) vectorization of math functions which may differ slightly between brands. I didn't mention this until now, because you hadn't mentioned consideration of vectorization.

The differences between AMD and Intel CPUs in a 32-bit environment may be overshadowed by numerical differences associated with differing alignment of vectorized loops. /fp:source (or precise) also removes those vectorizations which are likely to be affected. A 64-bit OS is a more effective cure for this.
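
To make the alignment effect concrete, here is a rough sketch in C of how a 4-wide vectorized sum behaves. It is only a model, not what the compiler actually emits: `sum_peeled` and its `peel` parameter are hypothetical, with `peel` standing in for the number of leading elements handled one by one before the data reaches an aligned address.

```c
#include <stddef.h>

/* Sum n floats the way a 4-wide vectorized loop might: `peel` leading
   elements are added one by one (scalar prologue), full groups of four
   go into four lane accumulators, and leftovers are added one by one. */
float sum_peeled(const float *a, size_t n, size_t peel) {
    float total = 0.0f;
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;

    for (; i < peel && i < n; ++i)          /* scalar prologue */
        total += a[i];
    for (; i + 4 <= n; i += 4)              /* "vector" body */
        for (size_t j = 0; j < 4; ++j)
            lane[j] += a[i + j];
    total += (lane[0] + lane[1]) + (lane[2] + lane[3]);  /* combine lanes */
    for (; i < n; ++i)                      /* scalar remainder */
        total += a[i];
    return total;
}
```

Calling this on the same eight values {1e8f, 1, -1e8f, 1, 1e8f, 1, -1e8f, 1} with `peel` set to 0 and then to 1 gives two different sums, because the additions are grouped differently. Nothing about the data or the arithmetic unit changed, only the starting offset.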

I've learned there is no supported option which completely eliminates numerical differences between Intel and AMD CPUs. This is an interest of my customers as well.

Steven_L_Intel1 · ‎11-18-2008

In version 11 you could use /QxHost to get the best option for the system you're compiling on. However, none of the compiler options affect the internal CPU dispatching used by the math library.

rakkota · ‎11-20-2008

Thanks Jim for the tip. After many tests, the ifort compiler flags that made the numerical results the same were:

/intconstant /fpconstant /f77rtl /QxW /real_size:32 /double_size:64 /Qprec-div /Qprec-sqrt /assume:buffered_io /O3 /check:none

We've got similar results by replacing /QxW with /architecture:SSE2.

I am afraid, though, that this works only for the particular tests that we are currently running and that this recipe cannot be universally applied.
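
For checking more cases than the current tests, something like the following C helpers could be used to compare values produced on the two machines. This is only a sketch: the function names are invented for illustration, and it assumes the results are finite IEEE 754 doubles that have been saved without loss (for example as raw bytes).

```c
#include <stdint.h>
#include <string.h>

/* Map a double's bit pattern to an integer that increases with the value,
   so adjacent doubles map to adjacent integers. */
static int64_t ordered_bits(double x) {
    int64_t i;
    memcpy(&i, &x, sizeof i);              /* well-defined type pun */
    return i < 0 ? INT64_MIN - i : i;      /* flip the negative range */
}

/* 1 if a and b have identical bit patterns, 0 otherwise. */
int bits_equal(double a, double b) {
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Number of steps between a and b along the representable doubles
   (0 when they are the same value). Assumes both are finite. */
uint64_t ulp_distance(double a, double b) {
    int64_t ia = ordered_bits(a), ib = ordered_bits(b);
    return ia >= ib ? (uint64_t)ia - (uint64_t)ib
                    : (uint64_t)ib - (uint64_t)ia;
}
```

`bits_equal` answers the strict question (are the numbers carbon copies?), while `ulp_distance` shows how far apart they are when they are not, which helps to tell a last-bit rounding difference from a real discrepancy.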

I think that Lionel's message is pretty clear: there is no way to guarantee bitwise identical results on AMD and INTEL architectures for numerical computation in FORTRAN using compiler flags only. Maybe I am naive, but it is a shame that AMD and INTEL didn't agree on the same standard when it comes to numerical computation. I imagine the executives of these two companies in a plane where the pilot is using an INTEL processor and the tower an AMD processor...

TimP · ‎11-20-2008

Quoting - rakkota

/intconstant /fpconstant /f77rtl /QxW /real_size:32 /double_size:64 /Qprec-div /Qprec-sqrt /assume:buffered_io /O3 /check:none

We've got similar results by replacing /QxW with /architecture:SSE2.

The new SSE2 option is a synonym for the old /QxW. If the results weren't the same, it would be a bug.

/Qprec-div /Qprec-sqrt, along with options which support only one SSE architecture, will be needed for consistent results among platforms, in any application which vectorizes single precision divide or sqrt. My personal recommendation is to use the Qprec options always, at least for current production CPUs, which perform well with them. They will be set automatically if you set /fp:source (or precise).

I think it would be good policy to always set the options you want in ifort.cfg, as a precaution in case they don't make it into the build elsewhere.
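
As an illustration only, assuming the options reported as working earlier in this thread are the ones you want applied to every compile, ifort.cfg could contain a line like this (adjust it to whatever set your own tests settle on):

```
/QxW /Qprec-div /Qprec-sqrt
```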