I am having a problem with ifort, Version 10.1. Compiling and running on the 32-bit platform (32-bit exe) generates
code that agrees to 30+ digits whether compiled with -O0 or -O3. Compiling and running on the 64-bit platform (64-bit exe) generates results that agree to only 20 digits when comparing -O0 and -O3, with the strange additional result that neither optimization level agrees to more than 20 digits with the 32-bit code. I have narrowed it down to the diagonalization step (the matrix element computation agrees to all but the last digit no matter how I compile and run). I have also noticed that if I use the -ftrapuv option with this code, the code hangs. Suggestions?
The 32-bit Intel architecture can use x87 registers, whereas the Intel 64 architecture will use SSE.
You should try the '-fp-model precise' compiler option; it should be used in cases where you want more precision. I would also recommend reading about -fp-model and its various arguments. Also worth researching are -ftz and -no-ftz, especially if you are detecting underflow. -no-ftz may help, but it carries a performance penalty on the Intel 64 architecture.
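For reference, options like these go directly on the ifort command line. These are illustrative invocations only; the source file name is a placeholder and you should check the option spellings against your own 10.1 installation:

```shell
# Hypothetical compile lines for comparing floating-point behavior.
# 'myprog.f90' stands in for the actual source file.
ifort -O3 -fp-model precise myprog.f90 -o myprog_precise
ifort -O3 -fp-model precise -no-ftz myprog.f90 -o myprog_noftz   # keep denormals
```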
Version 10.1 defaults to using the old X87 instructions for floating point on IA-32. These are slow, but also tend to do computations in higher precision than what was declared. The Intel 64 platform always uses SSE instructions for FP and these use declared precision. Version 11.0 defaults to using SSE2 on IA-32, the same as the Intel 64 compiler.
Try building your 32-bit program with -xW and see if the results agree with the 64-bit version (or are closer). In general, it is not realistic to expect -O0 and -O3 results to agree to so many digits (especially given that double precision is good to about 15 digits) due to the characteristics of computational floating point. You may also find that "-fp-model source" gives you more consistency, at the cost of some performance.
Edit: I just saw Ron's response which came in while I was typing. Note that for Fortran, -fp-model precise means the same as -fp-model source and you'll get a warning from the driver saying so if you use "precise".
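Steve's point that -O0 and -O3 results cannot be expected to agree in the last digits holds in any language: floating-point addition is not associative, so the re-association an optimizer may perform changes the final bits. A minimal Python illustration (Python floats are IEEE binary64, like Fortran double precision):

```python
# Floating-point addition is not associative: regrouping the same three
# double-precision values changes the final bits of the result.
s1 = (0.1 + 0.2) + 0.3
s2 = 0.1 + (0.2 + 0.3)
print(s1 == s2)  # False: the two groupings round differently
print(s1 - s2)   # a difference on the order of 1e-16
```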
Oh, yes, I forgot to mention the compiler options I've been using. They are
-mp -unroll -pc64
I think -mp is pretty much the same as -fp-model precise, but I changed the options to just
-fp-model precise
The 32-bit results do not change at all. The 64-bit results changed in the 20th digit, but still disagree in the 20th digit with the 32-bit results.
Jim
No, -mp is not the same and should not be used. Please add -xW on the 32-bit side.
I understand and expect the -O0 and -O3 results to disagree. What I don't understand is why the -O0 results disagree, or which one is correct :=)
I am now using the compiler flags
-O0 -fp-model source -xW. Results are still different at the 20th digit.
Sorry to disturb here (because I think I am unable to answer your question).
But if you are using double precision, any digit after the 15th is "crap". I would not use them to report my results.
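The ~15-digit limit of double precision is easy to confirm; a quick Python check (Python floats are IEEE binary64, the same format as Fortran REAL(8)):

```python
import sys

# binary64 guarantees about 15 significant decimal digits; sys.float_info reports it.
print(sys.float_info.dig)      # 15 significant decimal digits
print(sys.float_info.epsilon)  # ~2.22e-16, the gap between 1.0 and the next double
```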
I am using quad precision.
Steve or someone else from Intel would be best to highlight the differences in REAL(16) between the 32-bit platform using x87 and the 64-bit platform using SSEn.n.
From my limited understanding of this issue, on the 32-bit platform, using x87 instructions, the computations are carried out using TBYTE REAL(10) computations as supported by the x87 instruction set. On x64, a software library is used to emulate the larger data representation. As to the effective number of bits or digits of precision supported by the emulation library (as opposed to the IEEE X_floating standard), that would have to be discussed with Steve or someone else at Intel.
From the description of your symptoms, I would suggest you first check your code carefully to see if you are mixing REAL(8), REAL(4), or INTEGER of any width, as the conversion (promotion or demotion) rules may differ depending on platform. Force promotion from less precise types to REAL(16) before performing calculations. Also check trig and other intrinsic function calls that are being passed REAL(16) arguments, as the code (and compiler) may be calling the REAL(8) version of the function with the appropriate conversions taking place. If there are no unintended conversion operations going on, then you may have discovered a problem in the emulation library or in the compiler's use of the emulation library. Check your code first.
Jim Dempsey
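The mixed-precision contamination Jim warns about can be sketched in NumPy. float32/float64 mixing stands in here for REAL(4)/REAL(8); the same principle carries over to REAL(8) intermediates inside a REAL(16) expression:

```python
import numpy as np

# A low-precision intermediate permanently loses digits: promoting it back
# to higher precision afterwards cannot recover them.
lo = np.float32(0.1)       # 0.1 rounded to ~7 decimal digits
hi = np.float64(0.1)       # 0.1 rounded to ~16 decimal digits
promoted = np.float64(lo)  # promotion keeps the float32 rounding error
print(promoted == hi)      # False
print(promoted - hi)       # ~1.49e-09, far above float64 epsilon
```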
I agree with Jim's suggestion that you may have real(8) somewhere in your code.
On either 32- or 64-bit platform, real(16) ought to be implemented the same, using combinations of x87 instructions. It's possible, in the 32-bit compilation, that some intermediate real(8) might be promoted in effect to real(10), while this would not occur in the 64-bit case.
Our implementation of REAL(16) is the same on all platforms we support. It is all done in software using integer arithmetic, so has no relationship to floating point hardware support.
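Python's decimal module gives a rough feel for what such a software-emulated type looks like: the working precision is a library parameter, independent of the FPU. The 33-digit setting below merely approximates the decimal precision of IEEE binary128; it is an analogy, not Intel's actual implementation:

```python
from decimal import Decimal, getcontext

# Software arithmetic: precision is chosen by the library, not by hardware.
getcontext().prec = 33   # roughly the ~33 decimal digits of IEEE binary128
third = Decimal(1) / Decimal(3)
print(third)             # 0.333333333333333333333333333333333 (33 digits)
```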
Then this would indicate that the problem is not (likely) with the emulator but is instead likely to involve a difference in promotion/demotion of mixed-precision expressions, as assumed in my prior post. However, I do preface the emulator with "not (likely)" because the software uses integer arithmetic, and the integer size could potentially differ between x32 and x64 (assuming .ASM or .CPP code is used in the emulator).
jamesssims, since you have the code, you could resolve this.
Jim Dempsey