I am having a problem with ifort, Version 10.1. Compiling and running on the 32-bit platform (32-bit exe) generates
code that agrees to 30+ digits whether compiled with -O0 or -O3. Compiling and running on the 64-bit platform (64-bit exe) generates results that agree to only 20 digits when comparing -O0 and -O3, with the strange additional result that neither optimization level agrees to more than 20 digits with the 32-bit code. I have narrowed it down to the diagonalization step (the matrix element computation agrees to all but the last digit no matter how I compile and run). I have also noticed that if I use the -ftrapuv option with this code, the code hangs. Suggestions?
The 32-bit Intel architecture can use x87 registers, whereas the Intel 64 architecture will use SSE.
You should try the '-fp-model precise' compiler option; it should be used in cases where you want more precision. I would also recommend reading about -fp-model and its various arguments. Also worth researching are -ftz and -no-ftz, especially if you are detecting underflow. -no-ftz may help, but it carries a performance penalty on the Intel 64 architecture.
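For reference, options like these go directly on the ifort command line. These are illustrative invocations only; the source file name is a placeholder and you should check the option spellings against your own 10.1 installation:

```shell
# Hypothetical compile lines for comparing floating-point behavior.
# 'myprog.f90' stands in for the actual source file.
ifort -O3 -fp-model precise myprog.f90 -o myprog_precise
ifort -O3 -fp-model precise -no-ftz myprog.f90 -o myprog_noftz   # keep denormals
```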
Version 10.1 defaults to using the old X87 instructions for floating point on IA-32. These are slow, but also tend to do computations in higher precision than what was declared. The Intel 64 platform always uses SSE instructions for FP and these use declared precision. Version 11.0 defaults to using SSE2 on IA-32, the same as the Intel 64 compiler.
Try building your 32-bit program with -xW and see if the results agree with the 64-bit version (or are closer). In general, it is not realistic to expect -O0 and -O3 results to agree to so many digits (especially given that double precision is good to about 15 digits) due to the characteristics of computational floating point. You may also find that "-fp-model source" gives you more consistency, at the cost of some performance.
Edit: I just saw Ron's response which came in while I was typing. Note that for Fortran, -fp-model precise means the same as -fp-model source and you'll get a warning from the driver saying so if you use "precise".
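Steve's point that -O0 and -O3 results cannot be expected to agree in the last digits holds in any language: floating-point addition is not associative, so the re-association an optimizer may perform changes the final bits. A minimal Python illustration (Python floats are IEEE binary64, like Fortran double precision):

```python
# Floating-point addition is not associative: regrouping the same three
# double-precision values changes the final bits of the result.
s1 = (0.1 + 0.2) + 0.3
s2 = 0.1 + (0.2 + 0.3)
print(s1 == s2)  # False: the two groupings round differently
print(s1 - s2)   # a difference on the order of 1e-16
```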
Oh, yes, I forgot to mention the compiler options I've been using. They are
-mp -unroll -pc64
I think -mp is pretty much the same as -fp-model precise, but I changed the options to just
-fp-model precise
The 32-bit results do not change at all. The 64-bit results changed in the 20th digit, but still disagree in the 20th digit with the 32-bit results.
Jim
No, -mp is not the same and should not be used. Please add -xW on the 32-bit side.
I understand and expect the -O0 and -O3 results to disagree. What I don't understand is why the -O0 results disagree, or which one is correct :=)
I am now using the compiler flags
-O0 -fp-model source -xW. Results are still different at the 20th digit.
Sorry to disturb here (because I think I am unable to answer your question).
But if you are using double precision, any digit after the 15th is "crap". I would not use them to report my results.
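The ~15-digit limit of double precision is easy to confirm; a quick Python check (Python floats are IEEE binary64, the same format as Fortran REAL(8)):

```python
import sys

# binary64 guarantees about 15 significant decimal digits; sys.float_info reports it.
print(sys.float_info.dig)      # 15 significant decimal digits
print(sys.float_info.epsilon)  # ~2.22e-16, the gap between 1.0 and the next double
```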
I am using quad precision.
Steve or someone else from Intel would be best to highlight the differences in REAL(16) between the 32-bit platform using x87 and the 64-bit platform using SSEn.n.
From my limited understanding of this issue, on the 32-bit platform, using x87 instructions, the computations are carried out using TBYTE REAL(10) computations as supported by the x87 instruction set. On x64, a software library is used to emulate the larger data representation. As to the effective number of bits or digits of precision supported by the emulation library (as opposed to the IEEE X_floating standard), that would have to be discussed with Steve or someone else at Intel.
From the description of your symptoms, I would suggest you first check your code carefully to see if you are mixing REAL(8), REAL(4), or INTEGER of any width, as the conversion (promotion or demotion) rules may differ depending on platform. Force promotion from less precise types to REAL(16) before performing calculations. Also check trig and other intrinsic function calls that are being passed REAL(16) arguments, as the code (and compiler) may be calling the REAL(8) version of the function with the appropriate conversions taking place. If there are no unintended conversion operations going on, then you may have discovered a problem in the emulation library or in the compiler's use of the emulation library. Check your code first.
Jim Dempsey
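The mixed-precision contamination Jim warns about can be sketched in NumPy. float32/float64 mixing stands in here for REAL(4)/REAL(8); the same principle carries over to REAL(8) intermediates inside a REAL(16) expression:

```python
import numpy as np

# A low-precision intermediate permanently loses digits: promoting it back
# to higher precision afterwards cannot recover them.
lo = np.float32(0.1)       # 0.1 rounded to ~7 decimal digits
hi = np.float64(0.1)       # 0.1 rounded to ~16 decimal digits
promoted = np.float64(lo)  # promotion keeps the float32 rounding error
print(promoted == hi)      # False
print(promoted - hi)       # ~1.49e-09, far above float64 epsilon
```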
I agree with Jim's suggestion that you may have real(8) somewhere in your code.
On either 32- or 64-bit platform, real(16) ought to be implemented the same, using combinations of x87 instructions. It's possible, in the 32-bit compilation, that some intermediate real(8) might be promoted in effect to real(10), while this would not occur in the 64-bit case.
Our implementation of REAL(16) is the same on all platforms we support. It is all done in software using integer arithmetic, so has no relationship to floating point hardware support.
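Python's decimal module gives a rough feel for what such a software-emulated type looks like: the working precision is a library parameter, independent of the FPU. The 33-digit setting below merely approximates the decimal precision of IEEE binary128; it is an analogy, not Intel's actual implementation:

```python
from decimal import Decimal, getcontext

# Software arithmetic: precision is chosen by the library, not by hardware.
getcontext().prec = 33   # roughly the ~33 decimal digits of IEEE binary128
third = Decimal(1) / Decimal(3)
print(third)             # 0.333333333333333333333333333333333 (33 digits)
```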
Then this would indicate that the problem is not (likely) with the emulator but is instead likely to involve a difference in promotion/demotion of mixed-precision expressions, as assumed in my prior post. However, I do preface the emulator with "not (likely)" because the software uses integer arithmetic, and the integer size could potentially differ between x32 and x64 (assuming .ASM or .CPP code is used in the emulator).
jamesssims, since you have the code, you could resolve this.
Jim Dempsey