Hi,
I am facing a strange problem. I generated a set of binaries from a mixed FORTRAN-C project. For FORTRAN I am using the INTEL 10 compiler and for C the MS compiler. When I run these binaries on two different processors (Pentium 4 vs. Athlon 64 3500), I get different numerical results.
I tried various compiler flags affecting floating-point operations, but I didn't find the magic combination that gives exactly the same results. I should mention that I don't have any parallel code and the OS is the same (WinXP 32). Below are the flags that I am currently using:
/intconstant /fpconstant /f77rtl /fp:source /real_size:32 /double_size:64 /Qprec-div /assume:buffered_io /O3 /check:none
Has anybody encountered such a problem? Any suggestions are highly appreciated.
Thanks,
Rak
The Intel math library always uses CPU dispatching internally and this can cause such differences if SSE is used on one processor and X87 on another.
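To illustrate how the choice between an SSE and an x87 code path can change a result, here is a minimal Python sketch. It is not the math library's actual code. It assumes single-precision data (as with /real_size:32) and uses a Python double to stand in for a wider x87 register; the helper names `to_f32`, `sse_style` and `x87_style` are made up for this illustration.

```python
import struct

def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sse_style(a, b):
    # Assumption: every intermediate result is rounded to single precision,
    # as happens when each operation works on 32-bit SSE registers.
    return to_f32(to_f32(a + b) - a)

def x87_style(a, b):
    # Assumption: the intermediate sum stays in a wider register and is
    # rounded to single precision only once, when the result is stored.
    return to_f32((a + b) - a)

a = 16777216.0  # 2**24, where single precision can no longer represent a + 1
b = 1.0

print("round after every step:", sse_style(a, b))  # 0.0
print("round once at the end: ", x87_style(a, b))  # 1.0
```

Both functions evaluate the same expression, (a + b) - a, yet the stored results differ. The values are chosen to make the effect obvious; in real code the discrepancy usually shows up only in the last bits.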
Thanks for replying. Is there a way to avoid relying on the processor's defaults and instead set the floating-point computation environment explicitly (either by compiler flags or by FORTRAN instructions)? I tried GETCONTROLFPQQ and SETCONTROLFPQQ but it does not work on AMD :-(
Thanks,
Rak
Rak,
What I do is compile the PROGRAM source with no processor-specific declarations. Then I compile all the other source files with the optimization levels .AND. using requires ... (e.g. uses SSE3 and requires SSE3).
You will have a problem running the application on a non-supported system, in that the error message will not be very friendly.
Jim Dempsey
Hi Jim, thanks for replying; what you are saying sounds very interesting. Could you be more specific about the compiler flags that target a specific SSE level and are portable to AMD (at a glance, only /QxO looks like a good candidate)? I'm not so much interested in optimization as I am in having carbon-copy numbers on both INTEL and AMD processors.
Thanks,
Rak
The 32-bit default no-SSE option, /QxW, and /QxO are all intended to work the same on AMD and Intel. /QxO is not suitable for your Pentium 4 (unless it is 64-bit capable); I don't know the AMD numbers. /Qprec-div (which you quoted) and /Qprec-sqrt should be set so as to avoid instructions which give slightly different results between AMD and Intel. /fp:source (or precise) shuts off some (but not all) vectorization of math functions which may differ slightly between brands. I didn't mention this until now because you hadn't mentioned consideration of vectorization.
The differences between AMD and Intel CPUs in a 32-bit environment may be overshadowed by numerical differences associated with differing alignment of vectorized loops. /fp:source (or precise) also removes those vectorizations which are likely to be affected. A 64-bit OS is a more effective cure for this.
I've learned there is no supported option which completely eliminates numerical differences between Intel and AMD CPUs. This is an interest of my customers as well.
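The alignment effect mentioned above comes down to the order of additions: a vectorized sum handles a few leading elements one at a time until the data is aligned, then accumulates the rest in parallel lanes, and floating-point addition is not associative. The Python sketch below is a simplified model under that assumption, not what ifort actually generates; `vector_style_sum`, `lanes` and `peel` are hypothetical names.

```python
def vector_style_sum(values, lanes=2, peel=0):
    """Sum values roughly the way a vectorized loop might.

    The first `peel` elements are added one by one (the scalar prologue used
    to reach an aligned address); the rest are accumulated round-robin into
    `lanes` partial sums that are combined at the end.
    """
    scalar = 0.0
    for v in values[:peel]:
        scalar += v

    partial = [0.0] * lanes
    for i, v in enumerate(values[peel:]):
        partial[i % lanes] += v

    total = scalar
    for p in partial:
        total += p
    return total

data = [1e16, 1.0, -1e16, 1.0]  # exact sum is 2.0

print(vector_style_sum(data, lanes=2, peel=0))  # 2.0
print(vector_style_sum(data, lanes=2, peel=2))  # 1.0
```

The same data and the same arithmetic give two answers depending only on where the "aligned" part of the loop starts. The input is deliberately extreme; with well-scaled data the difference is typically confined to the last bits.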
In version 11 you could use /QxHost to get the best option for the system you're compiling on. However, none of the compiler options affect the internal CPU dispatching used by the math library.
Thanks Jim for the tip. After many tests, the ifort compiler flags that made the numerical results the same were:
/intconstant /fpconstant /f77rtl /QxW /real_size:32 /double_size:64 /Qprec-div /Qprec-sqrt /assume:buffered_io /O3 /check:none
We got similar results by replacing /QxW with /architecture:SSE2.
I am afraid, though, that this works only for the particular tests that we are currently running and that this recipe cannot be universally applied.
I think that Lionel's message is pretty clear: there is no way to guarantee bitwise-identical results on AMD and INTEL architectures for numerical computation in FORTRAN using compiler flags only. Maybe I am naive, but it is a shame that AMD and INTEL didn't agree on the same standard when it comes to numerical computation. I imagine the executives of these two companies in a plane where the pilot is using an INTEL processor and the tower an AMD processor...
The new SSE2 option is a synonym for the old /QxW. If the results weren't the same, it would be a bug.
/Qprec-div and /Qprec-sqrt, along with options which support only one SSE architecture, will be needed for consistent results among platforms in any application which vectorizes single-precision divide or sqrt. My personal recommendation is to use the Qprec options always, at least for current production CPUs, which perform well with them. They will be set automatically if you set /fp:source (or precise).
I am thinking it would be good policy to always set the options you want in ifort.cfg, as a precaution in case they don't make it into the build elsewhere.
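As a rough illustration of why the Qprec options matter: without /Qprec-div the compiler may replace a division by a multiplication with a reciprocal, and the two do not always round the same way. The Python sketch below uses a correctly rounded double-precision reciprocal as a stand-in, so it shows only the rounding effect of the transformation itself; the approximate reciprocal instructions discussed in this thread are coarser still. `reciprocal_mismatches` is a hypothetical helper name.

```python
def reciprocal_mismatches(limit):
    """Return the integers n in 1..limit where n * (1.0 / n) is not exactly 1.0."""
    return [n for n in range(1, limit + 1) if n * (1.0 / n) != 1.0]

mismatches = reciprocal_mismatches(100)

# True division of a number by itself is always exactly 1.0 ...
print("n / n == 1.0 for all n:", all(n / n == 1.0 for n in range(1, 101)))
# ... but multiplying by the rounded reciprocal is not.
print("n * (1/n) != 1.0 for n in:", mismatches)
```

Every value in the printed list is a case where "divide" and "multiply by reciprocal" disagree in the last bit, which is the kind of difference that is enough to break bitwise comparisons between runs.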
