We were happy to buy the Intel Mac compiler, and although the model runs, the results are different because plain -O3 (or any -O level) on the Intel Mac compiler seems to have "hard-wired" CPU/vectorizing optimizations. This gives widely different results from the Linux and Windows Intel Fortran compilers, although it does make for a fast model run! But it seems as if some of the physics (particularly in the ocean, and in the aerosol/sulphate feedback/cooling in the atmosphere) gets "optimized away."
Has anyone found a way to keep -O3 on the Mac from automatically turning on these CPU/vectorizing options, the way -O3 on Windows and Linux leaves them off?
Thanks!
On Mac, there is an implied -xP switch which enables vectorization. There is no supported way to turn this off. You can use !DEC$ NOVECTOR before sensitive loops to disable vectorization.
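As a sketch of how that directive is placed (the subroutine and loop here are hypothetical stand-ins for one of the model's sensitive loops, not code from the thread):

```fortran
! Hypothetical example: suppress vectorization of one precision-sensitive
! loop so it compiles to scalar code, while -O3 still applies elsewhere.
subroutine update_field(t, n)
  implicit none
  integer, intent(in)    :: n
  real(8), intent(inout) :: t(n)
  integer :: i

!DEC$ NOVECTOR
  do i = 2, n
    ! Recurrence-style update; the source order of operations is kept
    ! because the compiler will not vectorize this loop.
    t(i) = t(i) + 0.5d0 * (t(i-1) - t(i))
  end do
end subroutine update_field
```

The directive applies only to the loop that immediately follows it, so it can be used surgically on the loops whose results drift.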
In a future release, the 64-bit Linux and Windows compilers will assume -xW, since all of those processors support SSE2, so they too will see vectorization where there had not been any before.
Vectorization is one issue, but while you are looking at the code, make sure the default REAL size is what you assume it is. REAL(4) versus REAL(8) makes a significant difference in both precision and performance. The default on the Intel compiler is REAL(4). The use of SSE2/SSE3 instructions can lose some precision in intermediate calculations too. I prefer to use SSE3 with REAL(8).
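A minimal sketch of the default-kind point (program and variable names are illustrative, not from the model): the same expression carries roughly 7 decimal digits in default REAL but 15-16 in REAL(8).

```fortran
program kind_check
  implicit none
  real    :: s   ! default REAL is REAL(4): about 7 decimal digits
  real(8) :: d   ! REAL(8): about 15-16 decimal digits

  s = 1.0 / 3.0
  d = 1.0d0 / 3.0d0

  ! The single-precision result diverges from 1/3 after ~7 digits.
  print *, 'REAL(4): ', s
  print *, 'REAL(8): ', d
end program kind_check
```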
Jim Dempsey
Steve,
The -xP switch implies that floating-point operations can be implemented with SSE3 instructions for both vector and scalar code. Even though vectorization may change precision (most notoriously due to reassociation), I suspect that the differences caused by using generic x87 instructions on Windows/Linux (with 80-bit intermediate precision) vs. SSE3 instructions on MacOS (using 32 bits for single precision and 64 bits for double precision) are much more profound. The latter is actually closer to source precision than the former.
As such, I don't think disabling vectorization is the solution, unless the model is numerically unstable or truly exposes a bug. Getting different answers is not necessarily bad. The real question here is which answers are more correct.
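A sketch of the reassociation point: a vectorizer is free to split a reduction into partial sums, which changes the rounding order. The loop and data below are hypothetical; the two results typically differ in the low-order digits even though both are mathematically the same sum.

```fortran
program reassoc
  implicit none
  integer, parameter :: n = 1000000   ! divisible by 4
  real(4) :: a(n), s_seq, s_part(4)
  integer :: i

  ! Values whose accumulation is sensitive to summation order.
  do i = 1, n
    a(i) = 1.0 / real(i)
  end do

  ! Sequential sum: one accumulator, strict source order.
  s_seq = 0.0
  do i = 1, n
    s_seq = s_seq + a(i)
  end do

  ! Four interleaved partial sums: the order a 4-wide vectorizer
  ! might use; rounding happens at different points.
  s_part = 0.0
  do i = 1, n, 4
    s_part(1) = s_part(1) + a(i)
    s_part(2) = s_part(2) + a(i+1)
    s_part(3) = s_part(3) + a(i+2)
    s_part(4) = s_part(4) + a(i+3)
  end do

  print *, 'sequential sum: ', s_seq
  print *, 'partial sums  : ', sum(s_part)
end program reassoc
```

Neither result is "wrong"; the partial-sum version is often closer to the exact answer, which is Aart's point about which answers are more correct.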
Aart Bik
Your remark about using small constants, presumably in double precision context, brings up the possibility of latent bugs. You should make certain that your constants are typed correctly. Single precision constants in double precision context may not behave the same in x87 and SSE2 code. They may just happen to work the way you intended in one case, but not in the other. There may be compiler options to promote such constants, but it would be better to make sure the source code is correct.
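A sketch of the constant-typing point (program name is illustrative): an untyped constant like 0.1 is a single-precision literal, so it is rounded to REAL(4) first and then widened; the widened value is not the double-precision value of 0.1.

```fortran
program const_kinds
  implicit none
  real(8) :: x_single_const, x_double_const

  ! 0.1 is a REAL(4) constant: rounded to single precision, then
  ! widened on assignment -- the error survives the widening.
  x_single_const = 0.1
  x_double_const = 0.1d0

  print *, x_single_const          ! accurate to only ~7 digits of 0.1
  print *, x_double_const          ! accurate to ~16 digits of 0.1
  ! The gap is on the order of 1e-9 -- enormous at double precision.
  print *, abs(x_single_const - x_double_const)
end program const_kinds
```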
RE: small constants involved
I have seen cases in my code where an expression using REAL(8)s combined with constants lacking the D suffix (e.g. 0.1 as opposed to 0.1D0) demotes the computation to REAL(4). In any event, even if the current compiler does not demote the expression, a small constant without the D whose value has a repeating binary fraction, such as 0.1, will carry significantly less precision than the REAL(8) constant 0.1D0.
Check the code where you are observing the problems and verify whether the constants have repeating binary fractions.
Jim Dempsey
