We were happy to buy the Intel Mac compiler, and although the model runs, the results are different because plain -O3 (or any -O level) on the Intel Mac compiler seems to have "hard-wired" CPU/vectorizing optimizations. This gives widely different results from the Linux and Windows Intel Fortran compilers, although it does make for a fast model run! But it seems as if some of the physics (particularly in the ocean, and in the aerosol/sulphate feedback/cooling in the atmosphere) gets "optimized away."
Has anyone found a way to keep -O3 on the Mac from automatically turning on these CPU/vectorizing options, the way -O3 on Windows and Linux leaves them off?
Thanks!
On Mac, there is an implied -xP switch which enables vectorization. There is no supported way to turn this off. You can use !DEC$ NOVECTOR before sensitive loops to disable vectorization.
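As a sketch of how that directive is placed (the subroutine and loop here are hypothetical stand-ins for one of the model's sensitive loops, not code from the thread):

```fortran
! Hypothetical example: suppress vectorization of one precision-sensitive
! loop so it compiles to scalar code, while -O3 still applies elsewhere.
subroutine update_field(t, n)
  implicit none
  integer, intent(in)    :: n
  real(8), intent(inout) :: t(n)
  integer :: i

!DEC$ NOVECTOR
  do i = 2, n
    ! Recurrence-style update; the source order of operations is kept
    ! because the compiler will not vectorize this loop.
    t(i) = t(i) + 0.5d0 * (t(i-1) - t(i))
  end do
end subroutine update_field
```

The directive applies only to the loop that immediately follows it, so it can be used surgically on the loops whose results drift.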
In a future release, the 64-bit Linux and Windows compilers will assume -xW, since all of those processors support SSE2, so they too will see vectorization where there had not been any before.
Vectorization is one issue, but while you are looking at the code, make sure the default REAL size is what you assume it is. REAL(4) versus REAL(8) makes a significant difference in both precision and performance. The default on the Intel compiler is REAL(4). The use of SSE2/SSE3 instructions can lose some precision in intermediate calculations too. I prefer to use SSE3 with REAL(8).
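A minimal sketch of the default-kind point (program and variable names are illustrative, not from the model): the same expression carries roughly 7 decimal digits in default REAL but 15-16 in REAL(8).

```fortran
program kind_check
  implicit none
  real    :: s   ! default REAL is REAL(4): about 7 decimal digits
  real(8) :: d   ! REAL(8): about 15-16 decimal digits

  s = 1.0 / 3.0
  d = 1.0d0 / 3.0d0

  ! The single-precision result diverges from 1/3 after ~7 digits.
  print *, 'REAL(4): ', s
  print *, 'REAL(8): ', d
end program kind_check
```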
Jim Dempsey
Steve,
The -xP switch implies that floating-point operations can be implemented with SSE3 instructions for both vector and scalar code. Even though vectorization may change precision (most notoriously due to reassociation), I suspect that the differences caused by using generic x87 instructions on Windows/Linux (with 80-bit intermediate precision) vs. SSE3 instructions on MacOS (using 32 bits for single precision and 64 bits for double precision) are much more profound. The latter is actually closer to source precision than the former.
As such, I don't think disabling vectorization is the solution, unless the model is numerically unstable or truly exposes a bug. Getting different answers is not necessarily bad. The real question here is which answers are more correct.
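A sketch of the reassociation point: a vectorizer is free to split a reduction into partial sums, which changes the rounding order. The loop and data below are hypothetical; the two results typically differ in the low-order digits even though both are mathematically the same sum.

```fortran
program reassoc
  implicit none
  integer, parameter :: n = 1000000   ! divisible by 4
  real(4) :: a(n), s_seq, s_part(4)
  integer :: i

  ! Values whose accumulation is sensitive to summation order.
  do i = 1, n
    a(i) = 1.0 / real(i)
  end do

  ! Sequential sum: one accumulator, strict source order.
  s_seq = 0.0
  do i = 1, n
    s_seq = s_seq + a(i)
  end do

  ! Four interleaved partial sums: the order a 4-wide vectorizer
  ! might use; rounding happens at different points.
  s_part = 0.0
  do i = 1, n, 4
    s_part(1) = s_part(1) + a(i)
    s_part(2) = s_part(2) + a(i+1)
    s_part(3) = s_part(3) + a(i+2)
    s_part(4) = s_part(4) + a(i+3)
  end do

  print *, 'sequential sum: ', s_seq
  print *, 'partial sums  : ', sum(s_part)
end program reassoc
```

Neither result is "wrong"; the partial-sum version is often closer to the exact answer, which is Aart's point about which answers are more correct.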
Aart Bik
Your remark about using small constants, presumably in double precision context, brings up the possibility of latent bugs. You should make certain that your constants are typed correctly. Single precision constants in double precision context may not behave the same in x87 and SSE2 code. They may just happen to work the way you intended in one case, but not in the other. There may be compiler options to promote such constants, but it would be better to make sure the source code is correct.
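A sketch of the constant-typing point (program name is illustrative): an untyped constant like 0.1 is a single-precision literal, so it is rounded to REAL(4) first and then widened; the widened value is not the double-precision value of 0.1.

```fortran
program const_kinds
  implicit none
  real(8) :: x_single_const, x_double_const

  ! 0.1 is a REAL(4) constant: rounded to single precision, then
  ! widened on assignment -- the error survives the widening.
  x_single_const = 0.1
  x_double_const = 0.1d0

  print *, x_single_const          ! accurate to only ~7 digits of 0.1
  print *, x_double_const          ! accurate to ~16 digits of 0.1
  ! The gap is on the order of 1e-9 -- enormous at double precision.
  print *, abs(x_single_const - x_double_const)
end program const_kinds
```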
RE: small constants involved
I have seen cases in my code where an expression using REAL(8)s combined with constants lacking the D suffix (e.g. 0.1 as opposed to 0.1D0) demotes the computation to REAL(4). In any event, even if the current compiler does not demote the expression, a small constant without the D whose value has a repeating binary fraction, such as 0.1, will carry significantly less precision than the REAL(8) constant 0.1D0.
Check the code where you are observing the problems and verify whether the constants have repeating binary fractions.
Jim Dempsey
