Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Mac Fortran 9.1 - disabling vector/cpu optimizations?

carlgt1
Beginner
1,018 Views
my project (http://climateprediction.net) uses large climate model systems in a perturbed-physics distributed computing experiment (along the lines of SETI@home). The models have run fine with "-O3" optimization but without any CPU-level (i.e. sse2/sse3) and vectorizing optimizations.

We were happy to buy the Intel Mac compiler, but although the model is running, the results are different because "plain -O3" (or any -O level) on the Intel Mac compiler seems to have "hard-wired" CPU/vectorizing optimizations. This gives widely different results from the Linux & Win Intel Fortran compilers; although it does make for a fast model run! But it seems as if some of the physics (particularly in the ocean and in the aerosol/sulphate feedback/cooling in the atmosphere) gets "optimized away."

Has anyone come up with a way to keep "-O3" on the Mac from automatically turning on the various CPU/vectorizing options, the way -O3 on Windows & Linux doesn't force them on?

Thanks!

7 Replies
Steven_L_Intel1
Employee

On Mac, there is an implied -xP switch which enables vectorization. There is no supported way to turn this off. You can use !DEC$ NOVECTOR before sensitive loops to disable vectorization.

In a future release, the 64-bit Linux and Windows compilers will assume -xW, since all of those processors support SSE2, so they too will see vectorization where there had not been any before.
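For readers looking for the directive placement, a minimal sketch (the subroutine and variable names here are illustrative, not from the original climate code; the directive applies to the loop that immediately follows it):

```fortran
subroutine update(a, b, n)
  implicit none
  integer, intent(in) :: n
  real(8), intent(in) :: b(n)
  real(8), intent(inout) :: a(n)
  integer :: i
  ! Keep this loop scalar so results match the non-vectorized builds
!DEC$ NOVECTOR
  do i = 1, n
     a(i) = a(i) + b(i)
  end do
end subroutine update
```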

jimdempseyatthecove
Honored Contributor III

Vectorization is one issue; while you are looking at the code, make sure the default REAL size is what you assume it is. REAL(4) versus REAL(8) makes a significant difference in both precision and performance. The default on the Intel compiler is REAL(4). The use of SSE2/3 instructions can lose some precision in intermediate calculations too. I prefer to use SSE3 with REAL(8).
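A quick way to check which kinds are in effect (an illustrative program, not from the model; a literal without a kind suffix takes the default REAL kind):

```fortran
program kind_check
  implicit none
  real    :: s        ! default REAL, i.e. REAL(4) under Intel defaults
  real(8) :: d
  s = 1.0 / 3.0       ! single precision, ~7 significant digits
  d = 1.0d0 / 3.0d0   ! double precision, ~16 significant digits
  print *, kind(s), s
  print *, kind(d), d
end program kind_check
```

The default REAL can also be promoted compiler-wide with -r8, but fixing the declarations in source is the more portable route.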

Jim Dempsey

Intel_C_Intel
Employee

Steve,

The -xP switch implies that floating-point operations can be implemented with SSE3 instructions for both vector and scalar code. Even though vectorization may change precision (most notoriously due to reassociation), I suspect that the differences caused by using generic x87 instructions on Windows/Linux (with 80-bit intermediate precision) vs. SSE3 instructions on MacOS (using 32-bit or 64-bit for single precision and double precision, respectively) are much more profound. The latter is actually closer to source precision than the former.

As such, I don't think disabling vectorization is the solution, unless the model is numerically unstable or truly exposes a bug. Getting different answers is not necessarily bad. The real question here is which answers are more correct.

:-D

Aart Bik

http://www.aartbik.com/

TimP
Honored Contributor III
A less drastic method for disabling risky optimizations is to set -fp-model precise. I'm a little surprised to see it stated that -vec- is not available for the Mac compiler. However, if numerical stability is the issue, vectorization is not likely to be the villain. If you don't want to go even as far as -fp-model precise, the options -prec_div -prec_sqrt will prevent the use of non-IEEE-compliant methods for accelerating single-precision vectorization of those operations. I'd be surprised if your application depends on x87 80-bit intermediate precision, which is also not available on Windows unless you use the gnu compilers.
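A compile line combining the options discussed above might look like the following (the source file name is a placeholder, and exact option spellings should be checked against your compiler version's documentation):

```shell
# Keep -O3 but constrain value-unsafe floating-point optimizations
ifort -O3 -fp-model precise -prec_div -prec_sqrt model.f90 -o model
```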
carlgt1
Beginner
I should have mentioned we've been using -fp-model strict on the Mac compilations, which lets the model run through (without it, the model would eventually crash). But something is still being optimized away at the atmos sulphate & ocean coupling stage; probably because the models are usually run 64-bit and these procedures have small constants involved, so something is getting lost in the optimization. I'll try some other options; it's easier to try compiler options than to rewrite and debug 1.5 million lines of Fortran!

TimP
Honored Contributor III

Your remark about using small constants, presumably in double precision context, brings up the possibility of latent bugs. You should make certain that your constants are typed correctly. Single precision constants in double precision context may not behave the same in x87 and SSE2 code. They may just happen to work the way you intended in one case, but not in the other. There may be compiler options to promote such constants, but it would be better to make sure the source code is correct.
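The effect of a mistyped constant is easy to demonstrate; a small sketch (an illustrative program, not from the model):

```fortran
program constant_kinds
  implicit none
  real(8) :: x, y
  x = 0.1      ! single-precision literal: rounded to REAL(4) first, then widened
  y = 0.1d0    ! double-precision literal: rounded directly to REAL(8)
  ! The difference (~1.5e-9) is the single-precision rounding error
  print *, y - x
end program constant_kinds
```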

jimdempseyatthecove
Honored Contributor III

RE: small constants involved

I have seen cases in my code where an expression combining REAL(8)s with constants lacking the D suffix (e.g. 0.1 as opposed to 0.1D0) demotes the computation to REAL(4). In any event, even if the current compiler does not demote the expression, a small constant without the D whose value has a repeating binary fraction, such as 0.1, will give significantly less precision than using the REAL(8) constant 0.1D0.

Check the code where you are observing the problems and verify if the constants have repeating binary fractions.

Jim Dempsey
