Optimization Flags for non-Intel CPUs

apredeus · 04-22-2011

Hello all,

I've found myself in a situation where I have to use the Intel/11.1 compilers to build our (mostly) Fortran 90/95 simulation package, while running it on an AMD-based cluster.

Now, when we use Intel CPUs, the compiler flags for ifort are:

-132 -w95 -cm -align all -heap-arrays 256 -c -O3 -mp1 -axOW -free

From what I understand, both the -mp1 and -axOW flags are chip-specific.

What can I use on Opteron? From what I understand, -mp1 is meant to prevent precision loss at -O3, while -axOW generates chip-specific optimizations (correct me if I'm wrong!). I care more about the first than the second, but having a faster executable is also nice.

Any input would be appreciated.

TimP · 04-22-2011

-mp1 has been obsolete since ifort 10.1, but it wasn't "chip-specific" (if I understand your meaning), beyond the extent to which its effect differed between x87 and SSE code. A better option would be -assume protect_parens. An option which lumps together all correctness options would be -fp-model source.
I would use -assume protect_parens -prec-div -prec-sqrt, all of which are included in -fp-model source. Depending on your application, you may or may not see any difference between those choices.
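As a rough sketch of the two alternatives above, the correctness options could be kept in shell variables and passed to ifort. The variable names and the file name solver.f90 are hypothetical, and the commands are only printed, not run:

```shell
#!/bin/sh
# Hypothetical variable names; the flags themselves are the ones named above.

# Alternative A: the individual correctness options.
FP_EXPLICIT="-assume protect_parens -prec-div -prec-sqrt"

# Alternative B: the single option that includes all of the above.
FP_MODEL="-fp-model source"

# Print (do not run) the compile commands; solver.f90 is a placeholder.
echo "ifort -c -O3 $FP_EXPLICIT solver.f90"
echo "ifort -c -O3 $FP_MODEL solver.f90"
```

Either variable would take the place of -mp1 in the original flag list.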
The -axOW combination doesn't make sense, besides being obsolete. According to my understanding, it requests AMD-compatible SSE3 when running on an Intel SSE3 CPU and uses SSE2 otherwise. If you will always run on SSE3-compatible CPUs, you may want -msse3; if you want to support CPUs without SSE3 (including early Opteron), you would set the default (-msse2).
To choose SSE3 at run time for Intel SSE3 CPUs, while running SSE2 on Opteron (close to the effect of what you quote), you would set -axSSE3. If the compiler doesn't see a gain for SSE3 in a given subroutine, it should generate the same code as -msse2.
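The three instruction-set choices described above could be summarized in a small helper like the one below. This is only a sketch: the function name isa_flag and the scenario labels are made up for illustration, and the compile command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: map a deployment scenario to the instruction-set
# flag suggested above. The scenario labels are invented for this sketch.
isa_flag() {
    case "$1" in
        sse3-only) echo "-msse3" ;;   # every CPU you run on supports SSE3
        any-sse2)  echo "-msse2" ;;   # also covers early Opterons without SSE3 (the default)
        mixed)     echo "-axSSE3" ;;  # SSE3 path on Intel SSE3 CPUs, SSE2 path on Opteron
        *)         echo "unknown scenario: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) a compile command; solver.f90 is a placeholder.
echo "ifort -c -O3 -fp-model source $(isa_flag sse3-only) solver.f90"
```

For an all-Opteron cluster with SSE3 support, the sse3-only case is the one that applies.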

apredeus · 04-22-2011

Super helpful, thanks a lot!

apredeus · 04-22-2011

One more question:

Does it make sense to use the correctness options and -msse3 in the subroutines that aren't as aggressively optimized? Some subroutines are compiled with -O0 and -O1.

TimP · 04-22-2011

There's no problem in setting correctness options along with lower optimization levels, even though they don't make as much difference (probably none at -O0).
Likewise, I wouldn't expect SSE3 to come out significantly different from SSE2 at -O0 or -O1, but it's no problem if you specify it. When I set -O1 in order to limit code size and compilation time, I would avoid a dual-path option such as -axSSE3, but the current compilers may be good enough that it's not a problem.
The -noftz implied by -fp-model source has effect only in the main program, so you may want to use (or not) that same option everywhere.
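One way to keep the same correctness and instruction-set options everywhere while varying only the optimization level might look like this. The function name compile_cmd and the file names are hypothetical, and the commands are printed rather than executed:

```shell
#!/bin/sh
# Options shared by every file, regardless of optimization level.
COMMON="-fp-model source -msse3"

# Hypothetical helper: $1 = optimization level, $2 = source file.
compile_cmd() {
    echo "ifort -c $1 $COMMON $2"
}

# Placeholder file names, one per optimization level used in the build.
compile_cmd -O3 hot_loop.f90
compile_cmd -O1 setup.f90
compile_cmd -O0 io_debug.f90
```

Because the main program gets the same COMMON options as everything else, the -fp-model source setting is applied consistently.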