Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Intel Compiler Recommended options

youn__kihang
Novice
2,509 Views

Hello everyone,
I have put together a reference that recommends compile options to users, shown below (it has not been distributed yet).

Compile_Option_Recommendation.PNG
If anything needs improvement or is missing, please let me know.

Thank you.

1 Solution
TimP
Honored Contributor III
2,509 Views

I like to consider more consistent pairs of options between ifort and gfortran. 

If I set -ffast-math I will include -fno-cx-limited-range (since -ffast-math turns on -fcx-limited-range) until I have tested complex data types thoroughly enough to see that it is not needed for safety and only hurts performance.  I would never set ifort -fast; if I want -no-complex-limited-range I will set it individually.  -fast is only for benchmarks which don't permit setting a full set of command line options.
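
For readers unfamiliar with what a "limited range" option trades away: it allows complex division to use the textbook formula without any range scaling, which is faster but can overflow in the intermediate terms. The following is a minimal sketch in Python, used only as a stand-in for IEEE double arithmetic. The two function names are hypothetical, Smith's algorithm is one common scaled method chosen for illustration, and nothing here reproduces what ifort or gfortran actually emits.

```python
# Sketch: textbook ("limited range") complex division vs. Smith's scaled method.
# Both helper names are hypothetical; neither is a compiler or library API.

def div_limited_range(a, b, c, d):
    """(a+bi)/(c+di) by the textbook formula; c*c + d*d can overflow."""
    denom = c * c + d * d
    return ((a * c + b * d) / denom, (b * c - a * d) / denom)

def div_smith(a, b, c, d):
    """Smith's algorithm: scale by the larger of |c|, |d| to avoid overflow."""
    if abs(c) >= abs(d):
        r = d / c
        t = c + d * r
        return ((a + b * r) / t, (b - a * r) / t)
    r = c / d
    t = c * r + d
    return ((a * r + b) / t, (b * r - a) / t)

big = 1.0e200  # large but well inside double range
naive = div_limited_range(big, big, big, big)  # intermediates overflow to inf
safe = div_smith(big, big, big, big)           # (1+i)/(1+i) scaled safely
print("limited range:", naive)
print("scaled:       ", safe)
```

The operands are ordinary finite doubles, yet the unscaled formula returns NaN while the scaled one returns the exact quotient. Whether this matters depends on the magnitudes your complex data actually reach, which is why testing before dropping the safe setting is sensible.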

PIC isn't an optimization flag; it's required (or not) independent of optimization.  Fortunately, it no longer has the frequent performance implications it once did, when it could hurt performance.

When setting AVX512, I would set -align array64byte.

I would always set -fprotect-parens or -standard-semantics for -O2 or -O3.  gfortran honors parentheses by default (the exceptions being -Ofast or an explicit -fno-protect-parens).  In case you're interested, gcc/g++ do allow such violations of the standard when -ffast-math is set.  Don't let people tell you this is unimportant with properly written source code.
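
To illustrate why carefully written source still depends on parentheses being honored, here is a small sketch of compensated (Kahan) summation. It is written in Python purely as a stand-in for double precision arithmetic; the function names are hypothetical, and the second function only mimics the effect of a compiler treating the compensation term as algebraically zero, it is not actual compiler output.

```python
# Sketch only: Python floats are IEEE doubles, standing in for DOUBLE PRECISION.
# Both function names are hypothetical.

def kahan_sum(values):
    """Compensated summation; correct only if the parentheses are honored."""
    total = 0.0
    comp = 0.0  # running estimate of the rounding error lost so far
    for x in values:
        y = x - comp
        t = total + y
        comp = (t - total) - y  # algebraically zero, numerically the lost low bits
        total = t
    return total

def kahan_sum_reassociated(values):
    """What remains if (t - total) - y is 'simplified' to zero."""
    total = 0.0
    for x in values:
        total = total + x
    return total

# One large term followed by many terms too small to register individually.
values = [1.0] + [1.0e-16] * 100000
protected = kahan_sum(values)
reassociated = kahan_sum_reassociated(values)
print("parentheses honored:", protected)
print("reassociated:       ", reassociated)
```

With the parentheses honored the small terms accumulate to roughly 1e-11 above 1.0; with the compensation removed the sum stays at exactly 1.0. The source is "properly written" in both cases, and only the evaluation order differs.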

-O2 with ifort includes automatic SIMD vectorization, so I would compare it with gfortran -O2 -ftree-vectorize.  Even with that option for gfortran (which some experts recommend highly), you would still need -fp-model precise to make ifort consistent.  -fno-fast-math (gfortran) and -fp-model precise -fprotect-parens (ifort) avoid optimizations which are expected to involve minor changes in numerical behavior.
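
One source of those changes in numerical behavior is that a vectorized sum reduction accumulates several partial sums and combines them at the end, which is a different order of additions from the scalar loop. The sketch below imitates that in Python; the function names and the lane count of 4 are assumptions made for illustration, and the data are deliberately contrived to make the difference visible.

```python
# Sketch: scalar left-to-right sum vs. a lane-wise sum like a SIMD reduction.
# Function names and the lane count are illustrative assumptions.

def sum_sequential(values):
    """Add the terms strictly in source order."""
    total = 0.0
    for x in values:
        total += x
    return total

def sum_lanes(values, lanes=4):
    """Accumulate 'lanes' interleaved partial sums, then combine them."""
    partial = [0.0] * lanes
    for i, x in enumerate(values):
        partial[i % lanes] += x
    total = 0.0
    for p in partial:
        total += p
    return total

# The exact sum is 6.0; the large terms cancel only if they meet in one lane.
data = [1.0e16, 1.0, 1.0, 1.0, -1.0e16, 1.0, 1.0, 1.0]
scalar = sum_sequential(data)
vector = sum_lanes(data, lanes=4)
print("sequential:", scalar)
print("4 lanes:   ", vector)
```

Here the lane-wise order happens to be the more accurate one, which is the point: neither order is "wrong", but they are not interchangeable if you need results to match between compilers or option sets.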

If you want optimization without auto-vectorization, -O2 is commonly used with gfortran; -O1 might be better than -no-vec with ifort.

As ifort -O2 and -O3 include automatic unrolling, for equivalent optimization you should be comparing ifort -unroll4 against gfortran -funroll-loops --param max-unroll-times=4.

At one time, while I was working at Intel, they said they would institute options to make it easier to match typical gfortran option groups, at least -O2.  I guess they decided it's not practical.  If you're making the effort to resolve this with a table such as you propose, I would like to take the additional steps to have comparable settings as much as possible.

 

5 Replies
youn__kihang
Novice
2,509 Views

Hello Tim,
Thank you for your answer.

First of all, thank you for explaining the effects of the options and for suggesting additional ones.
I was not aiming for a complete match between gfortran and Intel Fortran; the table is intended as a guide for end users.
After asking here in the compiler forum, I will also ask for guidance in the HPC forum.
I have revised the table to reflect your comments; please take a look and leave a comment if you see any problems.

Thank you again.

 

Compile_Options.PNG

jimdempseyatthecove
Honored Contributor III
2,509 Views

I suggest you follow Tim's advice more closely and make the distinction between:

"Performance Optimization" -O3 with shortcuts and lesser precision (change in order, no-protected parens, inverse multiply, lesser precision intrinsic functions)

      and

"Performance Optimization" -O3  without shortcuts and full precision.

Jim Dempsey
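
As a small illustration of the "inverse multiply" shortcut mentioned above: replacing x / y with x * (1 / y) is cheaper, but the reciprocal is rounded before the multiply, so the result can differ in the last bit. This is a sketch in Python as a stand-in for double precision arithmetic; the variable names are illustrative and it does not show what any particular compiler generates.

```python
# Sketch: true division vs. multiplication by a rounded reciprocal.
x = 49.0
y = 49.0

divided = x / y             # one correctly rounded operation
inverse_mul = x * (1.0 / y)  # reciprocal is rounded first, then the product

print("x / y       =", repr(divided))
print("x * (1 / y) =", repr(inverse_mul))
print("identical?  ", divided == inverse_mul)
```

A one-bit difference like this is harmless in many codes, but it is exactly the kind of change that separates "performance regardless of reproducibility" from "performance with reproducibility".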

youn__kihang
Novice
2,509 Views

Hello Jim,

Thank you for your suggestion.
Does your suggestion mean adding more options to -O3, as shown below, to reduce the drawbacks?
What I don't know is which options remove the shortcuts and give full precision.

Getting full precision: add the -fp-model precise option
Without shortcuts: please tell me the option or keyword for this.

Thank you

Kihang

jimdempseyatthecove
Honored Contributor III
2,509 Views

The problem with your "simplified table" is that some programmers will want

     Performance regardless of accuracy and reproducibility

and other programmers will want

     Performance together with accuracy and reproducibility

Your table shows only one option.

See Steve Lionel's post: https://software.intel.com/sites/default/files/managed/95/69/Improving%20Numerical%20Reproducibility%20in%20C%2B%2B%20and%20Fortran.pdf

Jim Dempsey
