Hello everyone,
I have put together a guide that recommends compiler options to users, as follows (it has not been distributed yet).
If there is a part that needs improvement or a part that I am missing, please let me know.
Thank you.
I would like to see more consistent pairings of options between ifort and gfortran.
If I set -ffast-math, I will add -fno-cx-limited-range (since -ffast-math itself turns on -fcx-limited-range) until I have tested complex data types thoroughly enough to see that it is not needed for safety and that it hurts performance. I would never set ifort -fast; if I want -no-complex-limited-range I will set it individually. -fast is only for benchmarks that don't permit a full set of command-line options.
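To make the complex-arithmetic concern concrete, here is a minimal Python sketch (Python is used only as a neutral illustration, and the function name `limited_range_div` is hypothetical). It compares the textbook division formula, which performs no range reduction, with Python's built-in complex division, which rescales the operands. It is not a reproduction of the code either compiler emits.

```python
def limited_range_div(a: complex, b: complex) -> complex:
    """Textbook complex division with no range reduction (illustrative only)."""
    denom = b.real * b.real + b.imag * b.imag  # overflows to inf for large |b|
    real = (a.real * b.real + a.imag * b.imag) / denom
    imag = (a.imag * b.real - a.real * b.imag) / denom
    return complex(real, imag)


z = complex(1e200, 1e200)

scaled = z / z                   # built-in division rescales internally
naive = limited_range_div(z, z)  # intermediate products overflow

print(scaled)  # (1+0j)
print(naive)   # (nan+nanj)
```

The operands are perfectly representable, yet the unscaled formula loses the answer entirely, which is why it seems reasonable to keep the safer setting until the complex code paths have been tested.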
PIC isn't an optimization flag; it is required (or not) independently of optimization. Fortunately, it no longer hurts performance as often as it once did.
When setting AVX512, I would set -align array64byte.
With ifort, I would always set -fprotect-parens or -standard-semantics at -O2 or -O3. gfortran never ignores parentheses. In case you're interested, gcc and g++ do allow such violations of the standard when -ffast-math is set. Don't let people tell you this is unimportant with properly written source code.
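A small illustration of why parentheses matter, sketched in Python rather than Fortran (the variable names are made up for this example). The two expressions are algebraically identical, so an optimizer that is allowed to ignore parentheses may turn one into the other, but they give different floating-point results.

```python
a = 2.0 ** 53  # 2**53 + 1 is not representable as a double
b = 1.0

as_written = (a + b) - a    # a + b rounds back to a, so this is 0.0
reassociated = b + (a - a)  # same algebra, different grouping: 1.0

print(as_written, reassociated)  # 0.0 1.0
```

Compensated summation and similar techniques rely on exactly this kind of grouping, so carefully written source can be the code most affected when parentheses are not honored.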
-O2 with ifort includes automatic SIMD vectorization, so I would compare it with gfortran -O2 -ftree-vectorize. Even with that option for gfortran (which some experts recommend highly), you would still need -fp-model precise to make ifort consistent. gfortran -fno-fast-math and ifort -fp-model precise -fprotect-parens avoid optimizations that are expected to cause minor changes in numerical behavior.
If you want optimization without auto-vectorization, -O2 is commonly used with gfortran; -O1 might be better than -novec with ifort.
Since ifort -O2 and -O3 include automatic unrolling, for equivalent optimization you should compare ifort -unroll4 against gfortran -funroll-loops --param max-unroll-times=4.
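One way to keep such pairings side by side is a small lookup table. The Python sketch below only assembles command strings and runs nothing. Grouping the flags into a single "comparable -O2" profile is my reading of the suggestions above, not a vendor recommendation. The ifort floating-point option is spelled `-fp-model precise` (the Linux form), and `solver.f90` is a hypothetical file name.

```python
# Option groups taken from the discussion above; the grouping into one
# profile is an assumption made for illustration.
COMPARABLE_O2 = {
    "ifort": ["-O2", "-fp-model", "precise", "-fprotect-parens", "-unroll4"],
    "gfortran": ["-O2", "-ftree-vectorize", "-funroll-loops",
                 "--param", "max-unroll-times=4"],
}


def command_line(compiler: str, source: str) -> str:
    """Return a compile command string for one compiler (nothing is executed)."""
    return " ".join([compiler, *COMPARABLE_O2[compiler], "-c", source])


for name in COMPARABLE_O2:
    print(command_line(name, "solver.f90"))  # hypothetical source file
```

Keeping both columns in one structure makes it harder to change one compiler's settings without revisiting the other's.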
At one time, while I was working at Intel, they said they would institute options to make it easier to match typical gfortran option groups, at least -O2. I guess they decided it's not practical. If you're making the effort to resolve this with a table such as you propose, I would like to take the additional steps to have comparable settings as much as possible.
Hello Tim,
Thank you for your answer.
First of all, thank you for explaining the effects of the options and for pointing out additional ones.
I was not aiming for a complete match between gfortran and Intel Fortran; the table is intended as a guide for end users.
After asking here in the compiler forum, I will also ask for guidance in the HPC forum.
I have revised it to reflect your comments, so if you see any problems, please take a look and leave a comment.
Thank you again.
I suggest you follow Tim's advice more closely and make the distinction between:
"Performance Optimization" -O3 with shortcuts and lesser precision (reordered operations, unprotected parentheses, multiplication by the inverse instead of division, lower-precision intrinsic functions)
and
"Performance Optimization" -O3 without shortcuts and full precision.
Jim Dempsey
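As a concrete instance of one of the shortcuts listed above, multiplying by the inverse instead of dividing, here is a minimal Python sketch (the variable names are illustrative). The division is correctly rounded, while the inverse-multiply form rounds twice.

```python
x = 49.0
y = 49.0

exact_division = x / y            # correctly rounded quotient: 1.0
inverse_multiply = x * (1.0 / y)  # 1/49 is rounded first, then multiplied

print(exact_division)                      # 1.0
print(inverse_multiply)                    # 0.9999999999999999
print(exact_division == inverse_multiply)  # False
```

The difference is one unit in the last place here, which is the kind of minor numerical change that separates the two categories.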
Hello Jim,
Thank you for your suggestion.
Does your suggestion mean adding more options to -O3, as shown below, to reduce the drawbacks?
What I don't know is which option removes the shortcuts and gives full precision.
Full precision: -fp-model precise option added
Without shortcuts: please tell me the option or keyword for this.
Thank you
Kihang
The problem with your "simplified table" is that some programmers will want:
Performance regardless of accuracy and reproducibility
And other programmers will want:
Performance together with accuracy and reproducibility
Your table shows only one option.
See Steve Lionel's post: https://software.intel.com/sites/default/files/managed/95/69/Improving%20Numerical%20Reproducibility%20in%20C%2B%2B%20and%20Fortran.pdf
Jim Dempsey
