Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

What compiler flags to use?

mattrosing
Beginner
Hi, I used the 8.1 compiler with the following flags (on an Itanium):

-O3 -fno-alias -mP2OPT_hlo_loadpair=F -mP2OPT_hlo_prefetch=F -mP2OPT_hlo_loop_unroll_factor=2 -mP3OPT_ecg_mm_fp_ld_latency=8 -opt_report_fileopt_report -opt_report -opt_report_phase ecg_swp -ivdep_parallel -i4

and had my code running at 20% of peak performance, as measured with the hardware counters. If I used only -O3, or changed anything other than removing the report-generation flags, the code got slower.

We've now moved to the 9.0 compiler and with the same flags I get 11% of peak performance. Can you recommend some different flags?
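For readers unfamiliar with the "percent of peak" figures quoted here: the metric is floating-point operations retired divided by the theoretical maximum over the same number of cycles. The sketch below assumes an Itanium 2-class core with two FMA units, i.e. up to 4 double-precision flops per cycle (each fused multiply-add counts as 2 flops). The function name and the sample counter values are hypothetical and are not taken from this thread.

```python
# Hypothetical helper: turn raw hardware-counter totals into "percent of peak".
# Assumption: Itanium 2-class core, two FMA units -> 4 DP flops per cycle.


def percent_of_peak(fp_ops: int, cycles: int, flops_per_cycle: int = 4) -> float:
    """Return achieved flops as a percentage of the theoretical peak.

    fp_ops  -- floating-point operations retired (read from a hardware counter)
    cycles  -- CPU cycles elapsed over the same measurement interval
    """
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    peak_ops = cycles * flops_per_cycle  # best case: every FMA unit busy every cycle
    return 100.0 * fp_ops / peak_ops


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from this thread.
    print(f"{percent_of_peak(fp_ops=800_000_000, cycles=1_000_000_000):.1f}%")
```

With these made-up counts the script prints 20.0%. If your counters report flops rather than FMA instructions, keep `flops_per_cycle=4`. If they count FMAs as single operations, adjust the constant accordingly.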

Ron_Green
Moderator

Matt,

Yes, the compiler options you show are the 8.1 "linpack 100" set of options. These may not be the best choice for 9.0. What does this set give you:

-O3 -fno-alias -ip

We're also curious why you've chosen the older 9.0 compiler rather than 9.1 or 10.0. We've put a lot of work into Itanium optimization in 9.1 and 10.0, particularly in 10.0, and I would really recommend trying the 10.0 compiler. As you know, we offer free evaluations, and you do not need root access to install the compiler (for example, you can install an evaluation copy in your home directory).

The "black belt" options are highly version specific. In each major version, the cost models and algorithms used by the optimizer change, so using the "black belt" options from an older version with a newer one is not advised. I'd start with the minimal optimization switches shown above, then try the old BB options one at a time.
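The "start minimal, then add the old options one at a time" approach can be sketched as a small script that builds one compile line per experiment: the baseline first, then the baseline plus a single black-belt option. This is an illustrative sketch, not an Intel tool. The source file `kernel.f90` and output name `kernel` are placeholders, and compiling, running, and timing each variant is left to your own build and benchmark scripts.

```python
# Sketch of a one-option-at-a-time flag sweep. It only builds command lines;
# "kernel.f90" and "kernel" are hypothetical placeholders.

BASE_FLAGS = ["-O3", "-fno-alias", "-ip"]

# Options carried over from the 8.1 build; each is tried on its own.
OLD_BB_FLAGS = [
    "-mP2OPT_hlo_loadpair=F",
    "-mP2OPT_hlo_prefetch=F",
    "-mP2OPT_hlo_loop_unroll_factor=2",
    "-mP3OPT_ecg_mm_fp_ld_latency=8",
]


def flag_variants(base, candidates):
    """Yield (label, flags): the baseline first, then baseline + one candidate."""
    yield "baseline", list(base)
    for flag in candidates:
        yield flag, list(base) + [flag]


def compile_command(flags, compiler="ifort", source="kernel.f90"):
    """Assemble a compiler invocation as an argument list (for subprocess.run)."""
    return [compiler, *flags, source, "-o", "kernel"]


if __name__ == "__main__":
    for label, flags in flag_variants(BASE_FLAGS, OLD_BB_FLAGS):
        print(f"{label:35s} {' '.join(compile_command(flags))}")
```

Changing one option per build keeps each performance difference attributable to a single switch. Once a beneficial option is found, it can be folded into `BASE_FLAGS` and the sweep repeated with the remaining candidates.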

If you can send us the code, open an issue on Premier.intel.com. Include the code, the input deck(s), and your expected performance criteria in the tarball. We have access to the older 9.0 compilers as well as the 9.1 and 10.0 compilers, and our TCEs would be happy to take a look. Add a note to the issue to "please forward this issue to Ron Green", and I'll get one of our IPF pros to have a look.

ron

mattrosing
Beginner
Hi Ron,

Oops, my mistake. We are using 9.1.

I went back to the 8.1 compiler and got it working, but I would be very interested in having someone look at the code so I know what to do in the future.

I did try -O3 -fno-alias, as well as many other options. I'm not sure what -ip does. In any case, the code always ran at around 12-13% of peak until I put in the magic. I've seen many other people use those options, and a gain of 30% is typical. Right now our performance is nearly 25%.

My understanding is that the -mP2OPT_hlo_loadpair=F and -mP2OPT_hlo_prefetch=F options are useful when a block of data is cache resident, which we spent a lot of time ensuring. We had to change the algorithm to achieve that, which is not something a compiler would ever find on its own. We also left in -mP2OPT_hlo_loop_unroll_factor=2 but had to remove the -mP3OPT_ecg_mm_fp_ld_latency=8 flag.

I don't know when we'll go to the 10.0 compiler. Our system is complex enough that just swapping compilers is non-trivial.

It would be much simpler to send you just the few files that form the core of our code rather than everything. Is that OK? If it's easier, you can email me at rosing at peakfive d0t com

TimP
Honored Contributor III
-ip performs interprocedural analysis (such as automatic inlining) within a single source file. Much of that happens by default in current compilers.
If loadpair=F helped with the 8.1 compiler, that points to a deficiency we would hope has been corrected in later compilers. That option, which is recommended only for people willing to study its effects carefully, prevents the compiler from optimizing with double-width (paired) loads. Contrary to your statement, loadpair should be particularly effective for in-cache data.
As you stated, disabling prefetch makes sense if data are known to be retained in cache.
You should be comparing opt_report analyses if you want to use such options.
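Comparing opt_report output between two builds (for example, the 8.1 and 9.1 builds, or a baseline build and one with a single black-belt option added) is easier with a line diff. The sketch below uses Python's standard `difflib.unified_diff`. The report file names are placeholders for wherever your opt_report output was written, and the report format itself is not assumed in any way.

```python
# Sketch: diff two optimization reports to spot changed loop transformations
# or software-pipelining results. File names passed on the command line are
# placeholders for your own report files.
import difflib
import sys
from pathlib import Path


def diff_reports(old_lines, new_lines, old_name="old", new_name="new"):
    """Return a unified diff (list of lines) between two report texts."""
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=old_name, tofile=new_name, lineterm=""
        )
    )


def diff_report_files(old_path, new_path):
    """Read two report files and diff them line by line."""
    old = Path(old_path).read_text().splitlines()
    new = Path(new_path).read_text().splitlines()
    return diff_reports(old, new, str(old_path), str(new_path))


if __name__ == "__main__":
    # Usage: python diff_reports.py old_report.txt new_report.txt
    if len(sys.argv) == 3:
        for line in diff_report_files(sys.argv[1], sys.argv[2]):
            print(line)
```

An empty diff means the two builds produced identical reports. Lines prefixed with `-` and `+` show which loop-level decisions changed between them.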