IVF optimized code portability

rkbhai · 04-27-2004

Hi,

I would like to know which of the following optimization flags of the IVF compiler are not portable when my application runs on AMD (HT-enabled) processors.

In other words, how well does a very highly IVF-optimized parallel application run on equivalent AMD processors? I would like to optimize my parallel application to the highest degree possible and still be able to run it on any x86-based processor (such as AMD) with similar performance.

Any comments in this regard are highly appreciated.

IVF (Windows) flags :

1. /O3 with /QaxN or /QaxB

2. /Qipo for multi-file IPO optimization

3. /G7 for processor scheduling

4. /Qscalar_rep for scalar replacement, possibly with /O3

5. /Qunroll - loop unroll

6. /Qparallel - for auto-parallelization

7. /Qopenmp

Thanks,

Rajesh

TimP · 04-27-2004

The documentation is supposed to answer these, so maybe it could use improvement. First off, there is no HT, in the Intel sense, on AMD, in spite of what ZDNet said this week about how AMD had to invent hyperthreading before they could bring Opteron to market. I never heard of anyone disabling HyperTransport on AMD. Maybe that's why marketeers from either company don't like it used as an abbreviation. Sorry, I don't feel like looking to see if this site has a trademarker or auto-disclaimer, sorely wanted as it may be here.

/O3 doesn't have any architecture dependence.

/QaxN and /QaxB are intended to support all compatible processors, but make a separate code path including SSE/SSE2 only for processors recognized as Intel with support for SSE/SSE2. So, the code path taken on AMD is not usually as well optimized as the one for the specific target. /QaxB will not generate separate code paths as often as /QaxN.
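As a rough illustration of that dispatch behaviour, here is a minimal sketch in C. It is not what the Intel compiler actually emits: the struct `cpu_info`, the enum `code_path` and the function `pick_path` are made-up names, and the only real-world detail is the pair of CPUID vendor strings ("GenuineIntel", "AuthenticAMD").

```c
#include <string.h>

/* Hypothetical description of the CPU the program is running on. */
struct cpu_info {
    const char *vendor;   /* CPUID vendor string, e.g. "GenuineIntel" */
    int has_sse2;         /* nonzero if the CPU reports SSE2 support */
};

enum code_path { PATH_GENERIC, PATH_SSE2 };

/* Take the SSE2 path only when the vendor is recognized as Intel
   AND SSE2 is reported; every other CPU gets the generic path. */
enum code_path pick_path(const struct cpu_info *cpu)
{
    if (strcmp(cpu->vendor, "GenuineIntel") == 0 && cpu->has_sse2)
        return PATH_SSE2;
    return PATH_GENERIC;
}
```

Under a rule like this, an SSE2-capable AMD processor still lands on the generic path, which is the effect described above.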

/Qipo isn't really architecture dependent, although it could do some optimizations which have more value on Intel than AMD.

/G7 probably has no effect on ifort 8. On earlier compilers, which supported P-III more directly, /G7 would avoid scheduling code in a way which was better for P-III than for P4. Some of what /G7 did was good for AMD, some not.

Scalar replacement is a good optimization for any architecture. I haven't seen it used much explicitly. It is likely to have more effect on Netburst processors.
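For anyone unfamiliar with the term, the sketch below shows scalar replacement done by hand in C. It only illustrates the transformation, under the assumption of a simple row-major matrix-vector product; the function names `matvec_plain` and `matvec_scalar` are invented for the example, and the compiler's own version of the optimization may differ.

```c
#include <stddef.h>

/* Before: the running total is kept in y[i], so each inner iteration
   may have to load and store that array element. */
void matvec_plain(size_t n, size_t m, const double *a,
                  const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = 0.0;
        for (size_t j = 0; j < m; j++)
            y[i] += a[i * m + j] * x[j];
    }
}

/* After: the array element is replaced by a scalar, which can stay in
   a register; y[i] is written once per row. */
void matvec_scalar(size_t n, size_t m, const double *a,
                   const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < m; j++)
            sum += a[i * m + j] * x[j];
        y[i] = sum;
    }
}
```

Both versions add the terms in the same order, so they produce the same results; only the memory traffic differs.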

/Qunroll is also rarely used by itself. Intel compilers normally make good unrolling choices by default, unlike gcc/g77, which don't unroll unless you ask. If you know that your loops aren't long enough for unrolling to be useful, you can use /Qunroll0 to turn off unrolling. Not architecture dependent.
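To show what unrolling means at the source level, here is a hand-unrolled loop in C. The unroll factor of 4 and the function names `sum_rolled` and `sum_unrolled4` are arbitrary choices for the sketch, not what any particular compiler picks.

```c
#include <stddef.h>

/* Rolled loop: one add per trip, plus the loop test and increment. */
long sum_rolled(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Unrolled by 4 by hand: four adds per trip of the main loop, then a
   cleanup loop for the 0 to 3 leftover elements. */
long sum_unrolled4(const int *v, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        s += v[i];
    return s;
}
```

With fewer than four elements the main loop never runs and only the cleanup loop does any work, which is why unrolling buys nothing on very short loops.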

/Qparallel wouldn't help you on a single CPU AMD, where it might help somewhat on a single Hyperthreaded Intel CPU. If it does anything, it could be more helpful on a dual AMD.

The same comments apply to /Qopenmp, except then the parallelization is controlled directly by the OpenMP directives in the source code. Without the directives, /Qopenmp has no effect except to link with threaded libraries, in case your code, or some you link in, such as MKL, will be using them.
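A small sketch of what "controlled by the directives" means, written in C for brevity (in Fortran the corresponding directive would be `!$OMP PARALLEL DO` with a `REDUCTION` clause). The function name `dot` is just an example. When the compiler's OpenMP option is off, the pragma is ignored and the loop runs serially.

```c
/* Dot product. The pragma asks for the iterations to be shared among
   threads, with each thread's partial sum combined into s at the end.
   Without OpenMP enabled, this is an ordinary serial loop. */
double dot(const double *a, const double *b, int n)
{
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}
```

A loop without such a directive is left alone by the OpenMP option, whereas auto-parallelization has to prove for itself that the iterations are independent.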

rkbhai · 04-27-2004

Is there a place on the Internet where I can find a comparison of IVF and Portland Group (PGI) compiler performance on Intel and AMD processors for cluster-based parallel application development in Fortran?

Any hints/pointers to original sources will be highly appreciated!

regards,
rajesh

rkbhai · 04-27-2004

Any comments on the comparisons posted at Boston Univ. (link below)? They don't say anything about what type of processors they used.

http://scv.bu.edu/SCV/Archive/linux-cluster/compiler-performance.html

regards,

rajesh

Steven_L_Intel1 · 04-27-2004

Not much detail on the configurations, or even compiler versions. It indicates the last update was Nov 2003.

You might look at the Linux Compiler Comparisons at http://www.polyhedron.com/

Steven_L_Intel1 · 04-27-2004

It's not clear what compilers are being compared on that page, nor what versions or compiler options were used.

http://www.polyhedron.com/ is perhaps more useful. Note that the PGI compiler is only tested on Linux - I'm not sure they've released a Windows version yet.

TimP · 04-27-2004

Even on Polyhedron, the compiler version information could be improved. The Intel 8.0 may be an early production version. In most cases they give more information than this. The PGI version is about to go out on beta test, and is advertised to be much faster than the current version. On Polyhedron you do see the options used, the source code, and the compiler version to within a few months.