For example, a simple compilation of the attached file takes about 650 times longer with icc than with gcc:
> time icc ibm2.c
real 1m3.930s
user 1m2.690s
sys 0m0.320s
> time gcc ibm2.c
real 0m0.091s
user 0m0.050s
sys 0m0.040s
This is the latest (11.1.072) version of icc for Linux (64-bit):
> icc -V
Intel C Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100414 Package ID: l_cproc_p_11.1.072
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.
FOR NON-COMMERCIAL USE ONLY
Thanks!
Dale
The icc defaults are roughly equivalent to:
gcc -O3 -ffast-math -fno-cx-limited-range -fno-strict-aliasing -funroll-loops --param max-unroll-times=2
gcc, invoked with no options, isn't going to attempt all the optimizations which the OP has implicitly requested under icc.
icc -O1 would be roughly equivalent to gcc -O2 -ffast-math -fno-cx-limited-range -fno-strict-aliasing.
It looks like the original code is not intended for any practical purpose, only to see whether a compiler can be provoked into attempting optimization beyond the bounds of sanity. You can't fault any compiler for taking longer with aggressive options set than gcc takes with optimizations disabled.
We have had continual arguments over whether it should be necessary, but the fact remains that large parts of some commercial applications have to be built with icc cut back to -O1, where gcc options -O3 or -ffast-math would never be considered.
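For what it's worth, one quick way to see how much of that minute is the optimizer rather than the front end is to time the same file at reduced optimization levels. A rough sketch, using the OP's ibm2.c (the -O0/-O1 runs are only for comparison, and the timings will obviously vary):
$ time icc -O0 ibm2.c    # optimization disabled
$ time icc -O1 ibm2.c    # lighter optimization, roughly the gcc -O2 class per the equivalence above
$ time icc ibm2.c        # default, effectively -O2 with auto-vectorization
$ time gcc ibm2.c        # the baseline the OP measured
If the -O0 and -O1 compiles come back quickly, the cost is in the aggressive default optimizations rather than in anything pathological about parsing the file.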
In the present case, as well, I don't think that the user, who did not specify any compiler switches explicitly, was conscious of all the optimizations he had "requested". Had he known what they were, and what the likely cost was going to be, he might have looked for switches that generate code within 95% of the best possible optimization, or something similar.
However, the fact that the C example is less than 1 kbyte long and took over a minute to compile sheds new light on the issue. Perhaps, if the back ends of the C and Fortran compilers share some common portions, we shall all benefit when the developers work on it.
What will give 95% is highly application dependent.
gcc users are expected to know that they should set switches when they want optimized code.
icc marketers want the compiler to show to best advantage when used by the people who write magazine articles based on trivial benchmarks with default compiler options, and when running SPEC baseline benchmarks. Those goals conflict to some extent with usability, and gcc is in a better position, as it clearly puts those goals at a lower priority than usability.
Another conflict is posed by the desire to have consistent defaults between linux and Windows. It is felt strongly that the icc default must be competitive with the VC default, and include auto-vectorization, since there is no equivalent VC option. The inexplicable part is the decision to make /fp:fast inconsistent with VC.
At one time, there was an effort to persuade the Intel compiler team to adopt a simple option which would be consistent with typical gcc options like -O2. This failed. The picture has changed now that gcc includes auto-vectorization in -O3. I'd like to see more coverage in the documentation of icc options which are equivalent to options normally used in the reference compiler, but that has become more difficult as those compilers have evolved to include more optimization.
On the Fortran side, Steve Lionel has agreed that some of the standard compliance options should be used in normal practice, and he has made an effort to consolidate them.
Sounds good on the surface, however...
Considering that the incentive for a compiler vendor is to produce
"Our -On produces faster code than their -On",
the incentive would be for Vendor A's interpretation of Vendor B's equivalent options to be a selection of options that produces lower-performing code.
The comparable options would best be selected by an unbiased (neutral) party.
Please note, the interpretation of -On is:
-O0 is no optimization
-O1 is more effort than -O0
-O2 is more effort than -O1
-O3 is more effort than -O2
-Oother is additional features beyond -O3
And the vendor is more or less free to decide which optimization features go into which level.
IMHO, it is the user's responsibility to select, from the available options, a set that trades off compile time against the performance returned (and quality of results).
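In practice that selection comes down to measuring both sides of the trade-off on your own code; a minimal sketch (bench.c and its workload are placeholders here, not anyone's actual test case):
$ time icc -O1 bench.c -o bench_O1 && time ./bench_O1
$ time icc -O2 bench.c -o bench_O2 && time ./bench_O2
$ time icc -O3 bench.c -o bench_O3 && time ./bench_O3
The first time on each line is what you pay at build time, the second is what you get back at run time.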
It is likely that on rare occasions a given sample code may cause a compiler to choke or produce erroneous results. Last week I encountered a problem where g++ could not handle a simple template that both icc and msvc had no problems with.
Jim Dempsey
-O0 is no optimization
-O1 is more effort than -O0
-O2 is more effort than -O1
-O3 is more effort than -O2
-Oother is additional features beyond -O3
And the vendor is more or less free to decide which optimization features go into which level.
IMHO, it is the user's responsibility to select, from the available options, a set that trades off compile time against the performance returned (and quality of results).
My problem with Intel's choice of optimization groups for -O0, -O1, -O2 and -O3 is not with the effort but with the result.
I mostly get the best results with -O1, while higher optimization levels take more time to compile and produce code which is both slower and bigger, or much bigger for a negligible speedup.
As with every toy or gadget, when the initial experimentation phase is over, one sticks with the options which have worked best so far, and cannot continuously flip through an n-dimensional space of compiler options in search of a jackpot.
So I wish more time were devoted to making sure that a higher optimization level is actually better, not just a particularly smart way of doing something which sometimes works and sometimes doesn't, leaving it the programmer's responsibility to check.
In comparison, GCC very rarely behaves worse with higher optimization.
This indicates your programs tend to perform better without loop unrolling. A small percentage of applications behave this way.
Try -Os
Jim
As you apparently leave unrolling off when you use gcc, Jim's suggestion may be on target. gcc actually makes it more difficult to get a useful level of unrolling for loop trip counts of 20 or more.
-Os is intended to reduce generated code size at the expense of performance in comparison with -O1. It may perform relatively well when loop trip counts are 0 or 1.
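If you want to see the size/speed trade-off concretely, a comparison along these lines works (shown for a hypothetical foo.c; substitute your own source and a representative run):
$ icc -O1 foo.c -o foo_O1 && size foo_O1
$ icc -O2 foo.c -o foo_O2 && size foo_O2
$ icc -Os foo.c -o foo_Os && size foo_Os
$ time ./foo_O1 ; time ./foo_O2 ; time ./foo_Os
size shows the text (code), data and bss segment sizes, so a -O2 binary that is much larger but no faster than the -O1 or -Os one is exactly the situation described above.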
$ time icc tstcase.cpp
real 0m33.722s
user 0m33.647s
sys 0m0.053s
$ time ./a.out
count = 991
real 0m0.005s
user 0m0.003s
sys 0m0.001s
$ time g++ tstcase.cpp
real 0m0.091s
user 0m0.061s
sys 0m0.029s
$ time ./a.out
count = 991
real 0m0.019s
user 0m0.017s
sys 0m0.003s
$ uname -a
Linux maya11 2.6.29.4-167.fc11.x86_64 #1 SMP Wed May 27 17:27:08 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
$ icc -V
Intel C Intel 64 Compiler Professional for applications running on Intel 64, Version 11.1 Build 20100203 Package ID: l_cproc_p_11.1.069
Copyright (C) 1985-2010 Intel Corporation. All rights reserved.
I have reported the issue to the Intel compiler development team. I will update this forum thread when there is an update on this.
Rework the main() so it takes an input argument to use as the major loop count.
Based on your run times, I would expect an iteration count of 1000 to produce more meaningful results.
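A minimal sketch of that shape, assuming the test case's work can be wrapped in a function (run_testcase and its stub body are placeholders here, not the OP's actual tstcase.cpp code):

#include <stdio.h>
#include <stdlib.h>

/* Placeholder standing in for the real benchmark body. */
static int run_testcase(void)
{
    return 991;   /* the real code computes the reported count */
}

int main(int argc, char *argv[])
{
    /* Major loop count from the command line, defaulting to 1000
       so the run is long enough to time meaningfully. */
    long iters = (argc > 1) ? strtol(argv[1], NULL, 10) : 1000;
    long i;
    int count = 0;

    for (i = 0; i < iters; i++)
        count = run_testcase();

    printf("count = %d\n", count);
    return 0;
}

Invoke it as ./a.out 1000 (or larger) once the single-iteration run time is down in the millisecond range.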
Jim
