Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

ICC vs GCC vs LLVM/Clang

dpeterc
Beginner
4,186 Views

The "conventional wisdom" was that icc was best by a large margin (in both code size and speed), gcc was the most widespread and multiplatform, and Clang was immature but promising. Something along those lines:

http://www.hortont.com/blog/icc-and-mandelbrot/

But I have recently tested those compilers on my project (about 120k lines of C) on OpenSUSE 12.2, and things have changed radically. GCC 4.7.1 is on par with icc 12.1.5, while Clang is approximately 25% slower. But Clang has excellent compile errors and warnings, and its static analysis is just superb. So some projects are switching from gcc to clang as their default compiler.

Has anyone recently (in 2012) done any serious benchmarking of these compilers? Or can you share benchmark results from your production code? Which compile options give you the best results? What is your justification for using icc, now that free compilers have improved so much?

0 Kudos
25 Replies
TimP
Honored Contributor III
3,759 Views
icc has become more dependent on pragmas to keep ahead of gcc. Since we know nothing about your code, we can't suggest which pragmas to use, or equivalent compile command line parameters. With pragmas, in my experience, icc can vectorize more loops effectively than gcc, and maintain optimization of loops which don't vectorize effectively. There are significant changes in pragma usage for 13.0, not yet fully documented.

Typical roughly equivalent (aggressive) command lines:

gcc -std=c99 -O2 -ftree-vectorize -funroll-loops --param max-unroll-times=4 -march=native -ffast-math
icc -std=c99 -ansi-alias -O3 -complex-limited-range -xHost

I can't answer your question about whether adding pragmas to get full performance from icc is "justified," or whether you can "justify" using icc when you don't make that effort, unless you are lucky enough to have an application which comes out ahead with no effort.
0 Kudos
SergeyKostrov
Valued Contributor II
3,759 Views
Hi everybody,

>>...But I have recently tested those compilers on my project (about 120k lines of C) on OpenSUSE 12.2, and things have changed radically. GCC 4.7.1 is on par with icc 12.1.5, while Clang is approximately 25% slower. But Clang has excellent compile errors and warnings, and its static analysis is just superb. So some projects are switching from gcc to clang as default compiler...

[SergeyK] I didn't have a chance to work with the Clang C++ compiler. However, Warning Level 5 ( '/W5' ) of the Intel C++ compiler is awesome. When I turned it on for a mid-size C/C++ project, a couple of hundred issues in the source code were detected ( to be honest, I was simply overwhelmed and it took a couple of weeks to go through all of them ).

>>...Has anyone recently (in 2012) done any serious benchmarking of these compilers? Or can you share the benchmark tests of your production code? Which compile options give you best results?...

[SergeyK] During the last couple of months we had a couple of short discussions on different forums, like TBB or Software Optimization, about benchmarking, and I'd like to repeat that it depends on many factors, like:
- project or algorithm
- whether threading is used
- optimization options
- data type ( single- or double-precision )
- CPU ( SSE / SSEx.x / AVX )
- etc

For example, I tested a MergeSort algorithm and Intel C++ compiler v12.x gave the best results when all optimizations were disabled (!). In a more complex case, with a Strassen Heap-Based algorithm for matrix multiplication, a ~12-year-old Borland C++ v5.x outperformed (!) modern C++ compilers ( Intel / MS / MinGW ) with one thread and a single-precision data type, but it "lost the battle" for a double-precision data type.

Testing a Linpack 100x100 Benchmark in C/C++ for PC has been on my list for a long time, and I hope that I'll be able to allocate some time and complete it.

Best regards,
Sergey
0 Kudos
dpeterc
Beginner
3,759 Views
Hello Tim,

I do not use pragmas, and I am not sure whether I want to in the first place. I prefer having source code which is not adapted to a particular compiler; for me, guessing the best compiler switches for each compiler is hard enough. Maybe if my code had a single hot spot, some matrix multiplication or one well-defined algorithmic bottleneck, it would be worth the effort. I am sure pragmas are a valid tool in some situations, but not for me.

Just yesterday I discovered the -flto switch in gcc 4.7.1. Link time optimization is roughly similar to -ipo in Intel's compiler. This gave me an extra 5% speedup, and gcc was marginally faster in the end.
0 Kudos
dpeterc
Beginner
3,759 Views
Hi Sergey,

Thank you for your comments. Please post your benchmarks once you find time to run them, or links to relevant sites which, in your opinion, contain solid benchmarks.

Best regards,
Dušan
0 Kudos
SergeyKostrov
Valued Contributor II
3,759 Views
dpeterc wrote:

Hi Sergey,
Thank you for your comments. Please post your benchmarks once you find time to run them, or links to relevant sites which, in your opinion, contain solid benchmarks.

Best regards,

Dušan

Hi, I'll do it as soon as I receive a new system with 64-bit Windows...
0 Kudos
nedo_n_
Beginner
3,761 Views

Hi,

I'm using the GCC 4.8.1 compiler and the Intel ICC 13 compiler on Ubuntu Server 12.04 64-bit, on an Intel Sandy Bridge Core i7-3930K. I found that with the command lines g++ -O3 -march=core-i7 -mtune=core-i7 -mavx and icpc -O3 -xAVX, GCC produces code that is about 15% faster than ICC's.

I'm amazed by the results; probably I'm forgetting some optimization flag for the Intel compiler. Otherwise it's a great advance for GCC.

Best Regards

0 Kudos
TimP
Honored Contributor III
3,761 Views

icpc 14.0 (recently completed beta, release expected in a few weeks) gains optimizations for * __restrict (no longer depending on #pragma ivdep) so as to match performance of g++.

You do need -ansi-alias option (equivalent to g++ default -fstrict-aliasing) for icpc to be competitive.  That would not become a default for linux within the next year (and is not under consideration for Windows default).  Except for that, the more aggressive default optimizations of icpc are considered a sales point.  By using the (somewhat complicated) options for unrolling in g++ you could gain advantages over icpc for additional cases.
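
As an illustration of the __restrict point above, here is a minimal C sketch. The saxpy routine and its signature are hypothetical and not taken from anyone's code in this thread; C99 spells the qualifier restrict, while g++ and icpc accept __restrict in C++. Whether a given compiler version actually vectorizes the loop depends on its options and should be checked with its vectorization report.

```c
#include <stddef.h>

/* y[i] += a * x[i] for i in [0, n).
 * The restrict qualifiers promise the compiler that x and y never overlap,
 * so a store to y[i] cannot change any x[j]. That removes the need for a
 * runtime overlap check (or a hint such as #pragma ivdep) before the loop
 * is vectorized. Calling this with overlapping arrays is undefined behavior. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

The design trade-off is that the no-overlap promise moves from the compiler to the caller: the qualifier is portable across gcc and icc, but nothing verifies it at run time.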

0 Kudos
SergeyKostrov
Valued Contributor II
3,761 Views
>>... I found that with these command line g++ -O3 -march=core-i7 -mtune=core-i7 -mavx and icpc -O3 -xAVX, GCC produces
>>superior performance w.r.t. to ICC about 15%...

I would simply like to note that with the following set of command line options:

/O3 /Ob1 /Oi /Ot /Oy /GF /MT /GS- /fp:fast=2 /Zi /Gd /Qfp-speculation:fast /Qopt-matmul /Qstd=c++0x /Qrestrict /QxAVX /Qunroll:4 /Qopt-block-factor:64 /Qopt-streaming-stores:auto

the Intel C++ compiler outperforms the MinGW ( GCC-like for Windows ), Microsoft and Borland C++ compilers by more than 50%.

However, our statements are very fuzzy because we do not provide any complete test cases that would let somebody else reproduce the results, and these comparisons will never end as long as both compilers exist.
0 Kudos
TimP
Honored Contributor III
3,761 Views

Sergey, your option set is more complicated than g++ ! 

The gcc/g++/gfortran equivalent of /fp:fast=2 is -ffast-math.  These options imply /Qcomplex-limited-range (-fcx-limited-range) and have some limitations on handling of division/sqrt.  They can break many applications.

gfortran equivalent of /Qopt-matmul is -fexternal-blas (can use MKL, ACML, libblas et al.)

gcc/g++/gfortran equivalent of /Qunroll:4 is -funroll-loops --param max-unroll-times=4

As pointed out earlier, Intel compilers under VS GUI default to /Qipo, which is equivalent to gnu -flto.  This could invalidate benchmarks such as the one I quote below.

I mention these as it's entirely possible to use consistent options to compare the compilers.  You appear to agree with me on the fallacy of comparing with the simplest possible (inconsistent) option settings.

I extended the benchmarks at https://sites.google.com/site/tprincesite/levine-callahan-dongarra-vectors to include Intel(r) Cilk(tm) Plus (not that I expect anyone to be much impressed).

Most cases perform essentially the same with most of the compilers tested.  Claiming some score such as some compiler is 10% better on geometric mean basis isn't very meaningful.  Excluding non-vectorizable cases boosts the relative score of Intel compilers.

0 Kudos
SergeyKostrov
Valued Contributor II
3,761 Views
>>...Claiming some score such as some compiler is 10% better on geometric mean basis isn't very meaningful...

I agree with that, and this is exactly what I wanted to demonstrate:

- nedo n made a claim that GCC outperformed ICC by 15% and did Not provide a set of command line options used to test performance of the Intel C++ compiler.
- Sergey Kostrov made a claim that ICC outperformed MinGW, MSC and BCC by 50% and did Not provide a set of command line options used to test performance of the MinGW, MSC and BCC C++ compilers.

Also, both of us did Not provide any details about the test cases or algorithms used to evaluate performance of all these C++ compilers. Is there any sense in our statements? I do not think so.
0 Kudos
nedo_n_
Beginner
3,761 Views

Hi,

I provided minimal options for both compilers: g++ -O3 -march=core-i7 -mtune=core-i7 -mavx and icpc -O3 -xAVX. I would like to know whether the Intel compiler performs worse with these minimal options.

Thank you

0 Kudos
TimP
Honored Contributor III
3,761 Views

nedo n. wrote:

I provided minimal options for both compilers: g++ -O3 -march=core-i7 -mtune=core-i7 -mavx and icpc -O3 -xAVX. I would like to know whether the Intel compiler performs worse with these minimal options.

The absence of unrolling options or one which permits vectorized sum reduction will frequently pose a significant handicap for g++, as will the absence of -ansi-alias for icpc.  These evidently would affect different cases.

For g++, the option -march=native could be used to replace all the core-i7 and avx options.  There's probably no need to duplicate options under -mtune which are given already in -march.
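
To make the "vectorized sum reduction" remark concrete, here is a minimal C sketch; the function names sum and sum4 are hypothetical. Floating-point addition is not associative, so gcc will normally only split a reduction like sum into several partial sums when an option such as -ffast-math permits reordering. The hand-unrolled sum4 shows roughly the shape such a transformation produces, under the assumption of four lanes.

```c
#include <stddef.h>

/* Straightforward reduction: every addition depends on the previous one,
 * so the additions must happen in order unless the compiler is allowed
 * to reassociate floating-point arithmetic. */
float sum(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Hand-written equivalent of an unrolled/vectorized reduction with four
 * independent partial sums. The additions are performed in a different
 * order than in sum(), so the result may differ in the last bits for
 * general inputs. */
float sum4(const float *x, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    float s = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)          /* leftover elements when n % 4 != 0 */
        s += x[i];
    return s;
}
```

For inputs where every intermediate sum is exactly representable (small integers, for example) both functions return the same value; the possible rounding difference on general data is the reason the transformation is not enabled by default.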

0 Kudos
SergeyKostrov
Valued Contributor II
3,761 Views
>>I provided a minimal options for both...

This is still Not the right way to evaluate performance, because the two C++ compilers have different sets of Default options, and in the case of the GCC compiler these Default options could be more aggressive (!) in terms of optimizations.
0 Kudos
nedo_n_
Beginner
3,761 Views

Hi Sergey,

could you please provide a set of the best options for both compilers so I can do a fair comparison?

Best Regards

0 Kudos
SergeyKostrov
Valued Contributor II
3,761 Views
>>... can you provide a set of best options for both compiler so I can do a fair comparison?

I've already posted a set of compiler options for the Intel C++ compiler, and it is impossible to match these command line options to the GCC compiler. I use the following set of command line options for GCC-like compilers ( for Release ):

-O3 or -O2
-m32 or -m64
-m[ instruction set ]
-ffast-math
-Wuninitialized
-fomit-frame-pointer
-DNDEBUG
-o
-Xlinker --stack=268435456

Comparing the performance of C++ compilers is always a challenge because each of them has unique features. There are always performance differences and, as I've mentioned many times in the past, if a difference is less than 5% it can be neglected. However, a greater number is always a concern.

PS: It is almost like asking which car is better. You know that there are too many things which need to be taken into account...
0 Kudos
SergeyKostrov
Valued Contributor II
3,761 Views
I would also like to follow up on the following statement:

>>...But Clang has excellent compile errors and warnings...

With the Intel C++ compiler, use the /W5 and /Wcheck options. If they have never been used on a project ( for example, ~100K lines of C/C++ code or more ), a developer could be overwhelmed by hundreds ( if not thousands ) of warning and diagnostic messages.
0 Kudos
dpeterc
Beginner
3,761 Views

My ICC version on Linux is 12.1.5, and -W5 does not work; the documentation only mentions -Wremarks, but it does nothing on my code.

I also have ICC on OSX, version 11.1, and the same compile options as on Linux produce a lot of useful remarks, just like you mention for -W5. I have spent a week cleaning my code ;-(

It is strange that slightly different versions of the same compiler produce very different levels of warnings and remarks.

Anyway, I must say that icc, while powerful, also requires a lot of "babysitting": with each new release you must study the options, and the default options for O1, O2, O3 change. I could run gcc for years, upgrading Linux with new versions, without knowing that much about compiler options. On icc, if you want good results, you need to study and try the options. It is especially difficult since the number of options is very high, and some cause the compiler to fail, or to take a very long time to compile and produce a very big executable. And doing run time tests is difficult, since a particular set of compile optimizations may benefit your version of the CPU, but not the one your customer is using.

0 Kudos
TimP
Honored Contributor III
3,761 Views

If you're looking for inconsistencies among versions of gcc, they're not difficult to find.   People still use versions where defaults differ from current gcc.

-fprotect-parens was a default for one major gcc version, regardless of whether -ffast-math was set, but afterwards only gfortran stayed that way.

-finline-functions used to be in effect by default only for -O3, and it used to inline only functions which appeared earlier in the file.

-ftree-vectorize was not always implied by -O3.

-fstrict-aliasing used not to be a default, but has been one for years now.  icc will not make such a change for another year at the earliest, and then it would differ between linux and Windows.

People still get hung up over gcc -m32 defaulting to i486, so the change to icc defaulting to -msse2 some time ago made sense.
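
A small C sketch of what -fstrict-aliasing (gcc) or -ansi-alias (icc) lets the optimizer assume, and of a type pun that stays valid under it. The function names are hypothetical, and whether a particular compiler version actually performs the optimization described in the comments depends on its options. The bit pattern mentioned in the last comment assumes a 32-bit IEEE 754 float.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Under strict aliasing the compiler may assume that a store through an
 * int pointer never modifies a float object. It can therefore load *scale
 * once before the loop instead of reloading it after every store to
 * counts[i], which also makes the loop easier to vectorize. */
void scale_counts(int *counts, size_t n, const float *scale)
{
    for (size_t i = 0; i < n; ++i)
        counts[i] = (int)((float)counts[i] * *scale);
}

/* Code that reinterprets a float by casting its address to uint32_t* breaks
 * the aliasing rules and may misbehave once strict aliasing is on.
 * Copying the bytes with memcpy is well defined, and optimizing compilers
 * typically turn it into a single register move. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* assumes sizeof(float) == sizeof(uint32_t) */
    return u;                   /* float_bits(1.0f) is 0x3F800000 on IEEE 754 */
}
```

This is why a change of default matters for benchmarks: the same source can be compiled with or without the no-alias assumption, and older code relying on pointer casts may only be correct without it.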

0 Kudos
SergeyKostrov
Valued Contributor II
2,756 Views
>>...On icc, if you want good results, you need to study and try the options. It is especially difficult since the number of
>>options is very high...

Please take a look at this post:

Forum Topic: Evolution of Intel C++ compiler options - v7.1 -> v8.1 -> v12.0 -> v13.0
Web-link: http://software.intel.com/en-us/forums/topic/456342

I agree that lots of time needs to be spent on learning and testing in order to achieve the best possible results.
0 Kudos