We are just starting to use the Intel C++ Compiler. We were previously using VS 2008, but I switched to VS 2013 (letting it upgrade our project files automatically) and then switched to “Use Intel Compiler”. So far I have not tweaked any options at all, but everything compiles and runs nicely.
The main issue so far is that the binaries (DLLs and EXEs) are much larger when compiled with the Intel compiler. Our large DLLs (which do most of the work) tend to be about twice as big with Intel, whereas a small utility EXE that calls the DLLs can be almost 6 times larger: it grows from 75,776 bytes to 436,736 bytes.
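When comparing builds with different compilers or options, it can help to script the size measurement instead of reading it from Explorer. This is a minimal sketch, assuming a C++17 compiler with `<filesystem>`. The helper names `binary_size`, `growth_factor` and `report` are hypothetical, not part of any tool mentioned in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <iostream>

// Size on disk, in bytes, of a built binary (DLL or EXE).
// std::filesystem::file_size throws filesystem_error if the file is missing.
std::uintmax_t binary_size(const std::filesystem::path& file) {
    return std::filesystem::file_size(file);
}

// How many times larger `after` is than `before` (e.g. Intel build vs. MSVC build).
double growth_factor(std::uintmax_t before, std::uintmax_t after) {
    assert(before > 0 && "baseline size must be non-zero");
    return static_cast<double>(after) / static_cast<double>(before);
}

// Print both sizes and the ratio for two builds of the same target.
void report(const std::filesystem::path& msvc_build,
            const std::filesystem::path& intel_build) {
    const std::uintmax_t a = binary_size(msvc_build);
    const std::uintmax_t b = binary_size(intel_build);
    std::cout << msvc_build.string() << ": " << a << " bytes\n"
              << intel_build.string() << ": " << b << " bytes\n"
              << "growth factor: " << growth_factor(a, b) << "\n";
}
```

With the figures quoted above, `growth_factor(75776, 436736)` is about 5.76, which matches "almost 6 times larger". `report` would be called with the paths of the two builds of the same EXE (placeholder paths, depending on your output directories).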
I have checked the “obvious” possible reasons: as far as I can see, both versions link dynamically, and there shouldn’t be any debug info in either file. I first thought that the Intel compiler might add debug symbols to the EXE anyway, but after having the linker generate a MAP file, it looks as if the CODE segment itself is much larger.
So, does anyone have a feeling for what might cause this difference in size? I have tried changing some of the options, but none of them seems to change the size of the output.
I’ve browsed through the forum (and the rest of the net as well) looking for solutions, but none of the possible causes mentioned there (certain options) seems to apply here.
Also, the out-of-the-box performance of the Intel build is about 2.5% slower than VS 2013, but this is before applying any Intel-specific optimizations. Our app does not do much number-crunching, though, so I’m not sure if we can benefit much from features such as vectorization…
The compiler parameters in VS look like this:
/GS /W3 /Gy /Zc:wchar_t /I"..\..\Include" /I"..\..\ Base\Include" /Zi /O2 /Ob1 /Fd".\Release/" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_VC80_UPGRADE=0x0600" /D "_MBCS" /GF /Zc:forScope /Gd /MD /Fa".\Release/" /EHa /nologo /Fo".\Release/" /Qprof-dir ".\Release\" /Fp".\Release/Dump.pch"
What is the purpose of /Zi if you don't want debug symbols? I don't know whether the old reasons still apply for why ICL had to put the symbols in the .obj (and CODE?) where msvc would generate a separate .pdb.
If you have auto-vectorization in one build but not the other, that is an "obvious" reason for differences. VS2013 is not aggressive about auto-vectorization in the absence of an /arch: setting. ICL options such as /Qvec- or /O1 will stop auto-vectorization.
It's true that some integer vectorization optimizations depend on AVX2. If your auto-vectorized loops are too short, the default vectorization could be counterproductive.
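If particular short loops are being vectorized unprofitably, the Intel compiler also allows opting out per loop rather than globally with /Qvec-. This is a minimal sketch, assuming ICL's `#pragma novector`, guarded by the `__INTEL_COMPILER` macro so other compilers simply skip it. The function `scale_small` is a hypothetical example of a loop whose callers only ever pass a few elements.

```cpp
#include <cassert>
#include <cstddef>

// Multiply each of the n elements of v by factor.
// Assumption: callers pass only a handful of elements, so SIMD setup and the
// extra remainder-loop code a vectorizer generates would likely cost more
// (in time and code size) than they save.
void scale_small(float* v, std::size_t n, float factor) {
#if defined(__INTEL_COMPILER)
#pragma novector  // Intel-specific: do not vectorize the following loop
#endif
    for (std::size_t i = 0; i < n; ++i) {
        v[i] *= factor;
    }
}
```

The rest of the translation unit keeps the default vectorization behaviour; only the marked loop is affected.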
I don't know how much impact it would have on code size, but msvc defaults to /fp:precise, while ICL defaults to /fp:fast, which is more aggressive than any option available in msvc. msvc /fp:fast resembles ICL /fp:source.
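One reason the floating-point model matters is that fast models permit value-unsafe rewrites such as reassociation, and floating-point addition is not associative. This is a minimal sketch with hypothetical function names. The values are chosen so that, compiled under a strict model, the two groupings give different IEEE double results.

```cpp
#include <cassert>

// Evaluates (a + b) + c in the order written; a precise model must keep it.
double sum_in_order(double a, double b, double c) {
    return (a + b) + c;
}

// The same terms regrouped as a + (b + c), the kind of rewrite a fast
// floating-point model is allowed to make on its own.
double sum_regrouped(double a, double b, double c) {
    return a + (b + c);
}
```

With `a = 1e20`, `b = -1e20`, `c = 1.0`, the first function returns 1.0 while the second returns 0.0, because `-1e20 + 1.0` rounds back to `-1e20`. A compiler using a fast model may evaluate either expression in the other grouping, so results like this can differ between an msvc build and an ICL build.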
I’ve removed the /Zi flag (set the ‘Debug Information Format’ option to ‘None’), but the size of the generated EXE does not change: it is still almost 6 times larger than the VC++-compiled EXE. I’ve also added the /OPT:REF option to the linker to have it remove unused code, but that doesn’t help either. I haven’t (yet) enabled any “advanced” vectorization options, but maybe some vectorization is done by default.
If the size increase comes from the Intel compiler adding debugging symbols even though I ask it not to, is there some way to remove them from the binary, like the ‘strip’ command on Linux/gcc?
One possible reason is that the Intel compiler by default does much more aggressive inlining than the Microsoft compiler.
You may want to experiment with the /Ob switches to confirm whether this is causing the difference, i.e. does using /Ob0 make the executables much smaller? Of course, inlining less will mean slower performance...
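Besides the global /Ob switches, inlining can also be suppressed for individual functions, which may keep rarely called helpers out of line while hot paths stay inlined. This is a minimal sketch, assuming `__declspec(noinline)` for MSVC-compatible compilers (ICL on Windows also defines `_MSC_VER`) and `__attribute__((noinline))` for GCC/Clang. `NOINLINE` and `format_usage` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Portable "do not inline this function" marker.
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// A cold helper (e.g. building a usage or error message) that gains nothing
// from having its body copied into every caller.
NOINLINE std::string format_usage(const std::string& program) {
    return "usage: " + program + " <input-file>";
}
```

This way each caller emits only a call instruction for the helper, so its body is not duplicated, while the compiler remains free to inline everything else as it does by default.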
Thanks for the hints. I’ve played with the options, and if I set the /Ob0 flag for my EXE and for the .lib projects it uses, the size of the EXE goes down from 426 kB to 104 kB. So that is good, but it is still 40% larger than what Microsoft C++ produces, and about 3% slower.
I'm not surprised by the larger EXE, but slower is not good. I'm not an optimization/tuning expert, but I think one of the first suggestions we give customers is to try the /fast flag.