Intel C++ compiler produces HUGE code

meldaproduction · 04-01-2015

Hi,

I'm spending way too much time comparing MSVC and the Intel C++ compiler. My current results are that MSVC sometimes generates better code and sometimes worse. When it is better, it is only by a little; when it is worse, the gap is larger. Since my code depends heavily on floating-point signal processing, I assume better vectorization and AVX dispatching could be the reason. So I'm keen on switching to the Intel compiler.

BUT MSVC output executables are almost 2x smaller than Intel executables!! To be specific, a big project of mine is 44 MB when built with Intel and 24 MB with MSVC. This makes it rather difficult to justify the rather small difference in performance. I also tried zipping them just to see the difference in entropy: MSVC compressed to 5 MB, Intel to 6 MB. That suggests there's a lot of redundant data in the executables, and I'm wondering whether I'm missing some compiler/linker option to remove it. Here are my command lines (just the options; some of them probably do nothing and were simply borrowed from MSVC):

Compiler:

icl.exe /GR- /bigobj /Ot /Ox /Ob2 /Oy /Oi /O3 /arch:SSE2 /Qvec-report /ansi-alias /Qftz /QaxAVX /Qrestrict /MT /GS- /TP /D_MBCS /Wp64 /c /W3 /EHsc /GF /Gd /Gm- /Zc:forScope /nologo

Linker:

icl.exe /link /INCREMENTAL:NO /RELEASE /MACHINE:X86 /SUBSYSTEM:WINDOWS,5.01 /DLL /nologo /MANIFEST:NO /OPT:REF /OPT:ICF

Thanks in advance!

jimdempseyatthecove · 04-01-2015

>>BUT MSVC output executables are almost 2x smaller than Intel executables!!

Is this the file size or the after load footprint size?

MSVC may be relying on code in DLLs, ICL may be inlining code. Example: MSVC may be calling a DLL to perform memcpy whereas ICL may be performing an optimized memory copy loop inline.

Note, you have /Oi set to enable inlining of intrinsic functions.

If size is more important than speed, use /O1 and not the other /O... options.

On the other hand, what does an extra 20 MB mean for a system with a HD of 1,000,000 MB or thereabouts?

Jim Dempsey

meldaproduction · 04-01-2015

It is the size of the executable (DLL, in my case). It is NOT dependent on any other DLLs, I made sure of that.

I understand the effects of inlining and so on, but I wonder whether there is something I forgot. After all, I'm using inlining with MSVC as well, and intrinsics too; basically the whole code is exactly the same. In fact I even use the MS-specific __forceinline keyword. So I wonder whether ICC left something behind in the executable - say, a list of symbols or the like.

jimdempseyatthecove · 04-01-2015

Produce a map file for both. Check the sizes of the usual segments (.text, .data, .bss).

What does that tell you?
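
Besides the linker's map file, one way to get at the per-section sizes is to read the section table straight out of the DLL. The following is a minimal sketch rather than a replacement for a proper tool: it assumes a well-formed PE file and reads only the name, virtual size, and raw size of each section. The names `SectionInfo`, `parsePeSections`, and `loadPeSections` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// One entry of the PE section table (only the fields needed to compare sizes).
struct SectionInfo {
    std::string name;           // e.g. ".text", ".rdata", ".data"
    std::uint32_t virtualSize;  // size once loaded into memory
    std::uint32_t rawSize;      // size occupied in the file on disk
};

// Read a little-endian integer of 1..4 bytes at `offset`, with bounds checking.
static std::uint32_t readLE(const std::vector<unsigned char>& buf,
                            std::size_t offset, std::size_t bytes) {
    assert(bytes <= 4);
    if (offset + bytes > buf.size()) throw std::runtime_error("truncated PE image");
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < bytes; ++i)
        v |= std::uint32_t(buf[offset + i]) << (8 * i);
    return v;
}

// Walk the headers of an in-memory PE image and return its section table.
std::vector<SectionInfo> parsePeSections(const std::vector<unsigned char>& image) {
    if (readLE(image, 0, 2) != 0x5A4D)            // "MZ"
        throw std::runtime_error("missing MZ header");
    const std::size_t pe = readLE(image, 0x3C, 4); // e_lfanew: offset of "PE\0\0"
    if (readLE(image, pe, 4) != 0x00004550)
        throw std::runtime_error("missing PE signature");

    const std::size_t coff = pe + 4;               // 20-byte COFF file header
    const std::size_t numSections = readLE(image, coff + 2, 2);
    const std::size_t optHeaderSize = readLE(image, coff + 16, 2);
    std::size_t entry = coff + 20 + optHeaderSize; // first 40-byte section header

    std::vector<SectionInfo> sections;
    for (std::size_t s = 0; s < numSections; ++s, entry += 40) {
        if (entry + 40 > image.size()) throw std::runtime_error("truncated section table");
        std::string name;                          // 8 bytes, NUL padded
        for (std::size_t j = 0; j < 8 && image[entry + j] != 0; ++j)
            name.push_back(static_cast<char>(image[entry + j]));
        sections.push_back({name, readLE(image, entry + 8, 4), readLE(image, entry + 16, 4)});
    }
    return sections;
}

// Convenience wrapper: load a file from disk and parse it.
std::vector<SectionInfo> loadPeSections(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::vector<unsigned char> buf((std::istreambuf_iterator<char>(in)),
                                   std::istreambuf_iterator<char>());
    return parsePeSections(buf);
}
```

Running `loadPeSections` on both the MSVC and the Intel build and comparing the entries side by side should show whether the extra 20 MB sits in code (.text) or in data.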

Jim Dempsey

meldaproduction · 04-01-2015

Ok thanks, how can I do that?

TimP · 04-01-2015

If you were serious about reducing code size, you wouldn't be setting dual code path options such as /QaxAVX. You simply can't compare the code size of such a build against one where you asked for a single path (the only option available for MSVC++). If you set /arch:AVX for one, you should set the same for both. If you request both SSE2 and AVX code paths, you must expect a doubling of size for all vectorized code and libraries.
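
To see why a second code path roughly doubles the vectorized code, it can help to picture what dispatching would look like if written by hand. The sketch below is only a conceptual illustration, not what the Intel compiler actually emits: the `Isa` enum, `sumBaseline`, `sumWide`, and `selectSum` are hypothetical names, and the "wide" version merely imitates an 8-lane loop in plain C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a runtime CPU feature check (a real build would query CPUID).
enum class Isa { Sse2, Avx };

// Baseline path: one element per iteration.
float sumBaseline(const float* x, std::size_t n) {
    assert(x != nullptr || n == 0);
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += x[i];
    return total;
}

// "Wide" path: eight partial sums per iteration, imitating an 8-lane vector loop.
float sumWide(const float* x, std::size_t n) {
    assert(x != nullptr || n == 0);
    float lanes[8] = {0.0f};
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        for (std::size_t k = 0; k < 8; ++k) lanes[k] += x[i + k];
    float total = 0.0f;
    for (std::size_t k = 0; k < 8; ++k) total += lanes[k];
    for (; i < n; ++i) total += x[i];  // scalar remainder
    return total;
}

using SumFn = float (*)(const float*, std::size_t);

// Dispatcher: both function bodies live in the binary; only one runs on a given CPU.
SumFn selectSum(Isa isa) {
    return isa == Isa::Avx ? &sumWide : &sumBaseline;
}
```

Every routine that gets this treatment is stored twice, plus the dispatch logic, which is the size cost being described here.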

Vectorized loops can account for much of the code expansion you mention. If you want the performance gains of vectorization (as well as in-lining) without increased .dll size, you must determine which parts of your code are benefiting from these optimizations and disable them elsewhere.
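
Restricting inlining to the code that benefits can be expressed per function. The following is a minimal sketch, assuming profiling has already identified the hot per-sample kernels and the rarely executed setup code. `HOT_INLINE`, `COLD_NOINLINE`, `applyGain`, `clampGain`, and `processBlock` are hypothetical names for this illustration; `__forceinline` and `__declspec(noinline)` are the MSVC-style keywords, and the `__attribute__` forms are the GCC/Clang equivalents.

```cpp
#include <cassert>
#include <cstddef>

#if defined(_MSC_VER)
  // MSVC-style keywords (assumed to be accepted by icl on Windows as well).
  #define HOT_INLINE    __forceinline
  #define COLD_NOINLINE __declspec(noinline)
#else
  #define HOT_INLINE    inline __attribute__((always_inline))
  #define COLD_NOINLINE __attribute__((noinline))
#endif

// Hot path: tiny per-sample kernel, worth inlining into the processing loop.
HOT_INLINE float applyGain(float sample, float gain) {
    return sample * gain;
}

// Cold path: parameter validation that runs once per block; one shared copy is enough.
COLD_NOINLINE float clampGain(float gain) {
    if (gain < 0.0f) return 0.0f;
    if (gain > 4.0f) return 4.0f;
    return gain;
}

// Scale a block of samples in place by a gain clamped to [0, 4].
void processBlock(float* buf, std::size_t n, float gain) {
    assert(buf != nullptr || n == 0);
    const float g = clampGain(gain);   // called out of line
    for (std::size_t i = 0; i < n; ++i)
        buf[i] = applyGain(buf[i], g); // expanded in place
}
```

Whether this recovers most of the speed at a smaller size depends on how much of the code is truly hot, so it would need measuring on the real project.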

If you are concerned about measuring the performance implications of larger code, you will need to collect events such as those related to the I-cache. It seems unusual that I-cache events could account for as much as 4% of run-time in signal processing or HPC applications, even when no effort is made to control it.

meldaproduction · 04-01-2015

Thanks for the info. Here's the problem - I did of course think about the dual paths - /QaxAVX increased the size of the executable by about 2 MB, which is about 5%, and that's indeed irrelevant. Anyway, I'll take your advice and won't be bothered by this. I just hope there really isn't a huge pile of completely redundant data in the executable.

KitturGanesh · 04-01-2015

Hi,
Jim and Tim have already answered your question nicely. The link below should give you more insight as well:

https://software.intel.com/en-us/articles/how-to-compile-for-intel-avx/

_Kittur

JenniferJ · 04-01-2015

/Qipo might help with dead-code elimination, but the compiler will also do more aggressive inlining.

Are you able to get the map file created? Use the "/map" option at the linker.

Is there a single file for which icl's output is much bigger than cl's?

Jennifer

KitturGanesh · 04-01-2015

Additionally, a related document on code size optimization can be read at:

https://software.intel.com/sites/default/files/managed/f4/1d/code-size-optimization-using-icc.pdf

_Kittur

meldaproduction · 04-02-2015

I haven't tried /map yet, but I tried limiting the inline size and got down to the size of the MSVC executables. But at that size it was almost 50% slower!! So, well, inlining is probably the cause and is needed...