We have been using Intel C++ Compiler version 11 for a long time (on Mac OS) and have been very happy with its performance on our DSP code. We are now upgrading our build system, and version 11 is no longer supported on our new Mac.
Unfortunately, upgrading to the latest compiler version (version 13) has had quite a bad impact on performance. After some digging, it seems that part of this is due to changes in inlining. We have a lot of loops with a bunch of function calls in them, and those calls are no longer inlined, even when the functions are declared "static inline". It seems I can get around this with the "-ip" compiler flag, but performance is still not as good as before.
I used to compile with the following flags:
-Wno-multichar -Wno-trigraphs -Wall -x c++ -fmessage-length=0 -pipe -fpascal-strings -fasm-blocks -O3 -funroll-loops -funroll-all-loops -fp-model fast -fPIC
I can't use the "-ipo" flag, as I link with other code compiled with other compilers.
Has anything else changed that could affect performance?
I have been trying to reproduce this in a short piece of code, but I can't: it behaves very well when I take my inner loop out of that big mess of other code. Any ideas about what aspects of the code could trigger this kind of problem?
After some more experimenting with the complicated file that shows the problem, I can conclude that "static inline" doesn't work, but "__attribute__((always_inline))" does!
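For reference, here is a minimal sketch of that workaround, assuming a GCC-compatible compiler (icc on Mac OS accepts the GNU attribute syntax, as the experiment above shows). The FORCE_INLINE macro and the functions gain_sample, clip_sample and process_block are hypothetical names for illustration, not taken from the real code. "static inline" is only a hint the inliner may ignore, whereas always_inline asks the compiler to inline regardless of its size heuristics.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper macro: combines "static inline" with the GNU
// always_inline attribute so the call is inlined even if the
// inliner's heuristics would otherwise refuse.
#define FORCE_INLINE static inline __attribute__((always_inline))

// Hypothetical per-sample helpers of the kind called from a DSP loop.
FORCE_INLINE float gain_sample(float x, float gain) {
    return x * gain;
}

FORCE_INLINE float clip_sample(float x, float limit) {
    return x > limit ? limit : (x < -limit ? -limit : x);
}

// Hypothetical inner loop: with the attribute, both helper calls
// should be inlined even if this function grows large.
void process_block(const float* in, float* out, std::size_t n,
                   float gain, float limit) {
    assert(in != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = clip_sample(gain_sample(in[i], gain), limit);
    }
}
```

Wrapping the attribute in one macro keeps the change to a single line per helper, and it can be redefined to plain "static inline" for compilers that do not support the GNU syntax.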
However, the resulting binary is still significantly slower than the same file compiled using icc version 11, so I guess this only solves part of the problem.
Some more info:
If I remove enough random code from my big for loop, "static inline" eventually starts to work again! So it seems the compiler turns off inlining (and maybe other optimizations?) when the enclosing scope becomes too large or complex.
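One workaround worth trying is to move the hot inner loop into a small function of its own, which mirrors the observation that the loop behaves well once it is taken out of the surrounding code. This is a sketch under an unconfirmed assumption: that icc's inlining limits apply per calling function, so small helpers are inlined when their caller is small. All names here (mix, soft_limit, mix_kernel, process_everything) are hypothetical. The kernel is marked noinline so that it stays a separate, small function instead of being merged back into the large caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical small helpers, as found in a typical DSP inner loop.
static inline float mix(float a, float b, float w) {
    return a + w * (b - a);
}

static inline float soft_limit(float x) {
    // Simple rational soft clipper: x / (1 + |x|), illustrative only.
    return x / (1.0f + (x < 0.0f ? -x : x));
}

// Hypothetical extracted kernel: only the hot loop lives here, so the
// calls to mix() and soft_limit() sit in a small caller. noinline keeps
// this kernel out of the large function below.
__attribute__((noinline))
static void mix_kernel(const float* a, const float* b, float* out,
                       std::size_t n, float w) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = soft_limit(mix(a[i], b[i], w));
    }
}

// Stand-in for the original large function: the surrounding code stays
// here and simply calls the kernel.
void process_everything(const float* a, const float* b, float* out,
                        std::size_t n, float w) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    // ... lots of other setup code would remain here ...
    mix_kernel(a, b, out, n, w);
    // ... and more code here ...
}
```

The cost is one extra (non-inlined) call per block rather than per sample, which is usually negligible for DSP block sizes.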