When upgrading from version 11 to 13 I notice a big performance loss (3x).
Versions in use:
- Composer XE 2011 update 11 build 344
- Composer XE 2013 build 089 (we tried a new one, but same result)
For the test, both are used from Visual Studio 2008.
I could create a relatively small test program, which I could provide to an Intel engineer, but I cannot post it to the public forum.
The code is mainly hand-written SSE code using F32vec4 and intrinsics, plus a little bit of log.
The runtimes are worse with /Qipo, also on 2011.
- compiler 1210, /Qip: ~0.42 sec
- compiler 1210, /Qipo: ~1.30 sec
- compiler 1300, /Qip: ~1.20 sec
- compiler 1300, /Qipo: ~1.70 sec
Any suggestions, ideas or magic compiler options that could help?
One (or maybe the only) reason for the difference seems to be inlining: the code contains (for "historical reasons") quite a lot of __forceinline declarations. At least for this one example, removing one of them is enough to get equivalent speed from the two compilers.
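To make this kind of with/without __forceinline comparison repeatable, the attribute can be put behind a macro and toggled from the command line. The following is only a minimal sketch: the names HOT_INLINE, USE_FORCEINLINE, scale_add and sum_scaled are made up for illustration and are not from the actual code, and plain float is used instead of F32vec4 to keep it short.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical switch: build with USE_FORCEINLINE=0 to leave the
// inlining decision to the compiler's own heuristics.
#ifndef USE_FORCEINLINE
#define USE_FORCEINLINE 1
#endif

#if USE_FORCEINLINE
#  if defined(_MSC_VER)            // MSVC, and ICL on Windows
#    define HOT_INLINE __forceinline
#  elif defined(__GNUC__)          // GCC, Clang
#    define HOT_INLINE inline __attribute__((always_inline))
#  else
#    define HOT_INLINE inline
#  endif
#else
#  define HOT_INLINE inline
#endif

// Small helper of the kind that is typically marked for forced inlining.
HOT_INLINE float scale_add(float x, float a, float b) {
    return a * x + b;
}

// Hot loop calling the helper once per element.
float sum_scaled(const std::vector<float>& v, float a, float b) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i) {
        acc += scale_add(v[i], a, b);
    }
    return acc;
}
```

Building the same source twice, once with USE_FORCEINLINE set to 1 and once set to 0, gives two binaries that differ only in the inlining request, which makes the timings easier to compare.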
Some more Information:
This loss of performance is affecting several of the algorithms in use. For two of them I have run some experiments.
Intel 11, with lots of forceinlines : 630, 137 (throughput of the two algorithms)
Intel 13, with lots of forceinlines : 500, 30
Intel 11, without forceinlines: 650, 98
Intel 13, without forceinlines: 650, 77
Intel 13, without forceinlines, interprocedural optimization enabled: 620, 95
So I still cannot reach the original speed, but at least the loss of speed is no longer so dramatic.
As far as I remember, a long time ago there were some options to control the heuristics for inlining. Are any of those options still available? Or any other hints?
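For reference, the Intel compiler documentation describes command-line options such as /Qinline-factor, /Qinline-max-size, /Qinline-max-total-size and /Qinline-forceinline, as well as per-call pragmas (#pragma forceinline, #pragma inline, #pragma noinline) that apply to the statement that follows. Whether any of them helps in this case is untested. Below is a minimal sketch of the pragma form; square and sum_squares are hypothetical stand-ins, and the pragma is guarded so that other compilers simply skip it.

```cpp
#include <cstddef>

// Hypothetical small helper standing in for a function called in a hot loop.
static float square(float x) {
    return x * x;
}

float sum_squares(const float* data, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        // Intel-specific: request inlining of the calls in the next statement.
        // Guarded so that other compilers do not see an unknown pragma.
#if defined(__INTEL_COMPILER)
#pragma forceinline
#endif
        acc += square(data[i]);
    }
    return acc;
}
```

Compared with marking the function itself, the pragma only affects the one call site, so the rest of the program keeps whatever the heuristics decide.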
It seems inlining is broken under certain circumstances with ICL 13.0. I experienced the same thing while testing ICL 13.0 for our uses: massive performance drops compared to MSVC 2012, because small template functions in hot loops were not being inlined.