Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Worse performance with version 13 - worsens with /Qipo


After upgrading from version 11 to 13, I noticed a big performance loss (3x).

Versions in use:
- Composer XE 2011 update 11 build 344
- Composer XE 2013 build 089 (we also tried a newer one, with the same result)
For the test, both are used from Visual Studio 2008.

I could create a relatively small test program, which I could provide to an Intel engineer, but I cannot post it to the public forum.
The code is mainly hand-written SSE code using F32vec4 and intrinsics, plus a little bit of log.

The runtimes are worse with /Qipo, even on 2011:
- compiler 1210, /Qip: ~0.42 sec
- compiler 1210, /Qipo: ~1.30 sec
- compiler 1300, /Qip: ~1.20 sec
- compiler 1300, /Qipo: ~1.70 sec

Any suggestions, ideas or magic compiler options that could help?



One (or maybe the only) reason for the difference seems to be inlining: the code contains (for "historical reasons") quite a lot of __forceinline declarations. At least for this one example, removing one of them is enough to get equivalent speed from the two compilers.
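To make this kind of experiment easier to repeat, the forced inlining can be routed through one macro so that it can be switched off for a whole build with a single define. This is only a minimal sketch: the macro name KERNEL_INLINE, the switch NO_FORCEINLINE and the two functions are made up for illustration and are not taken from the actual code.

```cpp
// Hypothetical names: KERNEL_INLINE and NO_FORCEINLINE are illustrative only.
// Building with NO_FORCEINLINE defined leaves the decision to the
// compiler's own inlining heuristics.
#if defined(NO_FORCEINLINE)
  #define KERNEL_INLINE inline
#elif defined(_MSC_VER)
  // Also taken by the Intel compiler on Windows, which defines _MSC_VER.
  #define KERNEL_INLINE __forceinline
#elif defined(__GNUC__)
  #define KERNEL_INLINE inline __attribute__((always_inline))
#else
  #define KERNEL_INLINE inline
#endif

// Small helper of the kind that tends to be marked for forced inlining.
KERNEL_INLINE float mul_add(float a, float b, float c)
{
    return a * b + c;
}

// Hot loop calling the helper; returns the dot product of x and y.
float accumulate(const float* x, const float* y, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc = mul_add(x[i], y[i], acc);
    return acc;
}
```

Compiling the same sources once with and once without NO_FORCEINLINE then gives the two variants to time against each other.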

Some more information:
This loss of performance is affecting several of the algorithms in use. For two of them I have run some experiments.

Intel 11, with lots of forceinlines: 630, 137 (throughput of the two algorithms)
Intel 13, with lots of forceinlines: 500, 30
Intel 11, without forceinlines: 650, 98
Intel 13, without forceinlines: 650, 77
Intel 13, without forceinlines, interprocedural optimization enabled: 620, 95

So I still cannot reach the original speed, but at least the loss of speed is no longer so dramatic.

As far as I remember, long ago there were some options to control the heuristics for inlining. Are any of those options still available? Or any other hints?



It seems inlining is broken under certain circumstances with ICL 13.0. I experienced the same thing while testing ICL 13.0 for our uses: massive performance drops compared to MSVC 2012, because small template functions in hot loops were not being inlined.
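One thing that may be worth trying for the hot-loop case is the Intel-specific statement-level pragma, which as far as I know asks the compiler to inline the calls in the statement that follows it. The sketch below uses made-up function names and guards the pragma so other compilers do not see it. I have not verified that it works around the 13.0 behaviour. On the options side, the Intel documentation lists switches such as /Qinline-factor, /Qinline-max-size and /Qinline-forceinline for tuning the inlining heuristics; please check the exact spelling and defaults for your compiler version.

```cpp
// Hypothetical example functions; only the pragma is the point here.
// Small template helper of the kind that was not being inlined.
template <typename T>
inline T scale_and_offset(T v)
{
    return T(2) * v + T(1);
}

// Hot loop: sums 2*v + 1 over all n elements of data.
int sum_transformed(const int* data, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i) {
#ifdef __INTEL_COMPILER
        // Intel-specific: request inlining of the calls in the next statement.
#pragma forceinline
#endif
        total += scale_and_offset(data[i]);
    }
    return total;
}
```

The guard keeps the source portable, so the same file can still be built with MSVC for a side-by-side comparison.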