piet_de_weer
Beginner

Performance degradation upgrading from 10.1 to 13

Hello,

I've been using the Intel C++ compiler for years and I've always been really happy with the performance. But I've been stuck at version 10.1 for quite a while now. When version 11.1 came out I tried to upgrade, and found that the performance of my software dropped quite a lot - the throughput in the same amount of time was 75% of what it was in 10.1.

So at the time I decided to just stay with version 10.1. Yesterday I decided to give the latest version, C++ Composer 13, a try: I downloaded a trial version of Visual Studio 2010, installed Intel C++ Composer 13, converted my project and built it. After fixing a small linking issue it builds fine, and a lot faster than with version 10.1. The resulting binary is also a lot smaller. But... I'm still seeing the same performance drop as with version 11.1!

I walked through the optimization settings, enabled SSE2 generation (same as before), compiled again - no change.

So now I'm wondering: am I missing something obvious? I don't want to stay at version 10.1 for the rest of my life. I want to be able to take advantage of improvements; it would be so nice to just upgrade to a newer compiler and suddenly see my code run faster instead of slower.

7 Replies
TimP
Black Belt

You may need to profile your application to determine where the performance loss occurs. With more data, we may be able to assist. If you depend on vectorization or in-lining, comparison of opt-report-file outputs may help.
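A real profiler is the better tool for this, but as a minimal sketch (assuming a C++11 standard library with `<chrono>`), a tiny harness like the one below can time one suspect routine in both the 10.1 and the 13 build so the numbers are directly comparable. `best_time_ms` is a hypothetical helper written for illustration, not part of the Intel compiler or IPP.

```cpp
#include <chrono>
#include <limits>

// Hypothetical helper: runs `fn` `reps` times and returns the fastest run
// in milliseconds. Taking the minimum filters out noise from other
// processes and cold caches. Returns infinity if reps <= 0.
template <typename Fn>
double best_time_ms(Fn&& fn, int reps) {
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < reps; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < best) best = ms;
    }
    return best;
}
```

Usage would look like `best_time_ms([&] { process_block(buf, n); }, 20)`, where `process_block` is a placeholder for one of your own routines. Compiling the same file with both compiler versions shows which routines account for the slowdown.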
piet_de_weer
Beginner

TimP (Intel) wrote:

You may need to profile your application to determine where the performance loss occurs. With more data, we may be able to assist.
If you depend on vectorization or in-lining, comparison of opt-report-file outputs may help.

Since there is no specific hot spot, the slowdown must be spread all over the code (either that, or it is very heavy at one spot). The program spends at least half of its time inside IPP, which the compiler upgrade doesn't touch, so the actual degradation in my own code must be much larger than what I'm seeing overall. A 10.1 build with SSE instead of SSE2 is hardly slower than the SSE2 build, but I'll also try a 10.1 build without any SSE at all. If I remember correctly (it's been a long time since I checked), it will still be faster than the version 13 build; if not, I'll check the vectorization logs. I do see a lot of 'incompatible loop' warnings in v13, so that could indeed have some impact.
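One common reason a loop is not vectorized (not necessarily the cause of the warnings here) is possible pointer aliasing: if the compiler cannot prove that the output array does not overlap the inputs, it may skip vectorization or add a runtime overlap check. The sketch below shows the pattern; `add_arrays` and `add_arrays_restrict` are hypothetical names, and `__restrict` is a non-standard extension that MSVC, GCC, Clang and the Intel compiler accept.

```cpp
#include <cstddef>

// Without extra information the compiler must assume `dst` may overlap
// `a` or `b`, which can block or complicate vectorization.
void add_arrays(float* dst, const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

// `__restrict` promises the pointers do not alias, removing that obstacle.
// Calling it with overlapping arrays would be undefined behavior.
void add_arrays_restrict(float* __restrict dst, const float* __restrict a,
                         const float* __restrict b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Comparing the vectorization report for both versions of such a loop shows whether aliasing is what the compiler is complaining about.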
piet_de_weer
Beginner

Just as some positive feedback: I just compiled the same code with the Microsoft compiler in Visual Studio 2010 and it runs at about 40% of the speed of v10.1.
piet_de_weer
Beginner

Performance of version 10.1 with the target CPU set to None (so not even SSE support) is roughly identical to that of v13 with SSE2 and vectorization enabled. But in v13 I still see loops being vectorized, and having access to the (usually faster) SSE/SSE2 register arithmetic should also help, so there must be more going on than just missed vectorization opportunities. I'm going to release a new version of my software in a few days, so I'll stay with 10.1 for now and will probably look into it later.
SergeyKostrov
Valued Contributor II

>>...Performance of version 10.1 with target CPU set to None (so not even SSE support) is roughly identical to that of v13 with SSE2 enabled and vectorization etc...

This looks like a warning sign for the C++ compiler team. My question: is there some unnecessary overhead in the newer version(s) of the C++ compiler?
jimdempseyatthecove
Black Belt

Is your code using mixed-precision expressions (a mixture of REAL(4), REAL(8), integer, ...)? If so, you might want to correct those sections of code. It is somewhat difficult to locate these expressions because the compiler does not have a /warn:promotion diagnostic.

Jim Dempsey
piet_de_weer
Beginner

Not really, nearly everything in the code is either int or float - and I'm not converting them very often, and definitely not inside loops that are important for performance.
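One easy-to-miss case in C++ code that is "only int and float": an unsuffixed floating literal such as `0.1` has type double, so `x * 0.1` with a float `x` is computed in double precision and converted back. A minimal sketch of the pattern follows; the function names are hypothetical. If another compiler is at hand, GCC and Clang have a `-Wdouble-promotion` warning that flags such expressions.

```cpp
#include <cstddef>
#include <type_traits>

// An unsuffixed literal like 0.1 is a double, so the product is a double.
static_assert(std::is_same<decltype(1.0f * 0.1), double>::value,
              "float * 0.1 is a double expression");
// With the f suffix the expression stays float.
static_assert(std::is_same<decltype(1.0f * 0.1f), float>::value,
              "float * 0.1f is a float expression");

// Hidden promotion: each element is converted float -> double, multiplied
// in double, then implicitly converted back to float on assignment.
void scale_mixed(float* x, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        x[i] = x[i] * 0.1;
}

// Same loop kept entirely in single precision, which typically lets the
// compiler pack four floats per SSE register instead of two doubles.
void scale_float(float* x, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        x[i] = x[i] * 0.1f;
}
```

The two versions can also differ in the last bit of the result, since `0.1` and `0.1f` are different values; that is worth checking if output is compared between builds.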