Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Performance drop 11.1.51 vs 10.0.025

jelle
Beginner
I am currently evaluating whether it is worth upgrading our license, and I tested the performance of our application with the latest version of the Intel C++ compiler. To my surprise, I see a significant drop in performance in the most important part of our code. With the 10.0.025 compiler the run time is 5.5 seconds for a particular test data set, versus 9.2 seconds with the 11.1.51 compiler! For this test I disabled the use of multiple CPUs; our application is threaded by means of standard Windows threads. The same performance difference is present when using all 8 CPU cores of the machine.

The code in question uses basic arithmetic (multiplications and additions of array elements) in 3 nested loops. So it is a lot of simple operations, and the array elements should be cached efficiently. Loop unrolling is set to automatic, and I use the following flags for both compilers:

/c /O2 /Ot /EHsc /MD /GS /arch:SSE2 /fp:fast /Zc:wchar_t- /Fo"Release/" /W1 /nologo

Does anyone have an idea what this could be related to?

System specs of the test machine:

2 x Xeon E5420, 2.5 GHz
16 GB RAM
Windows XP 64
The application is 32 bit.

1 Solution
joelkatz
Novice
I had a somewhat similar issue. It turned out that the older compiler was producing faster code because of a bug in it. That is, it was making optimizations that are technically not safe (though they would actually be safe in every realistic situation). The new compiler, by doing the technically right thing, was producing inferior code.

In our case, adding a 'restrict' keyword to tell the compiler that the optimization was in fact safe solved the problem.
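
To illustrate the idea, here is a minimal sketch of a three-level loop nest with and without the qualifier. The function names and the matrix-multiply shape are hypothetical and are not the code discussed in this thread. The sketch uses `__restrict`, the extension spelling that the Intel, Microsoft, GCC and Clang C++ compilers accept; as far as I know, the C99 spelling `restrict` is only accepted by ICL in C++ code when /Qrestrict is given.

```cpp
#include <cstddef>

// Without the qualifier the compiler must assume that c may overlap a or b,
// so every store to c could change a value it still has to read. That
// assumed dependency can block vectorization of the inner loop.
void matmul_plain(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            c[i * n + j] = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
        }
}

// __restrict is a promise by the programmer that the three arrays never
// overlap. The compiler is then free to keep values in registers and to
// reorder or vectorize the loads and stores.
void matmul_restrict(const float* __restrict a, const float* __restrict b,
                     float* __restrict c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            c[i * n + j] = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
        }
}
```

Both functions return the same result as long as the arrays really are distinct. Calling the second one with overlapping arrays breaks the promise and gives undefined behaviour, so the qualifier only belongs where the caller can guarantee no overlap.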

5 Replies
jelle
Beginner
Additionally, I just found that no vectorization or loop unrolling takes place. With the previous compiler version, the code was heavily vectorized and unrolled. If I add -Qvec_report3 I get a ton of vector dependency messages, but no actual vectorization.

In fact, I compared the new performance to Visual C++ 2008 and obtained about the same level of performance.
TimP
Honored Contributor III
With correct source code (which obeys the rules about typed aliasing), you should use /Qansi-alias. ICL implements restrict, in case that is applicable. Does /Oa allow the compiler to ignore dependencies?
Not much can be done without a specific example.
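
For readers unfamiliar with the term: the typed aliasing rules say that an object may only be accessed through an lvalue of its own type or of a character type, with a few other exceptions. Below is a minimal sketch, with hypothetical function names, of what the rule lets a compiler assume and of a conforming way to reinterpret bytes. It assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cstdint>
#include <cstring>

// A store through float* cannot legally modify an int object, so under the
// typed aliasing rules the compiler may load *count once before the loop
// instead of reloading it after every store to data[i].
void scale(float* data, const int* count, float factor) {
    for (int i = 0; i < *count; ++i)
        data[i] *= factor;
}

// Reading a float through a pointer to an unrelated type, for example
//     *reinterpret_cast<const std::uint32_t*>(&f)
// violates the rules and has undefined behaviour. Copying the bytes with
// memcpy is the conforming alternative.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Code that never casts a pointer to an unrelated type obeys these rules already. Note that the rules say nothing about two pointers of the same type, which is the case `restrict` addresses.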
jelle
Beginner
I don't cast any pointer to another type, if that's what you mean by typed aliasing (as a self-taught programmer I'm not always up to par with the terminology). I do use pointer arithmetic, since I found it to increase performance significantly. Maybe that's the cause of all the trouble. However, I have replaced it with array indexing several times in the past and never got the same performance, although this might seem hard to believe. So changing it is not an option.

Neither of the suggested flags improves performance. It probably all comes down to the loop unrolling not taking place, since I know from the previous version that disabling it hurts performance badly.

I can't share code due to confidentiality. So I might need to contact support for this, I assume.
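
For reference, the two addressing styles being compared look roughly like this. It is a hypothetical weighted-sum kernel, not the confidential code. Both functions compute the same result and differ only in how the elements are addressed.

```cpp
#include <cstddef>

// Array indexing: one loop counter, elements addressed as base + i.
float weighted_sum_indexed(const float* x, const float* w, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += x[i] * w[i];
    return sum;
}

// Pointer arithmetic: both pointers are advanced until x reaches the end.
float weighted_sum_pointer(const float* x, const float* w, std::size_t n) {
    float sum = 0.0f;
    for (const float* end = x + n; x != end; ++x, ++w)
        sum += *x * *w;
    return sum;
}
```

Which form a given compiler version optimizes better can vary, so it may be worth checking the vectorization report for both variants rather than assuming either one is faster.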
jelle
Beginner
The irony of it all is that revisiting the code made it possible to reduce the time from 5.5 seconds to 4.0 seconds with 10.0.025, and from 9.2 seconds to 5.3 seconds with 11.1.51. So overall an increase in performance of about 30% for a piece of code which I had given up on optimizing! The decreased performance with the new compiler version remains, even though vectorization does take place now thanks to the restrict keyword and some rewriting of the code.

Thanks for the tips about the aliasing issue and the restrict keyword. I'll clean the code further and take this back to Intel support.
