I am currently evaluating whether it would be worthwhile for us to upgrade our license, and tested the performance of our application with the latest version of the Intel C++ compiler. To my surprise, I noticed a significant drop in performance for the most important part of our code. Using the 10.0.25 compiler the run time is 5.5 seconds for a particular test data set, versus 9.2 seconds using the 11.1.51 compiler! For this test I disabled the use of multiple CPUs, but our application is threaded by means of standard Windows threads. The same performance difference is present when using all 8 CPU cores of the machine.
The specific code uses basic arithmetic: multiplications and additions of array elements in 3 nested loops. So it performs a lot of simple operations, and the array elements should be cached efficiently. Loop unrolling is set to automatic, and I use the following flags for both compilers:
/c /O2 /Ot /EHsc /MD /GS /arch:SSE2 /fp:fast /Zc:wchar_t- /Fo"Release/" /W1 /nologo
Does anyone have an idea what this could be related to?
System specs of the test machine:
2 Xeon E5420 2.5 GHz
16 GB RAM
Windows XP 64
The application is 32 bit.
1 Solution
I had a somewhat similar issue. It turned out that the older compiler was generating faster code because of a bug in it. That is, it was making optimizations that are technically not safe (though they would actually be safe in every realistic situation). The new compiler, by doing the technically right thing, was producing inferior code.
In our case, adding a 'restrict' keyword to tell the compiler that the optimization was in fact safe solved the problem.
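To illustrate the kind of change meant here, a minimal sketch follows. It is not the original poster's code (which was never shared); the function name `weighted_sum` and its parameters are hypothetical. It uses the `__restrict` spelling, a vendor extension accepted by several C++ compilers; whether the plain `restrict` keyword is available in C++ depends on the compiler and its options, so check your compiler's documentation.

```cpp
#include <cstddef>

// Hypothetical kernel, invented for illustration only.
// __restrict promises the compiler that dst, src and w never overlap,
// so a store to dst cannot change a value later loaded from src or w.
// src must hold at least rows * cols + taps - 1 elements.
void weighted_sum(float* __restrict dst,
                  const float* __restrict src,
                  const float* __restrict w,
                  std::size_t rows, std::size_t cols, std::size_t taps)
{
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            float acc = 0.0f;
            // Innermost loop: multiply-add over neighbouring elements.
            for (std::size_t t = 0; t < taps; ++t)
                acc += src[r * cols + c + t] * w[t];
            dst[r * cols + c] = acc;
        }
    }
}
```

The promise is the caller's responsibility: passing overlapping buffers to a function declared this way is undefined behaviour, so the qualifier only belongs on pointers that really are distinct.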
5 Replies
Additionally, I just found that no vectorization and/or loop unrolling takes place. With the previous compiler version, the code was heavily vectorized and unrolled. If I add -Qvec_report3 I get a ton of vector dependency messages, but no actual vectorization.
In fact, I compared the new performance to Visual C++ 2008, and obtained about the same level of performance.
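As a rough illustration of why a compiler may report a vector dependency on an innocent-looking loop, consider the sketch below. The function `double_into` is hypothetical and not taken from the application discussed here. With plain pointers the compiler cannot rule out that the output overlaps the input, and when it does, each iteration really depends on the previous one.

```cpp
#include <cstddef>

// Hypothetical example: writes 2 * src[i] into dst[i].
// Nothing in the signature says dst and src are distinct buffers.
void double_into(float* dst, const float* src, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = 2.0f * src[i];
}
```

If this is called as `double_into(buf + 1, buf, 3)` on `buf = {1, 0, 0, 0}`, the store to `buf[1]` feeds the load in the next iteration and the buffer ends up as `{1, 2, 4, 8}`. Processing several elements at once without a check would give a different result, which is why a compiler that cannot prove the buffers are separate either refuses to vectorize or has to add an overlap test at run time.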
With correct source code (which obeys the rules about typed aliasing), you should use /Qansi-alias. ICL implements restrict, in case that is applicable. Does /Oa allow the compiler to ignore dependencies?
Not much can be done without a specific example.
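For readers unfamiliar with the term, "typed aliasing" refers to the language rule that an object should only be accessed through a pointer of a compatible type. The sketch below is a hypothetical example of the classic violation and a conforming alternative; `float_bits` is an invented name, and the code assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Violates the typed-aliasing rules (do not do this):
//     std::uint32_t u = *reinterpret_cast<std::uint32_t*>(&f);
// A compiler that assumes the rules are obeyed may reorder or drop
// such accesses.

// Conforming alternative: copy the bytes instead of casting the pointer.
std::uint32_t float_bits(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
```

Code that never reinterprets one type as another, as in the second form, is what the reply above calls "correct source code" in this respect.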
I don't cast any pointer to another type, if that's what you mean by typed aliasing (as a self-taught programmer I'm not always up to par with the terminology). I do use pointer arithmetic, since I found this to significantly increase performance. Maybe that's the cause of all the trouble. However, I have replaced it with array indexing several times in the past and never got the same performance, although this might seem hard to believe. So changing it is not an option.
Neither of the suggested flags improves performance. It probably all comes down to the loop unrolling that no longer takes place, since I know from the previous version that disabling it affects performance really badly.
I can't share code due to confidentiality. So I might need to contact support for this, I assume.
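Since the real code cannot be shown, here is a hypothetical pair of functions that contrasts the two styles mentioned in this post. The names `dot_indexed` and `dot_pointer` are invented; both compute the same sum of products, once with array indexing and once with pointer arithmetic.

```cpp
#include <cstddef>

// Array-indexing style: one loop counter, addresses derived from it.
float dot_indexed(const float* a, const float* b, std::size_t n)
{
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += a[i] * b[i];
    return acc;
}

// Pointer-arithmetic style: the pointers themselves are advanced.
float dot_pointer(const float* a, const float* b, std::size_t n)
{
    float acc = 0.0f;
    for (const float* end = a + n; a != end; ++a, ++b)
        acc += *a * *b;
    return acc;
}
```

The two forms are semantically equivalent. Which one a given compiler version optimizes better is something to measure rather than assume, and a vectorization report on a small stand-in like this can be shared with support without exposing confidential code.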
The irony of it all is that looking back at the code made it possible to reduce the time from 5.5 seconds to 4.0 seconds with 10.0.025, and from 9.2 seconds to 5.3 seconds with 11.1.51. So overall an increase in performance of about 30% for a piece of code which I had given up on optimizing! The decreased performance with the new compiler version remains, even though vectorization does take place now thanks to the restrict keyword and some rewriting of the code.
Thanks for the tips about the aliasing issue and the restrict keyword. I'll clean the code further and take this back to Intel support.
