I switched over from Visual Studio 2013 with Intel C++ 15 to Visual Studio 2015 with Intel C++ 17 (rev 4). Since the old compiler is still installed on my system, I can also build with Intel C++ 17 while setting the "Base Platform Toolset" to v120 in Visual Studio 2015.
What I'm seeing is that Visual Studio 2015 + Intel C++ 17 + v140_xp gives me almost exactly the same performance as Visual Studio 2013 + Intel C++ 15 + v120_xp. Oddly, however, if I replace v140_xp with v120_xp, the performance improves by 2-3%. I've repeated this test multiple times, and each time the improvement is between 1.8% and 3.7%.
I can imagine that certain Windows calls could have gotten more or less expensive, but the code that's affected by this is calculation-heavy code. Now, I could just compile with v120_xp, but that has several drawbacks. I upgraded to the new Visual Studio because of a bug fix in the std::chrono library, which of course isn't available in v120. And some of my projects have link errors when using v120_xp (which I can fix by linking to some v140 libraries, but that feels extremely risky).
2-3% isn't much, but it's a shame to lose performance for no good reason. So I was hoping that someone here could give some insight into how the Base Platform Toolset can affect the performance of pure calculations.
(Note: this is a 7 MB project, so I really can't post an example file.)
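For reference, the percentages come from repeatedly timing the same workload, along these lines (a minimal sketch; RunWorkload is a placeholder for the project's actual calculation code, not something from the project itself):
#include <chrono>
#include <cmath>
#include <cstdio>
// Placeholder workload standing in for the calculation-heavy code.
static double RunWorkload()
{
    double sum = 0.0;
    for (int i = 1; i < 1000000; ++i)
        sum += std::fabs(std::sin(i * 0.001));
    return sum;
}
int main()
{
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    volatile double sink = 0.0;               // keep the result alive so nothing is optimized away
    for (int run = 0; run < 10; ++run)        // repeat to average out run-to-run noise
        sink = RunWorkload();
    const double ms = std::chrono::duration<double, std::milli>(Clock::now() - start).count();
    std::printf("elapsed: %.1f ms (checksum %.3f)\n", ms, (double)sink);
    return 0;
}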
Interestingly, I found out that some of my algorithms are affected far more than others. I found one that's 10% slower with v140 than with v120, and it also appears to be about 10% slower than with the old compiler (with v120, both compilers perform the same). A lot of algorithms perform exactly the same on both. Now, oddly, there is nothing in this algorithm that should be using anything in the runtime.
Found it. Something changed in the definition of fabs() between v120 and v140: the Intel compiler isn't generating a single AND anymore, but a convert to double, then an AND, and then a convert back to float.
Temporary solution, where you have to use habs instead of fabs (and it will give you a compile error if you use fabs anyway):
#if __INTEL_COMPILER == 1700
__forceinline float habs(const float f) { return (f >= 0) ? f : -f; }
__forceinline double habs(const double f) { return (f >= 0) ? f : -f; }
#define fabs error
#else
#define habs fabs
#endif
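A hypothetical call site, assuming the habs definition above is in scope (this is illustrative, not code from the actual project):
float SumOfMagnitudes(const float* data, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += habs(data[i]);   // stays a single sign-bit AND, no double round-trip
    return sum;
}
The "#define fabs error" in the #if branch turns any leftover fabs call into a compile error, so no call site can be missed during the conversion.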
Hm, I just tried fabsf instead of my habs, and it generates a function call everywhere I use it:
fstp DWORD PTR [92+esp] ;786.14[spill]
; fabsf(float)
call _fabsf ;786.14
while my habs() generates this:
vandps xmm0, xmm0, XMMWORD PTR [_2il0floatpacket.50] ;786.14
(The ps here might seem to indicate that it's also vectorizing, but that's not true.)
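For anyone curious what that single AND does: it clears the IEEE 754 sign bit, which gives the absolute value directly. A portable sketch of the same trick (my illustration, not the compiler's actual code):
#include <cstdint>
#include <cstring>
inline float fabs_bitmask(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // safe type-pun via memcpy
    bits &= 0x7FFFFFFFu;                   // clear the sign bit
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
The XMMWORD constant _2il0floatpacket.50 that vandps loads is presumably just this 0x7FFFFFFF mask replicated across the register.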
Are there more such changes? I seem to have one thing left that's making my new version slower. I roughly know where to look for it, but any hint would help.
rand() got a lot slower, and I use it for dithering. I'll replace it with a lookup table.
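Something along these lines (a minimal sketch; the table size, the global state, and the power-of-two wrap are my own arbitrary choices, and the static index isn't thread-safe):
#include <cstdlib>   // std::rand
static const int kNoiseTableSize = 4096;   // power of two so we can wrap with a mask
static int g_noiseTable[kNoiseTableSize];
// Fill the table once at startup; the rand() cost then disappears from the hot loop.
void InitNoiseTable()
{
    for (int i = 0; i < kNoiseTableSize; ++i)
        g_noiseTable[i] = std::rand();
}
// Cheap replacement for rand() inside the dithering loop.
inline int NextNoise()
{
    static unsigned s_index = 0;
    return g_noiseTable[s_index++ & (kNoiseTableSize - 1)];
}
One caveat: the noise sequence repeats every 4096 samples, which may or may not matter depending on what the dithering feeds.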
