I switched over from Visual Studio 2013 with Intel C++ 15 to Visual Studio 2015 with Intel C++ 17 (rev 4). Since the old compiler is still installed on my system, I can also build with Intel C++ 17 while setting the "Base Platform Toolset" to v120 in Visual Studio 2015.
What I'm seeing is that Visual Studio 2015 + Intel C++ 17 + v140_xp gives me almost exactly the same performance as Visual Studio 2013 + Intel C++ 15 + v120_xp. Oddly, however, if I replace v140_xp with v120_xp, the performance improves by 2-3%. I've repeated this test multiple times, and each time the improvement is between 1.8% and 3.7%.
I can imagine that certain Windows calls could have gotten more or less expensive, but the code that's affected by this is calculation-heavy code. Now, I could just compile with v120_xp, but that has several drawbacks. I upgraded to the new Visual Studio because of a bug fix in the std::chrono library, which of course isn't available in v120. And some of my projects have link errors when using v120_xp (which I can fix by linking to some v140 libraries, but that feels extremely risky).
2-3% isn't much, but it's a shame to lose performance for no good reason. So I was hoping that someone here could give some insight into how the Base Platform Toolset can affect the performance of pure calculations.
(Note: this is a 7 MB project, so I really can't post an example file.)
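For reference, the percentages come from repeatedly timing the same workload, along these lines (a minimal sketch; RunWorkload is a placeholder for the project's actual calculation code, not something from the project itself):
#include <chrono>
#include <cmath>
#include <cstdio>
// Placeholder workload standing in for the calculation-heavy code.
static double RunWorkload()
{
    double sum = 0.0;
    for (int i = 1; i < 1000000; ++i)
        sum += std::fabs(std::sin(i * 0.001));
    return sum;
}
int main()
{
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    volatile double sink = 0.0;               // keep the result alive so nothing is optimized away
    for (int run = 0; run < 10; ++run)        // repeat to average out run-to-run noise
        sink = RunWorkload();
    const double ms = std::chrono::duration<double, std::milli>(Clock::now() - start).count();
    std::printf("elapsed: %.1f ms (checksum %.3f)\n", ms, (double)sink);
    return 0;
}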
Interestingly, I found out that some of my algorithms are affected far more than others. I found one that's 10% slower with v140 than with v120, and it also appears to be about 10% slower than with the old compiler (with v120, both compilers perform the same). A lot of algorithms perform exactly the same on both. Now, oddly, there is nothing in this algorithm that should be using anything in the runtime.
Found it. Something changed in the definition of fabs() between v120 and v140: the Intel compiler isn't generating a single AND anymore, but a convert to double, then an AND, and then a convert back to float.
Temporary solution, where you have to use habs instead of fabs (and it will give you a compile error if you use fabs anyway):
#if __INTEL_COMPILER == 1700
__forceinline float habs(const float f) { return (f >= 0) ? f : -f; }
__forceinline double habs(const double f) { return (f >= 0) ? f : -f; }
#define fabs error
#else
#define habs fabs
#endif
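A hypothetical call site, assuming the habs definition above is in scope (this is illustrative, not code from the actual project):
float SumOfMagnitudes(const float* data, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += habs(data[i]);   // stays a single sign-bit AND, no double round-trip
    return sum;
}
The "#define fabs error" in the #if branch turns any leftover fabs call into a compile error, so no call site can be missed during the conversion.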
Hm, I just tried fabsf instead of my habs, and it generates a function call everywhere I use it:
fstp DWORD PTR [92+esp] ;786.14[spill]
; fabsf(float)
call _fabsf ;786.14
while my habs() generates this:
vandps xmm0, xmm0, XMMWORD PTR [_2il0floatpacket.50] ;786.14
(The ps here might seem to indicate that it's also vectorizing, but that's not true.)
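For anyone curious what that single AND does: it clears the IEEE 754 sign bit, which gives the absolute value directly. A portable sketch of the same trick (my illustration, not the compiler's actual code):
#include <cstdint>
#include <cstring>
inline float fabs_bitmask(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // safe type-pun via memcpy
    bits &= 0x7FFFFFFFu;                   // clear the sign bit
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
The XMMWORD constant _2il0floatpacket.50 that vandps loads is presumably just this 0x7FFFFFFF mask replicated across the register.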
Are there more such changes? I seem to have one thing left that's making my new version slower. I roughly know where to look for it, but any hint would help.
rand() got a lot slower, and I use it for dithering. I'll replace it with a lookup table.
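Something along these lines (a minimal sketch; the table size, the global state, and the power-of-two wrap are my own arbitrary choices, and the static index isn't thread-safe):
#include <cstdlib>   // std::rand
static const int kNoiseTableSize = 4096;   // power of two so we can wrap with a mask
static int g_noiseTable[kNoiseTableSize];
// Fill the table once at startup; the rand() cost then disappears from the hot loop.
void InitNoiseTable()
{
    for (int i = 0; i < kNoiseTableSize; ++i)
        g_noiseTable[i] = std::rand();
}
// Cheap replacement for rand() inside the dithering loop.
inline int NextNoise()
{
    static unsigned s_index = 0;
    return g_noiseTable[s_index++ & (kNoiseTableSize - 1)];
}
One caveat: the noise sequence repeats every 4096 samples, which may or may not matter depending on what the dithering feeds.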
