11.0.075 Win32 produces SSE2 instructions with -QxSSE
I'm compiling my application with -QxSSE -GL, since I have users that have non-SSE2 capable machines. I just got a minidump from such a user, and the compiler has issued a 'movsd xmm, mem' instruction. The subroutine deals only with floats, but does have some SSE intrinsics.
As far as I can tell, the code which causes the problems is:
mem and den are __m128, while _mem and _den are float *. The compiler cleverly restructures each line into a single movsd (for _mem, _mem) followed by xorps (for the 0, 0) and movlhps (to merge the two). The problem is that movsd is an SSE2 instruction, which -QxSSE should have disabled. As far as I can see, this is the only SSE2 instruction used.
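To make the pattern concrete, here is a minimal sketch. It assumes the affected lines built a vector from two adjacent floats plus two zeros; the exact lane order is a guess, and both helper names are hypothetical. The first helper is the straightforward form that an optimizer is free to fuse into one 64-bit load. The second requests the 64-bit load explicitly through _mm_loadl_pi, which corresponds to the SSE1 instruction movlps. Whether 11.0.075 with -GL leaves the second form alone has not been verified.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE1 intrinsics only; emmintrin.h (SSE2) is not included */

/* Hypothetical helper: the plain form. Lanes, lowest first: p[0], p[1], 0, 0.
   The compiler may combine the two scalar loads into one 64-bit load. */
__m128 set2_zero_hi(const float *p)
{
    assert(p != NULL);
    return _mm_set_ps(0.0f, 0.0f, p[1], p[0]);
}

/* Hypothetical helper: same result, but the 64-bit load is spelled out.
   _mm_loadl_pi loads two floats into the low half and keeps the high half
   of its first argument, here all zeros. */
__m128 load2_zero_hi(const float *p)
{
    assert(p != NULL);
    return _mm_loadl_pi(_mm_setzero_ps(), (const __m64 *)p);
}
```

Both helpers return the same value, so swapping one for the other should not change results, only (possibly) the instructions the compiler selects.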
If I remove '-GL', the problem goes away, but so does some of the performance, and the users on non-SSE2 capable processors are the ones that need the optimizations the most.
Is there a workaround I can apply to tell the compiler that SSE is ok, but SSE2 really isn't, no matter how fancy it is?
Apologies if this is fixed in 11.1.048; I keep getting linker errors about symbol files with that release, so I've had to stay on 11.0 for now.
The same code compiles without any problems using 11.0.075, apart from the unwanted SSE2 code, that is.
I can't use 10.1 for this, as that gives me missing vtable symbols in declspec(dllimport)ed C++ classes.
Right now it looks like I'll have to split out my performance critical code into a DLL, without any external C++ classes, compile that with 10.1 -QxK, and compile the rest with 11.0 with -arch:ia32. That is more than a little bit messy though, and I'd really like to avoid it if possible. Compiling all the code with -arch:ia32 isn't an option, as I need the vectorized speedup of the performance critical parts to be able to run in realtime on the non-SSE2 processors.
This issue was reported before and was fixed. I verified the original testcase, and it is indeed fixed.
So this may be caused by a different scenario. Is it possible for you to send me more info or a testcase?
I haven't been able to create any minimal testcase for this; it happens when I link my application, but doesn't happen on smaller tests. I'll test some more and see if I can narrow it down a bit, and if so I'll post a followup here.
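One rough way to narrow this down is to scan each compiled object's code bytes for the movsd opcode. The sketch below assumes 32-bit code, where the SSE2 scalar move is encoded as F2 0F 10 (load) or F2 0F 11 (store); the function name is hypothetical. A plain byte scan can report false positives, since the same bytes may occur in data or in the middle of another instruction, so any hit still needs to be confirmed in a disassembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts positions where the byte sequence
   F2 0F 10 or F2 0F 11 occurs in buf. These are the opcode bytes of
   the SSE2 movsd load and store forms; a hit is only a candidate. */
size_t count_movsd_candidates(const unsigned char *buf, size_t len)
{
    size_t hits = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 2 < len; ++i) {
        if (buf[i] == 0xF2 && buf[i + 1] == 0x0F &&
            (buf[i + 2] == 0x10 || buf[i + 2] == 0x11)) {
            ++hits;
        }
    }
    return hits;
}
```

Running this over the code section of each object file, with and without -GL, would show which translation units pick up the candidate bytes.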