Hi,
I'm compiling my application with -QxSSE -GL, since some of my users have machines without SSE2 support. I just got a minidump from such a user, and the compiler has emitted a 'movsd xmm, mem' instruction. The subroutine deals only with floats, but does have some SSE intrinsics.
As far as I can tell, the code which causes the problem is:
mem[2] = _mm_setr_ps(_mem[8], _mem[9], 0, 0);
den[2] = _mm_setr_ps(_den[8], _den[9], 0, 0);
mem[] and den[] are __m128 arrays, while _mem and _den are float pointers.
The compiler cleverly restructures each line into a single movsd (loading _mem[8] and _mem[9] together), followed by xorps (for the two zeros) and movlhps (to merge the halves). The problem is that movsd is an SSE2 instruction, which -QxSSE should have disabled. As far as I can see, this is the only SSE2 instruction used.
If I remove '-GL', the problem goes away, but so does some of the performance, and the users on non-SSE2-capable processors are the ones who need the optimizations the most.
Is there a workaround I can apply to tell the compiler that SSE is ok, but SSE2 really isn't, no matter how fancy it is?
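[Editor's note: one possible SSE-only construction, sketched below, builds the vector from explicit _mm_load_ss loads so there is no adjacent-float pair for the compiler to fuse into a 64-bit movsd. This is an illustrative sketch only; it has not been verified against ICL 11.0, and an aggressive optimizer under -GL might still re-fuse the loads.]

```c
#include <xmmintrin.h>  /* SSE only, no SSE2 */

/* Hypothetical workaround: load the two floats separately with movss and
   merge them with unpcklps, all SSE1 instructions.
   _mm_load_ss zeroes the upper three lanes, so the result is {p[0], p[1], 0, 0}. */
static __m128 load2_zero2(const float *p)
{
    __m128 lo = _mm_load_ss(p);      /* {p[0], 0, 0, 0} */
    __m128 hi = _mm_load_ss(p + 1);  /* {p[1], 0, 0, 0} */
    return _mm_unpacklo_ps(lo, hi);  /* {p[0], p[1], 0, 0} */
}
```

Usage would then be mem[2] = load2_zero2(&_mem[8]); in place of the _mm_setr_ps call.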
Apologies if this is fixed in 11.1.048; I keep getting linker errors about symbol files with that release, so I've had to stay on 11.0 for now.
5 Replies
Since ICL 11.0, the only option which doesn't generate SSE2 is /arch:ia32. 10.0 had a -QxK option for SSE, but its library support wasn't reliable.
Quoting - tim18
Since ICL 11.0, the only option which doesn't generate SSE2 is /arch:ia32. 10.0 had a -QxK option for SSE, but its library support wasn't reliable.
Tim is right. Please use /arch:ia32.
Apologies if this is fixed in 11.1.048; I keep getting linker errors about symbol files with that release, so I've had to stay on 11.0 for now.
Do you mean the .sbr file issue below? If so, it's being fixed as we speak.
BSCMAKE: error BK1506 : cannot open file 'C:Dev_build_intDSPTestRelDebugDspFilter.sbr': No such file or directory
Jennifer
Quoting - Jennifer Jiang (Intel)
Tim is right. Please use /arch:ia32.
Apologies if this is fixed in 11.1.048; I keep getting linker errors about symbol files with that release, so I've had to stay on 11.0 for now.
Do you mean the .sbr file issue below? If so, it's being fixed as we speak.
BSCMAKE: error BK1506 : cannot open file 'C:Dev_build_intDSPTestRelDebugDspFilter.sbr': No such file or directory
Jennifer
No, in 11.1.048 I'm seeing
mumble_pch.obj : fatal error LNK1318: Unexpected PDB error; RPC (23) '(0x000006BA)'
The same code compiles without any problems using 11.0.075. Apart from the unwanted SSE2 code, that is.
I can't use 10.1 for this, as that gives me missing vtable symbols in __declspec(dllimport)ed C++ classes.
Right now it looks like I'll have to split my performance-critical code out into a DLL without any external C++ classes, compile that with 10.1 -QxK, and compile the rest with 11.0 and /arch:ia32. That is more than a little messy, though, and I'd really like to avoid it if possible. Compiling all the code with /arch:ia32 isn't an option, as I need the vectorized speedup of the performance-critical parts to be able to run in real time on the non-SSE2 processors.
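[Editor's note: an alternative to splitting the build across compiler versions is runtime dispatch, sketched below under the assumption of a GCC/Clang-style <cpuid.h>; the function name cpu_has_sse2 is hypothetical. A single binary checks CPUID once and routes hot loops to either the vectorized or the plain-x87 path.]

```c
#include <cpuid.h>  /* GCC/Clang CPUID helper; other compilers differ */

/* Hypothetical sketch: detect SSE2 support at runtime so one binary can
   choose between an SSE2-vectorized path and a fallback path. */
static int cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                 /* CPUID leaf 1 unavailable */
    return (edx >> 26) & 1;       /* CPUID.01H:EDX bit 26 = SSE2 */
}
```

A dispatcher would then select a function pointer once at startup based on this check, so the per-call overhead is a single indirect call.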
Quoting - thorvald.natvig
No, in 11.1.048 I'm seeing
mumble_pch.obj : fatal error LNK1318: Unexpected PDB error; RPC (23) '(0x000006BA)'
mumble_pch.obj : fatal error LNK1318: Unexpected PDB error; RPC (23) '(0x000006BA)'
This issue was reported before and got fixed; I verified the original testcase, and it is indeed fixed.
So this may be caused by a different scenario. Is it possible for you to send me more info or a testcase?
Thanks,
Jennifer
Quoting - Jennifer Jiang (Intel)
This issue was reported before and got fixed; I verified the original testcase, and it is indeed fixed.
So this may be caused by a different scenario. Is it possible for you to send me more info or a testcase?
Thanks,
Jennifer
I haven't been able to create a minimal testcase for this; it happens when I link my application, but doesn't happen on smaller tests.
I'll test some more and see if I can narrow it down a bit, and if so I'll post a followup here.