I am using Intel C++ 11.1.038 [IA-32] in Studio 2008, WinXP SP3.
Below is a simple test program that I hope will reproduce the issue.
1. The problem is to sum two arrays using SSE:
[cpp]
#include <cstdio>
#include <cstdlib>
#include <xmmintrin.h>

struct Array
{
    Array(const int n) { Data = new float[n]; }
    ~Array() { delete [] Data; }
    float& operator[] (const int i) { return Data[i]; }
    float * Data;
};

int main()
{
    int n = 16;
    Array x(n), a(n), b(n);
    for(int i = 0; i < n; ++i)
    {
        a[i] = rand();
        b[i] = rand();
    }
    // Sum four floats per iteration with unaligned SSE loads and stores.
    for(int i = 0; i < n; i += 4)
    {
        _mm_storeu_ps(&x[i], _mm_add_ps(_mm_loadu_ps(&a[i]), _mm_loadu_ps(&b[i])));
    }
    // Print the scalar sum next to the SSE result for comparison.
    for(int i = 0; i < n; ++i)
    {
        printf("%g %g \n", a[i] + b[i], x[i]);
    }
    return 0;
}
[/cpp]
The compiler options are: /c /O2 /Og /Oi /Qipo /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /EHsc /MT /GS /Gy /arch:SSE2 /fp:fast /Fo"Release/" /W3 /nologo /Zi /QxSSE2
All is OK, the generated code for the loop is:
[cpp]
004010B6 mov eax,dword ptr
004010B9 movups xmm1,xmmword ptr [eax]
004010BC mov edx,dword ptr
004010BF movups xmm0,xmmword ptr [edx]
004010C2 addps xmm1,xmm0
004010C5 mov ecx,dword ptr
004010C8 movups xmmword ptr [ecx],xmm1
004010CB mov esi,dword ptr
004010CE movups xmm3,xmmword ptr [esi+10h]
004010D2 mov edi,dword ptr
004010D5 movups xmm2,xmmword ptr [edi+10h]
004010D9 addps xmm3,xmm2
004010DC mov eax,dword ptr
004010DF movups xmmword ptr [eax+10h],xmm3
004010E3 mov edx,dword ptr
004010E6 movups xmm5,xmmword ptr [edx+20h]
004010EA mov ecx,dword ptr
004010ED movups xmm4,xmmword ptr [ecx+20h]
004010F1 addps xmm5,xmm4
004010F4 mov esi,dword ptr
004010F7 movups xmmword ptr [esi+20h],xmm5
004010FB mov edi,dword ptr
004010FE movups xmm7,xmmword ptr [edi+30h]
00401102 mov eax,dword ptr
00401105 movups xmm6,xmmword ptr [eax+30h]
00401109 addps xmm7,xmm6
0040110C mov edx,dword ptr
0040110F movups xmmword ptr [edx+30h],xmm7
[/cpp]
2. Now, let's define a separate global function that wraps loadu_ps:
[bash]
__m128 Load(float *p)
{
    return _mm_loadu_ps(p);
}
[/bash]
and the loop becomes:
[bash]
_mm_storeu_ps(&x[i], _mm_add_ps(Load(&a[i]), Load(&b[i])));
[/bash]
The generated code for the loop is not only weird, but also wrong (the sum is not correct):
[bash]
004010B6 mov edi,dword ptr
004010B9 mov eax,dword ptr
004010BC mov ecx,dword ptr
004010BF mov edx,dword ptr [eax]
004010C1 mov dword ptr [ebp-78h],edx
004010C4 movups xmm1,xmmword ptr [ebp-78h]
004010C8 mov esi,dword ptr [ecx]
004010CA mov dword ptr [ebp-70h],esi
004010CD movups xmm0,xmmword ptr [ebp-70h]
004010D1 addps xmm1,xmm...
004010DD mov edx,dword ptr [eax+10h]
004010E0 mov ecx,dword ptr
004010E3 mov dword ptr [ebp-78h],edx
004010E6 movups xmm3,xmmword ptr [ebp-78h]
004010EA mov esi,dword ptr [ecx+10h]
004010ED mov dword ptr [ebp-70h],esi
004010F0 movups xmm2,xmmword ptr [ebp-70h]
004010F4 addps ...
00401101 mov edx,dword ptr [eax+20h]
00401104 mov ecx,dword ptr
00401107 mov dword ptr [ebp-78h],edx
0040110A movups xmm5,xmmword ptr [ebp-78h]
0040110E mov esi,dword ptr [ecx+20h]
00401111 mov dword ptr [ebp-70h],esi
00401114 movups xmm4,xmmword ptr [ebp-70h]
00401118 addps ...
00401125 mov edx,dword ptr [eax+30h]
00401128 mov ecx,dword ptr
0040112B mov dword ptr [ebp-78h],edx
0040112E movups xmm7,xmmword ptr [ebp-78h]
00401132 mov esi,dword ptr [ecx+30h]
00401135 mov dword ptr [ebp-70h],esi
00401138 movups xmm6,xmmword ptr [ebp-70h]
0040113C xor ...
[/bash]
3. Now, if we move Load inside the struct:
[bash]
struct Array
{
    Array(const int n) { Data = new float[n]; }
    ~Array() { delete [] Data; }
    float& operator[] (const int i) { return Data[i]; }
    __m128 Load(float *p) { return _mm_loadu_ps(p); }
    float * Data;
};
[/bash]
and the loop becomes:
[bash]
_mm_storeu_ps(&x[i], _mm_add_ps(a.Load(&a[i]), b.Load(&b[i])));
[/bash]
All is OK again.
4. If we make the member function static, then the generated code is wrong again.
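Step 4 is described without code, so here is a minimal sketch of what the static variant might look like, together with a self-check that compares the SSE result against a scalar sum. The names `SumSse` and `CountMismatches` are hypothetical helpers introduced only for this illustration, and the exact form of the static `Load` is an assumption based on step 3.

```cpp
#include <cassert>
#include <cstdlib>
#include <xmmintrin.h>

struct Array
{
    explicit Array(int n) : Data(new float[n]) {}
    ~Array() { delete [] Data; }
    float& operator[] (int i) { return Data[i]; }
    // Assumed form of the static wrapper from step 4.
    static __m128 Load(float *p) { return _mm_loadu_ps(p); }
    float * Data;
};

// Hypothetical helper: x = a + b, four floats per iteration,
// going through the static wrapper. n must be a multiple of 4.
void SumSse(Array& x, Array& a, Array& b, int n)
{
    assert(n % 4 == 0);
    for(int i = 0; i < n; i += 4)
    {
        _mm_storeu_ps(&x[i], _mm_add_ps(Array::Load(&a[i]), Array::Load(&b[i])));
    }
}

// Hypothetical helper: returns how many elements of the SSE result
// differ from the scalar sum. A correct build should return 0.
int CountMismatches(int n)
{
    Array x(n), a(n), b(n);
    for(int i = 0; i < n; ++i)
    {
        a[i] = static_cast<float>(rand());
        b[i] = static_cast<float>(rand());
    }
    SumSse(x, a, b, n);
    int bad = 0;
    for(int i = 0; i < n; ++i)
    {
        const float expected = a[i] + b[i];
        if(x[i] != expected) ++bad;
    }
    return bad;
}
```

Calling `CountMismatches(16)` and checking for a nonzero result gives a quick pass/fail signal instead of comparing the printed columns by eye. The exact comparison assumes scalar and packed additions both use SSE single precision, as with /arch:SSE2.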
I will check the test case and let you know.
Thanks.
-Yang
It's a compiler bug. I have reported it to the Intel compiler team for a fix.
You can work around it by removing the /Qipo option.
Thanks for reporting this issue.
Thanks.
-Yang