karrde
Beginner

Compiler bug?

I am using Intel C++ 11.1.038 [IA-32] in Studio 2008, WinXP SP3.

Below is a simple test program that I hope will reproduce the issue.

1. The problem is to sum two arrays using SSE:

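[cpp]#include <cstdio>      // printf
#include <cstdlib>     // rand
#include <xmmintrin.h> // SSE intrinsics

struct Array
{
    Array(const int n) { Data = new float[n]; }

    ~Array() { delete [] Data; }

    float& operator[] (const int i) { return Data[i]; }

    float * Data;
};


int main()
{
    int n = 16;
    Array x(n), a(n), b(n);

    // Fill the inputs with random values.
    for(int i = 0; i < n; ++i)
    {
        a[i] = rand();
        b[i] = rand();
    }

    // Sum four floats at a time with unaligned SSE loads and stores.
    for(int i = 0; i < n; i += 4)
    {
        _mm_storeu_ps(&x[i], _mm_add_ps(_mm_loadu_ps(&a[i]), _mm_loadu_ps(&b[i])));
    }

    // Print the scalar sum next to the SSE result for comparison.
    for(int i = 0; i < n; ++i)
    {
        printf("%g   %g \n", a[i] + b[i], x[i]);
    }

    return 0;
}[/cpp]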
The compiler options are: /c /O2 /Og /Oi /Qipo /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /EHsc /MT /GS /Gy /arch:SSE2 /fp:fast /Fo"Release/" /W3 /nologo /Zi /QxSSE2

All is OK, the generated code for the loop is:
[cpp]004010B6  mov         eax,dword ptr  
004010B9  movups      xmm1,xmmword ptr [eax] 
004010BC  mov         edx,dword ptr  
004010BF  movu... 
004010CE  movups      xmm3,xmmword ptr [esi+10h] 
004010D2  mov         edi,dword ptr  
004010D5  ... 
004010E6  movups      xmm5,xmmword ptr [edx+20h] 
004010EA  mov         ecx,dword ptr  
004010ED  ... 
004010FE  movups      xmm7,xmmword ptr [edi+30h] 
00401102  mov         eax,dword ptr  
00401105  ...[/cpp]
2. Now, let's define a separate global function that wraps _mm_loadu_ps:

[bash]__m128 Load(float *p) { return _mm_loadu_ps(p); }[/bash]
and the loop becomes

[bash]_mm_storeu_ps(&x[i], _mm_add_ps(Load(&a[i]), Load(&b[i])));[/bash]
The generated code for the loop is not only weird, but also wrong (the sum is not correct):

[bash]004010B6  mov         edi,dword ptr  
004010B9  mov         eax,dword ptr  
004010BC  mov         ecx,dword ptr  
004010BF  mov         edx,dword ptr [eax] 
004010C1  mov    ... 
004010DD  mov         edx,dword ptr [eax+10h] 
004010E0  mov         ecx,dword ptr  
004010E3  mov... 
00401101  mov         edx,dword ptr [eax+20h] 
00401104  mov         ecx,dword ptr  
00401107  mov... 
00401125  mov         edx,dword ptr [eax+30h] 
00401128  mov         ecx,dword ptr  
0040112B  mov...[/bash]
3. Now if we move Load inside the struct

[bash]struct Array
{
    Array(const int n) { Data = new float[n]; }

    ~Array() { delete [] Data; }

    float& operator[] (const int i) { return Data[i]; }

    __m128 Load(float *p) { return _mm_loadu_ps(p); }

    float * Data;
};[/bash]
and the loop becomes

[bash]_mm_storeu_ps(&x[i], _mm_add_ps(a.Load(&a[i]), b.Load(&b[i])));[/bash]
all is OK again.

4. If we make Load a static member function, the result is wrong again.
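
The four variants above can be checked without reading the disassembly by comparing each SSE result against the scalar sum. The following is only a minimal sketch under some assumptions: `Load` is the global wrapper from step 2, `CountMismatches` is a hypothetical helper that is not part of the original test, `std::vector` replaces the `Array` struct for brevity, and the inputs are kept small so that every float sum is exact and can be compared with `==`.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>
#include <xmmintrin.h>  // SSE intrinsics

// Global wrapper from step 2 of the post.
__m128 Load(float *p) { return _mm_loadu_ps(p); }

// Hypothetical helper: returns how many elements of the SSE sum differ
// from the scalar sum. n must be a positive multiple of 4.
int CountMismatches(int n)
{
    assert(n > 0 && n % 4 == 0);

    std::vector<float> a(n), b(n), x(n);
    for (int i = 0; i < n; ++i)
    {
        // Small integer values, so a[i] + b[i] is exactly representable.
        a[i] = static_cast<float>(rand() % 1000);
        b[i] = static_cast<float>(rand() % 1000);
    }

    // Same loop as in step 2: four floats per iteration through Load().
    for (int i = 0; i < n; i += 4)
    {
        _mm_storeu_ps(&x[i], _mm_add_ps(Load(&a[i]), Load(&b[i])));
    }

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
    {
        if (x[i] != a[i] + b[i]) ++mismatches;
    }
    return mismatches;
}
```

With a correctly compiled build, `CountMismatches(16)` should return 0. A nonzero count would indicate the wrong sums described in steps 2 and 4.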
2 Replies
Yang_W_Intel
Employee

I will check the test case and let you know.
Thanks.
-Yang
Yang_W_Intel
Employee

It's a compiler bug. I have reported it to the Intel compiler team for a fix.
You can work around it by removing the /Qipo option.
Thanks for reporting this issue.

Thanks.
-Yang