Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

My SSE code is slower than SISD! Why?

orbano
Beginner
428 Views
Hello!
I would like to ask why my SSE code for vector and matrix operations is slower than the scalar version. I must be doing something wrong, but I don't know what. Take my vector addition, for example:
data type:
typedef union
{
    __m128 data;
    float elements[4];
} vector4d;
with SSE (using intrinsics):
inline vector4d add(vector4d a, vector4d b)
{
    vector4d c;
    c.data = _mm_add_ps(a.data, b.data);
    return c;
}
without SSE:
inline vector4d add_sisd(vector4d a, vector4d b)
{
    return set(a.elements[0] + b.elements[0],
               a.elements[1] + b.elements[1],
               a.elements[2] + b.elements[2],
               0);
}
(set() simply returns a vector4d with the parameter values)
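For anyone who wants to compile the snippets above, here is a minimal self-contained sketch. The body of set() is an assumption: the post only says that it returns a vector4d holding its parameters, so the element-by-element version below is a guess at what it might look like, not the original code. The sketch also passes arguments by const reference, which is a change from the post. Reading one union member after writing another relies on compiler-specific behaviour that the common x86 compilers support.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps

// Same union as in the post: one 128-bit value viewed either as a
// packed register or as four individual floats.
typedef union
{
    __m128 data;
    float elements[4];
} vector4d;

// Hypothetical set(): assumed to store the four parameters one by one.
inline vector4d set(float x, float y, float z, float w)
{
    vector4d v;
    v.elements[0] = x;
    v.elements[1] = y;
    v.elements[2] = z;
    v.elements[3] = w;
    return v;
}

// Packed version: one _mm_add_ps adds all four lanes.
inline vector4d add(const vector4d& a, const vector4d& b)
{
    vector4d c;
    c.data = _mm_add_ps(a.data, b.data);
    return c;
}

// Scalar version as posted: adds only the first three components
// and writes 0 into the fourth.
inline vector4d add_sisd(const vector4d& a, const vector4d& b)
{
    return set(a.elements[0] + b.elements[0],
               a.elements[1] + b.elements[1],
               a.elements[2] + b.elements[2],
               0.0f);
}
```

Note that the two functions do not compute the same result: add() sums all four lanes, while add_sisd() does three additions and zeroes the last component, so the comparison is not quite like for like.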
2 Replies
TimP
Honored Contributor III
Without more information, we're simply guessing. For example, you might be performing scalar operations on the components before or after the addition. Remember also that the compiler would unroll at least by 8 in order to use _mm_add_ps in pairs when performing automatic vectorization. We don't even know which compiler you're using.
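As an illustration of the automatic vectorization mentioned here, the sketch below shows the kind of plain loop over float arrays that a vectorizing compiler can turn into packed adds on its own. The function name add_arrays and its parameters are hypothetical and do not come from the original post. Whether the loop is actually vectorized, and how far it is unrolled, depends on the compiler and its settings.

```cpp
#include <cstddef>  // std::size_t

// Hypothetical example: element-wise sum of two float arrays.
// No intrinsics are used; a compiler with automatic vectorization
// enabled may process several elements per iteration with packed adds.
void add_arrays(const float* a, const float* b, float* c, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        c[i] = a[i] + b[i];
    }
}
```

Working on whole arrays like this, rather than on one 4-element vector per function call, gives the compiler room to unroll the loop and keep the data in registers.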
orbano
Beginner
No other operations are performed. I just put each function into a loop and measured the time the computation takes.
First I tried in debug mode, without any optimization. The SISD version took 50% of the time of the SIMD version.
With all optimizations on, it took 30% (3 times faster!!!!)
I don't know the compiler version. I have Visual C++ 6.0 with Service Pack 5 and the Processor Pack (for SP5), with default compiler settings.
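The timing loop itself was not posted, so the following is only a sketch of what such a measurement might look like. The helper time_add and its parameters are hypothetical. It uses std::clock from the standard library and writes the final sum to an output parameter so that an optimizing compiler cannot discard the loop.

```cpp
#include <ctime>        // std::clock, CLOCKS_PER_SEC
#include <xmmintrin.h>  // SSE intrinsics

typedef union
{
    __m128 data;
    float elements[4];
} vector4d;

// Packed addition, passing the vectors by value as in the original post.
inline vector4d add(vector4d a, vector4d b)
{
    vector4d c;
    c.data = _mm_add_ps(a.data, b.data);
    return c;
}

// Hypothetical helper: performs `iterations` additions and returns the
// elapsed CPU time in seconds. The result is stored in *out so the
// work stays observable to the compiler.
double time_add(long iterations, vector4d* out)
{
    vector4d acc;
    acc.data = _mm_setzero_ps();   // start from (0, 0, 0, 0)
    vector4d step;
    step.data = _mm_set1_ps(1.0f); // add (1, 1, 1, 1) each iteration

    std::clock_t start = std::clock();
    for (long i = 0; i < iterations; ++i)
    {
        acc = add(acc, step);
    }
    std::clock_t stop = std::clock();

    *out = acc;
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Each iteration here depends on the previous one, so this measures a chain of dependent additions. A loop with a different shape may give different numbers, and std::clock has coarse resolution, so a large iteration count is needed for a meaningful reading.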