Beginner

## SSE2 intrinsic equivalent

How can I write this code using SSE2 intrinsics?
void edwin::add(void* btr)
{
    short* b = (short*)btr;
    // Each iteration transforms one group of four consecutive shorts.
    for (int j = 0; j < 16; j += 4)
    {
        int f0 = (int)(b[j] + b[j+3]);
        int f3 = (int)(b[j] - b[j+3]);
        int f1 = (int)(b[j+1] + b[j+2]);
        int f2 = (int)(b[j+1] - b[j+2]);
        b[j] = (short)(f0 + f1);
        b[j+2] = (short)(f0 - f1);
        b[j+1] = (short)(f2 + (f3 << 1));
        b[j+3] = (short)(f3 - (f2 << 1));
    }
}

7 Replies
Employee
This is the same code that you posted in the other thread. I just answered it there.
Beginner
I tried your suggestion, but I am not sure I am storing the final results correctly. I get access violation errors on all of the _mm_store_si128() calls. Kindly advise further.

__declspec(align(16)) __m128i t0, t1, temp0, temp1, temp2, temp3, temp4, f0, f1, f2, f3;
__declspec(align(16)) __m128i* b = (__m128i*)btr;
t0 = _mm_loadu_si128(b);           // 1,1,1,1,2,2,2,2
t1 = _mm_loadu_si128(b+1);         // 3,3,3,3,4,4,4,4
f0 = _mm_loadl_epi64(b);           // 1,1,1,1,0,0,0,0
f1 = _mm_loadl_epi64(b+1);         // 3,3,3,3,0,0,0,0
f2 = _mm_unpackhi_epi64(t0, f0);   // 2,2,2,2,0,0,0,0
f3 = _mm_unpackhi_epi64(t1, f1);   // 4,4,4,4,0,0,0,0
temp0 = f0;
temp1 = f1;
temp2 = f2;
temp3 = f3;
temp0 = _mm_add_epi16(temp0, f3);
temp1 = _mm_add_epi16(temp1, f2);
f0 = _mm_sub_epi16(f0, f3);
f1 = _mm_sub_epi16(f1, f2);
temp4 = temp0;
temp4 = _mm_add_epi16(temp4, temp1);
_mm_store_si128(b, temp4);
temp0 = _mm_sub_epi16(temp0, temp1);
_mm_store_si128(b+2, temp0);
temp1 = f0;
temp4 = f1;
temp1 = _mm_slli_epi16(temp1, 1);
temp4 = _mm_slli_epi16(temp4, 1);
f0 = _mm_add_epi16(f0, temp4);
f1 = _mm_sub_epi16(f1, temp1);
_mm_store_si128(b+1, f0);
_mm_store_si128(b+3, f1);

New Contributor II
Change to storeu (_mm_storeu_si128), as the memory b points to is very likely unaligned.
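For illustration, here is one possible way to write the scalar routine from the question with SSE2 while using only unaligned loads and stores. It is a sketch, not the answer given in the other thread (which is not shown here). The names add_scalar, add_sse2 and swap_halves are made up for this example, and it assumes btr points to exactly the 16 shorts the scalar loop touches. Under that assumption the data occupies 32 bytes, i.e. two __m128i values, so stores to b+2 and b+3 as in the attempt above would land past the end of the data regardless of alignment.

```cpp
#include <cassert>
#include <emmintrin.h>  // SSE2 intrinsics

// Scalar reference, following the loop in the question (2 * x replaces x << 1).
void add_scalar(short* b) {
    for (int j = 0; j < 16; j += 4) {
        int f0 = b[j] + b[j + 3];
        int f3 = b[j] - b[j + 3];
        int f1 = b[j + 1] + b[j + 2];
        int f2 = b[j + 1] - b[j + 2];
        b[j]     = (short)(f0 + f1);
        b[j + 2] = (short)(f0 - f1);
        b[j + 1] = (short)(f2 + 2 * f3);
        b[j + 3] = (short)(f3 - 2 * f2);
    }
}

// Swap the low and high 64-bit halves of a register.
static inline __m128i swap_halves(__m128i v) {
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
}

// Hypothetical SSE2 version; works for any alignment of btr.
void add_sse2(void* btr) {
    assert(btr != nullptr);
    __m128i* p = static_cast<__m128i*>(btr);
    __m128i v0 = _mm_loadu_si128(p);      // b[0..7]  (groups 0 and 1)
    __m128i v1 = _mm_loadu_si128(p + 1);  // b[8..15] (groups 2 and 3)

    // Transpose so that equal positions of the four groups sit together.
    __m128i t0 = _mm_unpacklo_epi16(v0, v1);
    __m128i t1 = _mm_unpackhi_epi16(v0, v1);
    __m128i ab = _mm_unpacklo_epi16(t0, t1);  // [b[j]   x4 | b[j+1] x4]
    __m128i cd = _mm_unpackhi_epi16(t0, t1);  // [b[j+2] x4 | b[j+3] x4]
    __m128i dc = swap_halves(cd);             // [b[j+3] x4 | b[j+2] x4]

    __m128i sum  = _mm_add_epi16(ab, dc);     // [f0 | f1]
    __m128i dif  = _mm_sub_epi16(ab, dc);     // [f3 | f2]
    __m128i dif2 = _mm_slli_epi16(dif, 1);    // [2*f3 | 2*f2]

    // Only the low four lanes of each result are used below.
    __m128i o0 = _mm_add_epi16(sum, swap_halves(sum));   // f0 + f1
    __m128i o2 = _mm_sub_epi16(sum, swap_halves(sum));   // f0 - f1
    __m128i o1 = _mm_add_epi16(swap_halves(dif), dif2);  // f2 + 2*f3
    __m128i o3 = _mm_sub_epi16(dif, swap_halves(dif2));  // f3 - 2*f2

    // Transpose back to the original group-by-group layout and store.
    __m128i p01 = _mm_unpacklo_epi16(o0, o1);
    __m128i p23 = _mm_unpacklo_epi16(o2, o3);
    _mm_storeu_si128(p,     _mm_unpacklo_epi32(p01, p23));  // groups 0 and 1
    _mm_storeu_si128(p + 1, _mm_unpackhi_epi32(p01, p23));  // groups 2 and 3
}
```

The 16-bit adds, subtracts and shifts wrap around in the same way as the final (short) casts in the scalar loop, so on common compilers the two functions should produce identical results; running both on a copy of the same input is an easy way to check.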
Beginner
Isn't this __declspec(align(16)) __m128i* b = (__m128i*)btr; showing that b is aligned?

New Contributor II
b is a pointer, so the pointer variable itself is aligned, but the memory it points to is cast from btr, which is not guaranteed to be aligned.
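The distinction can be checked directly. The sketch below uses standard alignas(16) in place of the MSVC-specific __declspec(align(16)); the helper names is_aligned16 and demo_pointer_vs_pointee, and the deliberately offset buffer, are assumptions made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// True when the address p is a multiple of 16 bytes.
bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical demo: an alignment specifier on a pointer declaration aligns
// the pointer variable, not the memory it points to.
bool demo_pointer_vs_pointee() {
    alignas(16) unsigned char storage[64];            // aligned buffer
    void* btr = storage + 2;                          // deliberately misaligned address
    alignas(16) short* b = static_cast<short*>(btr);  // aligns the variable b only
    assert(is_aligned16(&b));   // the pointer object sits on a 16-byte boundary
    return is_aligned16(b);     // the address it holds is still storage + 2
}
```

Here demo_pointer_vs_pointee() returns false: the variable b is aligned, yet an aligned load or store through b would still fault.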
Beginner
How then do I align btr?
New Contributor II
How is it allocated?
If it is static, use __declspec(align(...)) to align it.
If it is dynamic, use _aligned_malloc and _aligned_free.
If it is just arbitrary memory whose alignment you have no control over, use the unaligned versions of the functions (replace load/store with loadu/storeu).
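The three cases could look roughly like this. This is a sketch under assumptions: it uses standard alignas(16) instead of __declspec(align(16)), and _mm_malloc/_mm_free (provided alongside the intrinsics headers by GCC, Clang and MSVC) instead of the MSVC-only _aligned_malloc/_aligned_free. The names g_block, alloc_block16, free_block16 and load_any are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics; also makes _mm_malloc/_mm_free available

// Case 1: static storage, aligned at its declaration.
alignas(16) static short g_block[16];

// Case 2: dynamic storage, requested with 16-byte alignment.
short* alloc_block16() {
    return static_cast<short*>(_mm_malloc(16 * sizeof(short), 16));
}

// Memory from _mm_malloc must be released with _mm_free, not free/delete.
void free_block16(short* p) {
    _mm_free(p);
}

// Case 3: memory of unknown alignment, read with the unaligned load.
__m128i load_any(const void* p) {
    assert(p != nullptr);
    return _mm_loadu_si128(static_cast<const __m128i*>(p));
}
```

With cases 1 and 2 the aligned _mm_load_si128/_mm_store_si128 are safe on the start of the block; with case 3, _mm_loadu_si128/_mm_storeu_si128 work for any address, including aligned ones.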