Intel® ISA Extensions

## sse2 intrinsic equivalent

How can I write this code using SSE2 intrinsics?

```c
{
    short *b = (short *)btr;

    for (j = 0; j < 16; j += 4)
    {
        int f0 = (int)(b[j] + b[j+3]);
        int f3 = (int)(b[j] - b[j+3]);
        int f1 = (int)(b[j+1] + b[j+2]);
        int f2 = (int)(b[j+1] - b[j+2]);

        b[j]   = (short)(f0 + f1);
        b[j+2] = (short)(f0 - f1);
        b[j+1] = (short)(f2 + (f3 << 1));
        b[j+3] = (short)(f3 - (f2 << 1));
    }
}
```
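
A minimal sketch of one possible SSE2 approach, assuming the 16 shorts are four rows of four and that the first operand in each row is `b[j]`. The loop works across a row, which is awkward for SIMD, so the sketch transposes the 4x4 block first: each step then becomes a vertical operation on all four rows at once, and a second transpose puts the results back in row order. It uses `_mm_loadu_si128`/`_mm_storeu_si128`, so `b` needs no particular alignment. All function names are hypothetical. The scalar reference writes `2 * f3` instead of `f3 << 1` because left-shifting a negative `int` is undefined in C. The 16-bit wraparound of `_mm_add_epi16`/`_mm_sub_epi16` gives the same results as computing in `int` and casting to `short`, on compilers that truncate modulo 2^16 in that conversion (GCC, Clang and MSVC do).

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <string.h>

/* Scalar reference: the loop from the question, with b[j] as the first operand. */
static void transform_rows_scalar(short *b)
{
    for (int j = 0; j < 16; j += 4) {
        int f0 = b[j] + b[j + 3];
        int f3 = b[j] - b[j + 3];
        int f1 = b[j + 1] + b[j + 2];
        int f2 = b[j + 1] - b[j + 2];
        b[j]     = (short)(f0 + f1);
        b[j + 2] = (short)(f0 - f1);
        b[j + 1] = (short)(f2 + 2 * f3); /* 2 * f3 avoids shifting a negative int */
        b[j + 3] = (short)(f3 - 2 * f2);
    }
}

/* Transposes a 4x4 block of 16-bit values held as [row0|row1], [row2|row3]
   into [col0|col1], [col2|col3]. Applying it twice restores the input. */
static void transpose4x4_epi16(__m128i *r01, __m128i *r23)
{
    __m128i t0 = _mm_unpacklo_epi16(*r01, *r23);
    __m128i t1 = _mm_unpackhi_epi16(*r01, *r23);
    *r01 = _mm_unpacklo_epi16(t0, t1);
    *r23 = _mm_unpackhi_epi16(t0, t1);
}

#define SWAP64 _MM_SHUFFLE(1, 0, 3, 2) /* swaps the two 64-bit halves */

/* SSE2 version: transforms all four rows at once. */
static void transform_rows_sse2(short *b)
{
    __m128i r01 = _mm_loadu_si128((const __m128i *)b);       /* rows 0 and 1 */
    __m128i r23 = _mm_loadu_si128((const __m128i *)(b + 8)); /* rows 2 and 3 */

    /* Now r01 = [x0|x1], r23 = [x2|x3], where xk holds b[j+k] of all four rows. */
    transpose4x4_epi16(&r01, &r23);

    __m128i s  = _mm_shuffle_epi32(r23, SWAP64); /* [x3|x2] */
    __m128i a  = _mm_add_epi16(r01, s);          /* [f0|f1] */
    __m128i d  = _mm_sub_epi16(r01, s);          /* [f3|f2] */
    __m128i as = _mm_shuffle_epi32(a, SWAP64);   /* [f1|f0] */
    __m128i ds = _mm_shuffle_epi32(d, SWAP64);   /* [f2|f3] */

    /* Only the low 64 bits of each result are used. */
    __m128i y0 = _mm_add_epi16(a, as);                    /* f0 + f1   */
    __m128i y2 = _mm_sub_epi16(a, as);                    /* f0 - f1   */
    __m128i y1 = _mm_add_epi16(ds, _mm_slli_epi16(d, 1)); /* f2 + 2*f3 */
    __m128i y3 = _mm_sub_epi16(d, _mm_slli_epi16(ds, 1)); /* f3 - 2*f2 */

    /* Gather the output columns and transpose back to rows. */
    r01 = _mm_unpacklo_epi64(y0, y1);
    r23 = _mm_unpacklo_epi64(y2, y3);
    transpose4x4_epi16(&r01, &r23);

    _mm_storeu_si128((__m128i *)b, r01);
    _mm_storeu_si128((__m128i *)(b + 8), r23);
}

/* Checks the SSE2 version against the scalar reference for one block. */
static void check_sse2_matches_scalar(const short in[16])
{
    short ref[16], simd[16];
    memcpy(ref, in, sizeof ref);
    memcpy(simd, in, sizeof simd);
    transform_rows_scalar(ref);
    transform_rows_sse2(simd);
    assert(memcmp(ref, simd, sizeof ref) == 0);
}
```

Comparing against the scalar loop on a few inputs, as `check_sse2_matches_scalar` does, is a cheap way to catch shuffle-order mistakes in code like this.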

It is the same code you posted in the other thread; I have just answered it there.

I tried your suggestion, but I'm not sure I am storing the final results correctly: I get access violation errors on every `_mm_store_si128()` call. Please advise.

```c
__declspec(align(16)) __m128i t0, t1, temp0, temp1, temp2, temp3, temp4, f0, f1, f2, f3;
__declspec(align(16)) __m128i *b = (__m128i *)btr;

f2 = _mm_unpackhi_epi64(t0, f0); // 2,2,2,2,0,0,0,0
f3 = _mm_unpackhi_epi64(t1, f1); // 4,4,4,4,0,0,0,0

temp0 = f0;
temp1 = f1;
temp2 = f2;
temp3 = f3;

f0 = _mm_sub_epi16(f0, f3);
f1 = _mm_sub_epi16(f1, f2);

temp4 = temp0;
_mm_store_si128(b, temp4);

temp0 = _mm_sub_epi16(temp0, temp1);
_mm_store_si128(b+2, temp0);

temp1 = f0;
temp4 = f1;
temp1 = _mm_slli_epi16(temp1, 1);
temp4 = _mm_slli_epi16(temp4, 1);
f1 = _mm_sub_epi16(f1, temp1);

_mm_store_si128(b+1, f0);
_mm_store_si128(b+3, f1);
```

Change the stores to `_mm_storeu_si128`, since `b` is very likely unaligned.
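
A minimal sketch of the change: `_mm_storeu_si128` takes the same arguments as `_mm_store_si128` but accepts any address, so only the intrinsic name changes. One more thing worth checking, assuming `btr` holds only the 16 shorts the scalar loop touches: `b` is an `__m128i *`, so `b + 1`, `b + 2` and `b + 3` are 16, 32 and 48 bytes past `b`. The stores to `b + 2` and `b + 3` would then write past the end of the buffer, which can also cause access violations regardless of alignment. The hypothetical helper `store16_unaligned` below writes exactly 16 shorts with two unaligned stores.

```c
#include <emmintrin.h> /* SSE2 intrinsics */

/* Hypothetical helper: writes 16 shorts (two 128-bit vectors) to dst,
   which may have any alignment. */
static void store16_unaligned(short *dst, __m128i rows01, __m128i rows23)
{
    /* One __m128i holds 8 shorts, so the second vector starts at dst + 8
       (16 bytes further on), which is what (__m128i *)dst + 1 would also be. */
    _mm_storeu_si128((__m128i *)dst, rows01);
    _mm_storeu_si128((__m128i *)(dst + 8), rows23);
}
```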

Doesn't `__declspec(align(16)) __m128i *b = (__m128i *)btr;` mean that `b` is aligned?

`b` is a pointer, so `__declspec(align(16))` aligns the pointer variable itself (the address where `b` is stored). The memory it points to comes from `btr`, and that is not guaranteed to be 16-byte aligned.
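
A small sketch of that distinction in C11, where `_Alignas(16)` plays the role of MSVC's `__declspec(align(16))`. Aligning the variable `b` only fixes `&b`; the value of `b` is whatever address `btr` holds, and that value is what `_mm_load_si128`/`_mm_store_si128` require to be a multiple of 16. The names `is_aligned16` and `pointer_vs_target_alignment` are made up for this example, and it uses `short *` rather than `__m128i *`; the alignment specifier applies to the pointer object in the same way for either type.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if p is a multiple of 16 bytes, as the aligned SSE load/store forms require. */
static bool is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Returns true when the pointer variable is aligned but its target is not. */
static bool pointer_vs_target_alignment(void)
{
    _Alignas(16) short storage[9];
    short *btr = storage + 1;    /* deliberately 2 bytes past a 16-byte boundary */
    _Alignas(16) short *b = btr; /* aligns the variable b, not the memory *b */
    return is_aligned16(&b) && !is_aligned16(b);
}
```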

How do I align `btr`, then?

How is it allocated?

- If it is static, use `__declspec(align(...))` to align it.
- If it is dynamic, use `_aligned_malloc` and `_aligned_free`.
- If it is arbitrary memory whose alignment you don't control, use the unaligned versions of the intrinsics: replace the loads and stores with `_mm_loadu_si128` and `_mm_storeu_si128`.
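
A sketch of the first two options in C11, assuming GCC, Clang or MSVC: `_Alignas(16)` for static storage, and `_mm_malloc`/`_mm_free` (which these compilers declare alongside the SSE intrinsics headers) for dynamic storage. On MSVC, `_aligned_malloc(size, 16)` paired with `_aligned_free` works the same way. `alloc_block16` and `double_block16_aligned` are hypothetical names; the point is that once the storage is 16-byte aligned, the aligned load/store forms are safe.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2; also provides _mm_malloc/_mm_free via xmmintrin.h */

/* Static storage: alignment fixed at compile time.
   _Alignas(16) is the C11 spelling of __declspec(align(16)). */
static _Alignas(16) short static_block[16];

/* Dynamic storage: room for 16 shorts on a 16-byte boundary.
   Release it with _mm_free(), never with free(). */
static short *alloc_block16(void)
{
    short *p = (short *)_mm_malloc(16 * sizeof(short), 16);
    /* _mm_malloc guarantees this; the assert documents what
       _mm_load_si128/_mm_store_si128 rely on. */
    assert(p == 0 || ((uintptr_t)p & 15u) == 0);
    return p;
}

/* Doubles 16 shorts in place; b must be 16-byte aligned. */
static void double_block16_aligned(short *b)
{
    __m128i lo = _mm_load_si128((const __m128i *)b);
    __m128i hi = _mm_load_si128((const __m128i *)(b + 8));
    _mm_store_si128((__m128i *)b, _mm_add_epi16(lo, lo));
    _mm_store_si128((__m128i *)(b + 8), _mm_add_epi16(hi, hi));
}
```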