Intel® ISA Extensions

## Code vectorisation
How do I vectorise the first loop using SSE2 intrinsics to get faster code?
#include "stdafx.h"
#include <iostream>
using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
short b[4][4]={1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4};
short a[4][4];
short c[4][4];
for (int j =0;j<4;j++)
{
a=b+b;
a=b+b;
a=b-b;
a=b-b;
// step 2
c=a+a;
c=a+(a<<1);
c=a-a;
c=a-(a<<1);
}
for (int i = 0; i < 4; i++)
{
for (int j = 0; j < 4; j++)
cout << a[i][j] << " ";
cout << endl;
}
cout << endl;
for (int i = 0; i < 4; i++)
{
for (int j = 0; j < 4; j++)
cout << c[i][j] << " ";
cout << endl;
}
return 0;
}
It looks like you have asked the same question twice. I will answer here.
1. Is the loop count really 4 (j<4)? If so, there is little point in vectorizing it. To make it faster, just unroll it completely, without any loop; you may get a small benefit. To unroll, you can write a simple C macro and invoke it 4 times.
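
The unrolling suggested in point 1 could be sketched roughly as follows. This is only an illustration under assumptions: the arrays are taken to be 4x4 with j selecting the row, the subscripts in each statement are a guess at what the posted loop does, and the names TRANSFORM_ROW and transform_unrolled are hypothetical.

```cpp
// Assumption: a, b and c are 4x4 arrays of short and j selects the row.
// The subscripts below are a guess; they are not shown in the posted code.
#define TRANSFORM_ROW(j)                                   \
    do {                                                   \
        a[j][0] = (short)(b[j][0] + b[j][3]);              \
        a[j][1] = (short)(b[j][1] + b[j][2]);              \
        a[j][2] = (short)(b[j][1] - b[j][2]);              \
        a[j][3] = (short)(b[j][0] - b[j][3]);              \
        /* step 2 */                                       \
        c[j][0] = (short)(a[j][0] + a[j][1]);              \
        c[j][1] = (short)(a[j][2] + (a[j][3] << 1));       \
        c[j][2] = (short)(a[j][0] - a[j][1]);              \
        c[j][3] = (short)(a[j][3] - (a[j][2] << 1));       \
    } while (0)

// Fully unrolled: the macro is expanded once per row, with no loop counter.
void transform_unrolled(const short b[4][4], short a[4][4], short c[4][4])
{
    TRANSFORM_ROW(0);
    TRANSFORM_ROW(1);
    TRANSFORM_ROW(2);
    TRANSFORM_ROW(3);
}
```

Calling transform_unrolled(b, a, c) in place of the first loop should leave a and c with the same contents as the loop version, provided the assumed indexing matches the real code. An optimizing compiler may already unroll a 4-iteration loop on its own, so it is worth measuring before keeping the macro.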

2. Assuming that you actually have a big loop, you may be able to vectorize it. (I don't know whether you still need the values of a afterwards or whether they are just temporary variables.) Assuming that you need them, your SSE code will look something like this (you may need to fix it up a little here and there):

__m128i temp0 = xmm0;
__m128i temp1 = xmm1;
__m128i temp2 = xmm2;
__m128i temp3 = xmm3;
xmm0= _mm_sub_epi16(xmm0, xmm3);
xmm1= _mm_sub_epi16(xmm1, xmm2);

__m128i temp4 = temp0;

_mm_store_si128(c, temp4);
temp0 = _mm_sub_epi16(temp0, temp1);
_mm_store_si128(c, temp0);

temp1 = xmm0;
temp4 = xmm1;

temp1 = _mm_slli_epi16(temp1, 1);
temp4 = _mm_slli_epi16(temp4, 1); 