##
##

Given a 4x4 matrix X:

X[4][4] = X00 X01 X02 X03

X10 X11 X12 X13

X20 X21 X22 X23

X30 X31 X32 X33

where the rows are denoted as follows:

X00 X01 X02 X03 is denoted by x0

X10 X11 X12 X13 is denoted by x1

X20 X21 X22 X23 is denoted by x2

X30 X31 X32 X33 is denoted by x3

and:

STAGE 1 STAGE 2

A0 = X0 + X3 Y0 = A0 + A1

A1 = X1 + X2 Y1 = A2 + (A3 << 1)

A2 = X1 - X2 Y2 = A0 - A1

A3 = X0 - X3 Y3 = A3 - (A2 << 1)

where << denotes shift left.

How can I write the code for the above scenario using SSE/SSE2 intrinsics?

Smart_Lubobya

Beginner

05-29-2010
07:09 AM

50 Views

##
##

According to your other posts, you generally seem to know how to use intrinsics, so I am unsure what specific aspect you are struggling with.

Assuming you are working with 32-bit integers, you can implement the computation as follows:

__m128i A0 = _mm_add_epi32(X0, X3);

__m128i A1 = _mm_add_epi32(X1, X2);

__m128i A2 = _mm_sub_epi32(X1, X2);

__m128i A3 = _mm_sub_epi32(X0, X3);

__m128i Y0 = _mm_add_epi32(A0, A1);

__m128i Y1 = _mm_add_epi32(A2, _mm_slli_epi32(A3, 1));

__m128i Y2 = _mm_sub_epi32(A0, A1);

__m128i Y3 = _mm_sub_epi32(A3, _mm_slli_epi32(A2, 1));
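For reference, the two stages can be wrapped into one self-contained function. This is only a minimal sketch: it assumes the matrix is stored row-major as 16 contiguous 32-bit integers, and the function name transform_rows is hypothetical, not something from the question. Unaligned loads and stores are used so the arrays need no special alignment.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: applies STAGE 1 and STAGE 2 to a row-major
 * 4x4 block of 32-bit integers. Each __m128i holds one matrix row,
 * so all four columns are processed in parallel. */
static void transform_rows(const int32_t in[16], int32_t out[16])
{
    /* Load the four rows x0..x3 (unaligned loads, no alignment assumed). */
    __m128i x0 = _mm_loadu_si128((const __m128i *)(in + 0));
    __m128i x1 = _mm_loadu_si128((const __m128i *)(in + 4));
    __m128i x2 = _mm_loadu_si128((const __m128i *)(in + 8));
    __m128i x3 = _mm_loadu_si128((const __m128i *)(in + 12));

    /* STAGE 1 */
    __m128i a0 = _mm_add_epi32(x0, x3);
    __m128i a1 = _mm_add_epi32(x1, x2);
    __m128i a2 = _mm_sub_epi32(x1, x2);
    __m128i a3 = _mm_sub_epi32(x0, x3);

    /* STAGE 2: the shift is applied to a single operand before add/sub. */
    __m128i y0 = _mm_add_epi32(a0, a1);
    __m128i y1 = _mm_add_epi32(a2, _mm_slli_epi32(a3, 1));
    __m128i y2 = _mm_sub_epi32(a0, a1);
    __m128i y3 = _mm_sub_epi32(a3, _mm_slli_epi32(a2, 1));

    /* Store the result rows y0..y3. */
    _mm_storeu_si128((__m128i *)(out + 0),  y0);
    _mm_storeu_si128((__m128i *)(out + 4),  y1);
    _mm_storeu_si128((__m128i *)(out + 8),  y2);
    _mm_storeu_si128((__m128i *)(out + 12), y3);
}
```

If the data is known to be 16-byte aligned, _mm_load_si128 and _mm_store_si128 could be used instead of the unaligned variants.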

If your results do not match your expectations, you can print the registers as discussed in this thread, or use a debugger that can display SSE registers.
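One simple way to inspect a register is to store it to a plain array and print the lanes. The sketch below assumes the register holds four 32-bit integers; the helper names store_epi32 and dump_epi32 are made up for illustration and are not taken from the linked thread.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: copy the four 32-bit lanes of v into lanes[0..3].
 * lanes[0] is the lowest lane, i.e. the first element in memory order. */
static void store_epi32(__m128i v, int32_t lanes[4])
{
    _mm_storeu_si128((__m128i *)lanes, v);
}

/* Hypothetical helper: print a labelled register as four signed integers. */
static void dump_epi32(const char *label, __m128i v)
{
    int32_t lanes[4];
    store_epi32(v, lanes);
    printf("%s: %d %d %d %d\n", label,
           (int)lanes[0], (int)lanes[1], (int)lanes[2], (int)lanes[3]);
}
```

A call such as dump_epi32("A0", A0) after each step makes it easy to see which stage first deviates from the expected values.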

For an overview of which intrinsics are available, I strongly recommend the interactive "Intel Intrinsics Guide", which is available on this page.

Thomas_W_Intel

Employee

06-04-2010
05:10 AM

50 Views

##
##

Smart_Lubobya

Beginner

06-29-2010
07:18 AM

50 Views

Is __int128i A0 OK, or is it supposed to be __m128i A0? This is the error I get:

error C2065: '__int128i' : undeclared identifier

##
##

gilgil

Beginner

06-29-2010
09:58 PM

50 Views

I think it should be __m128i.

##
##

Thanks a lot. Just one more question: how do I write code using SSE2 intrinsics so that the matrix elements

x00 x01 x02 x03 are denoted by x0?

Smart_Lubobya

Beginner

07-06-2010
01:45 PM

50 Views


##
##

Brijender_B_Intel

Employee

07-06-2010
02:04 PM

50 Views

It also depends on the data types of x00, x01, x02 and x03.

If the elements are already 32-bit integers, you can use a simple SSE2 load to bring a whole row into a register:

_mm_load_si128(const __m128i *data) for 16-byte-aligned data, or _mm_loadu_si128() for unaligned data.

If they are packed chars (8 bits each), you can also use the SSE4.1 zero-extension instructions (PMOVZX):

__m128i x0 = _mm_cvtepu8_epi32(_mm_cvtsi32_si128(*(const int *)Input)); where Input is a pointer to a 32-bit integer containing the 4 packed elements.

Similarly, if each element is a short (16 bits), use _mm_cvtepu16_epi32(). _mm_cvtepu32_epi64() widens 32-bit integers to 64 bits.
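If only SSE2 is available, the same zero-extension can be done by interleaving with a zero register. The following is a sketch under the assumption that a row consists of four unsigned 8-bit values stored contiguously; the function name widen_u8x4_to_epi32 is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero-extend four packed unsigned bytes to four
 * 32-bit lanes using SSE2 only (no PMOVZX). */
static __m128i widen_u8x4_to_epi32(const uint8_t *src)
{
    int32_t packed;
    /* Read exactly 4 bytes; memcpy avoids alignment and aliasing issues. */
    memcpy(&packed, src, sizeof packed);

    __m128i v = _mm_cvtsi32_si128(packed);      /* bytes in the low 32 bits */
    const __m128i zero = _mm_setzero_si128();

    v = _mm_unpacklo_epi8(v, zero);   /* 8-bit  -> 16-bit, high bytes are zero */
    v = _mm_unpacklo_epi16(v, zero);  /* 16-bit -> 32-bit, high words are zero */
    return v;
}
```

For signed 8-bit input this would not be correct as written, because the upper bits would need sign extension rather than zeros.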

##
##

Given the matrix

1 2 3 4 row x0

5 6 7 8

3 4 6 7

7 6 2 1 row x3

how do I load and add the elements of row x0 and row x3 using SSE2 intrinsics? Is there any example code?

Smart_Lubobya

Beginner

07-06-2010
03:21 PM

50 Views


##
##

Thomas_W_Intel

Employee

07-07-2010
01:57 AM

50 Views

Instead of trying to figure everything out directly, you might want to work through a tutorial and some ready-made examples first. For example, there is this tutorial or this article.
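As a starting point for the concrete question about rows x0 and x3, here is a minimal sketch. It assumes the matrix elements are 32-bit integers and that each row is contiguous in memory; the function name add_rows is made up for this example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: element-wise sum of two rows of four 32-bit ints. */
static void add_rows(const int32_t *row_a, const int32_t *row_b, int32_t *sum)
{
    /* Load each row into one 128-bit register (unaligned loads). */
    __m128i a = _mm_loadu_si128((const __m128i *)row_a);
    __m128i b = _mm_loadu_si128((const __m128i *)row_b);

    /* Four 32-bit additions in a single instruction. */
    __m128i s = _mm_add_epi32(a, b);

    /* Write the four sums back to memory. */
    _mm_storeu_si128((__m128i *)sum, s);
}
```

With the matrix declared as int32_t m[4][4], calling add_rows(m[0], m[3], sum) adds row x0 and row x3; for the values in the question the sums are 8, 8, 5 and 5.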
