X[4][4] = X00 X01 X02 X03
          X10 X11 X12 X13
          X20 X21 X22 X23
          X30 X31 X32 X33
If row X00 X01 X02 X03 is denoted by x0,
row X10 X11 X12 X13 is denoted by x1,
row X20 X21 X22 X23 is denoted by x2,
row X30 X31 X32 X33 is denoted by x3,
and:
STAGE 1      STAGE 2
A0=X0+X3     Y0=A0+A1
A1=X1+X2     Y1=A2+(A3<<1)
A2=X1-X2     Y2=A0-A1
A3=X0-X3     Y3=A3-(A2<<1)
where << denotes shift left,
how can I write the code for the above scenario using SSE/SSE2 intrinsics?
You can implement the computation as follows (assuming you are working with 32-bit integers):
__int128i A0 = _mm_add_epi32(X0, X3);
__int128i A1 = _mm_add_epi32(X1, X2);
__int128i A2 = _mm_sub_epi32(X1, X2);
__int128i A3 = _mm_sub_epi32(X0, X3);
__int128i Y0 = _mm_add_epi32(A0, A1);
__int128i Y1 = _mm_add_epi32(A2, _mm_slli_epi32(A3, 1));
__int128i Y2 = _mm_sub_epi32(A0, A1);
__int128i Y3 = _mm_sub_epi32(A3, _mm_slli_epi32(A2, 1));
If your results do not match your expectations, you can print the registers as discussed in this thread, or use a debugger that can display SSE registers.
To get an overview of which intrinsics are available, I strongly recommend the interactive "Intel Intrinsics Guide", which is available on this page.
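As a rough illustration of printing a register: the sketch below copies the four 32-bit lanes to an ordinary array and prints them. The linked thread is not reproduced here, so this is not necessarily the approach discussed there, and the helper names lanes_epi32 and print_epi32 are made up for this example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: copy the four 32-bit lanes of v into out[0..3],
   lowest lane first. The unaligned store works for any destination. */
static void lanes_epi32(__m128i v, int32_t out[4])
{
    _mm_storeu_si128((__m128i *)out, v);
}

/* Hypothetical helper: print the four lanes, lowest lane first. */
static void print_epi32(const char *name, __m128i v)
{
    int32_t lane[4];
    lanes_epi32(v, lane);
    printf("%s = %d %d %d %d\n", name,
           (int)lane[0], (int)lane[1], (int)lane[2], (int)lane[3]);
}
```

Calling print_epi32("A0", A0) after each step makes it easy to compare intermediate values against a scalar version of the same computation.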
Is __int128i A0 correct, or is it supposed to be __m128i A0? This is the error I get:
error C2065: '__int128i' : undeclared identifier
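For reference, the SSE2 integer vector type is __m128i (declared in <emmintrin.h>), and the last line of the snippet needs _mm_sub_epi32 rather than _mm_sub. A corrected, self-contained version might look like the sketch below. It assumes the matrix is stored row-major as 16 consecutive 32-bit integers; the function name transform_rows and the flat-array layout are assumptions made for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, declares __m128i */
#include <stdint.h>

/* Sketch: apply both stages to a row-major 4x4 block of 32-bit ints.
   Each __m128i holds one full row, so all four columns are processed
   at once. Assumes in and out each point to 16 ints. */
static void transform_rows(const int32_t *in, int32_t *out)
{
    /* Unaligned loads, so the arrays need no special alignment. */
    __m128i X0 = _mm_loadu_si128((const __m128i *)(in + 0));
    __m128i X1 = _mm_loadu_si128((const __m128i *)(in + 4));
    __m128i X2 = _mm_loadu_si128((const __m128i *)(in + 8));
    __m128i X3 = _mm_loadu_si128((const __m128i *)(in + 12));

    /* Stage 1 */
    __m128i A0 = _mm_add_epi32(X0, X3);
    __m128i A1 = _mm_add_epi32(X1, X2);
    __m128i A2 = _mm_sub_epi32(X1, X2);
    __m128i A3 = _mm_sub_epi32(X0, X3);

    /* Stage 2: the shift is applied before the add/subtract. */
    __m128i Y0 = _mm_add_epi32(A0, A1);
    __m128i Y1 = _mm_add_epi32(A2, _mm_slli_epi32(A3, 1));
    __m128i Y2 = _mm_sub_epi32(A0, A1);
    __m128i Y3 = _mm_sub_epi32(A3, _mm_slli_epi32(A2, 1));

    _mm_storeu_si128((__m128i *)(out + 0), Y0);
    _mm_storeu_si128((__m128i *)(out + 4), Y1);
    _mm_storeu_si128((__m128i *)(out + 8), Y2);
    _mm_storeu_si128((__m128i *)(out + 12), Y3);
}
```

If the data is known to be 16-byte aligned, _mm_load_si128 and _mm_store_si128 can replace the unaligned variants.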
x00 x01 x02 x03 matrix elements are denoted by x0?
Your question is not clear. Are you asking how x0 (a __m128i variable) will be loaded with the 4 consecutive elements x00 x01 x02 x03?
It also depends on the data types of x00, x01, x02, x03.
You can use a simple load instruction to load the data (SSE2):
_mm_load_si128(const __m128i *data) for 16-byte-aligned data, or _mm_loadu_si128() for unaligned data.
E.g. if they are char (8 bits each) and the data is packed, you can also use the SSE4.1 instructions (PMOVZX):
__m128i x0 = _mm_cvtepu8_epi32(_mm_cvtsi32_si128(*(const int *)Input)); where Input is a pointer to the 32 bits that contain the 4 elements.
Similarly, if each element is a short, you need to use _mm_cvtepu16_epi32(); _mm_cvtepu32_epi64() widens 32-bit ints to 64 bits.
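If only SSE2 is available, the same zero-extension can be done by interleaving with zeros instead of using PMOVZX. This is an alternative technique, not the one suggested above; the sketch assumes four packed unsigned 8-bit elements, and the function name load4_u8_to_epi32 is made up for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Sketch: load four unsigned 8-bit values and zero-extend each one
   into a 32-bit lane, using SSE2 only. src[0] ends up in lane 0. */
static __m128i load4_u8_to_epi32(const uint8_t *src)
{
    int32_t packed;
    memcpy(&packed, src, sizeof packed);     /* read exactly 4 bytes */

    __m128i v = _mm_cvtsi32_si128(packed);   /* bytes sit in the low 32 bits */
    const __m128i zero = _mm_setzero_si128();
    v = _mm_unpacklo_epi8(v, zero);          /* 8-bit  -> 16-bit lanes */
    return _mm_unpacklo_epi16(v, zero);      /* 16-bit -> 32-bit lanes */
}
```

The memcpy avoids reading more than the four bytes that hold the elements, and compilers typically turn it into a single 32-bit load.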
1 2 3 4 row x0
5 6 7 8
3 4 6 7
7 6 2 1 row x3
How do I load and add the elements of row x0 and row x3 using SSE2 intrinsics? Is there any example code?
The answer to your question depends on how the values are stored in memory.
Instead of trying to figure everything out directly, you might want to work through some tutorial and ready-made examples first. For example, there is this tutorial or this article.
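As one concrete case, suppose each row is stored as four consecutive 32-bit integers in memory. That layout is an assumption, as is the function name add_rows; under it, loading and adding two rows could be sketched like this:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Sketch: add two rows of four 32-bit ints element by element.
   a, b and sum each point to 4 consecutive ints; no alignment
   is required because unaligned loads and stores are used. */
static void add_rows(const int32_t *a, const int32_t *b, int32_t *sum)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)sum, _mm_add_epi32(va, vb));
}
```

With row x0 = 1 2 3 4 and row x3 = 7 6 2 1 from the question, sum receives 8 8 5 5. For 8-bit or 16-bit elements the load stays the same, but the add would be _mm_add_epi8 or _mm_add_epi16.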
