I want to transpose a 4x4 matrix using SSE2 intrinsics. How do I go about it?
x00 x01 x02 x03 //I0
x10 x11 x12 x13 //I1
x20 x21 x22 x23 //I2
x30 x31 x32 x33 //I3
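For reference, a transpose swaps rows and columns, so the first output row should read x00 x10 x20 x30, the second x01 x11 x21 x31, and so on. A plain scalar version is handy as a baseline to check any SSE2 variant against (the name transpose4x4_scalar is just illustrative, and 32-bit int elements are assumed):

```cpp
// Plain scalar 4x4 transpose: dst[i][j] = src[j][i].
// Useful as a reference when checking a SIMD version.
void transpose4x4_scalar(const int src[4][4], int dst[4][4])
{
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            dst[i][j] = src[j][i];  // column i of src becomes row i of dst
}
```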
You can use the _MM_TRANSPOSE4_PS macro that comes with Visual Studio (defined in xmmintrin.h):
_MM_TRANSPOSE4_PS(row0, row1, row2, row3)
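A minimal sketch of typical usage, assuming a row-major 4x4 float array (the helper name transpose4x4_ps is hypothetical). The macro takes four __m128 lvalues and overwrites them in place, so the rows are loaded first and stored back afterwards:

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_storeu_ps, _MM_TRANSPOSE4_PS

// Transposes a row-major 4x4 float matrix in place.
void transpose4x4_ps(float m[16])
{
    __m128 row0 = _mm_loadu_ps(m + 0);   // unaligned loads: no 16-byte alignment assumed
    __m128 row1 = _mm_loadu_ps(m + 4);
    __m128 row2 = _mm_loadu_ps(m + 8);
    __m128 row3 = _mm_loadu_ps(m + 12);

    _MM_TRANSPOSE4_PS(row0, row1, row2, row3);  // rowN now holds column N

    _mm_storeu_ps(m + 0, row0);
    _mm_storeu_ps(m + 4, row1);
    _mm_storeu_ps(m + 8, row2);
    _mm_storeu_ps(m + 12, row3);
}
```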
tim18, the initial post on matrix transpose was about SSE and is best suited to float matrices. My current question is about an integer matrix. My brief search suggests that for integer matrices a combination of punpckl*/punpckh* (SSE2 unpack intrinsics) gives a better transpose. Please advise how I can use these SSE2 intrinsics.
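A sketch of the unpack approach for 32-bit ints, assuming the four rows are already in __m128i registers (transpose4x4_epi32 is a made-up name). For 32-bit elements the relevant unpacks are the dword and qword forms: _mm_unpacklo_epi32/_mm_unpackhi_epi32 correspond to punpckldq/punpckhdq, and _mm_unpacklo_epi64/_mm_unpackhi_epi64 to punpcklqdq/punpckhqdq. The comments track where each element ends up:

```cpp
#include <emmintrin.h>  // SSE2 integer intrinsics

// Transposes four rows of four 32-bit ints held in __m128i registers.
// Lane comments list elements low-to-high, using x<row><col> names.
void transpose4x4_epi32(__m128i& r0, __m128i& r1, __m128i& r2, __m128i& r3)
{
    // Step 1 (punpckldq / punpckhdq): interleave 32-bit elements of row pairs
    __m128i t0 = _mm_unpacklo_epi32(r0, r1);  // x00 x10 x01 x11
    __m128i t1 = _mm_unpacklo_epi32(r2, r3);  // x20 x30 x21 x31
    __m128i t2 = _mm_unpackhi_epi32(r0, r1);  // x02 x12 x03 x13
    __m128i t3 = _mm_unpackhi_epi32(r2, r3);  // x22 x32 x23 x33

    // Step 2 (punpcklqdq / punpckhqdq): combine 64-bit halves into columns
    r0 = _mm_unpacklo_epi64(t0, t1);  // x00 x10 x20 x30
    r1 = _mm_unpackhi_epi64(t0, t1);  // x01 x11 x21 x31
    r2 = _mm_unpacklo_epi64(t2, t3);  // x02 x12 x22 x32
    r3 = _mm_unpackhi_epi64(t2, t3);  // x03 x13 x23 x33
}
```

This takes eight unpack instructions and no shuffles with immediates, which is why it is the usual integer counterpart of the float macro.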
Cast your __m128i variables to __m128 (using _mm_castsi128_ps), apply the _MM_TRANSPOSE4_PS macro, then cast back using _mm_castps_si128.
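Sketched out, assuming the rows already sit in __m128i variables (the function name transpose4x4_epi32_via_ps is illustrative). The cast intrinsics only change the type the compiler sees and generate no instructions, and the float shuffles inside the macro move bits without interpreting them, so integer values (even ones whose bit pattern would be a float NaN, such as -1) come through intact:

```cpp
#include <xmmintrin.h>  // SSE: __m128, _MM_TRANSPOSE4_PS
#include <emmintrin.h>  // SSE2: __m128i, _mm_castsi128_ps, _mm_castps_si128

// Transposes four __m128i rows of 32-bit ints by reinterpreting them as
// __m128, applying _MM_TRANSPOSE4_PS, and reinterpreting the result back.
void transpose4x4_epi32_via_ps(__m128i& r0, __m128i& r1, __m128i& r2, __m128i& r3)
{
    // Reinterpret the bits; no conversion from int to float happens here.
    __m128 f0 = _mm_castsi128_ps(r0);
    __m128 f1 = _mm_castsi128_ps(r1);
    __m128 f2 = _mm_castsi128_ps(r2);
    __m128 f3 = _mm_castsi128_ps(r3);

    _MM_TRANSPOSE4_PS(f0, f1, f2, f3);  // pure data movement, no arithmetic

    // Reinterpret back to integer vectors.
    r0 = _mm_castps_si128(f0);
    r1 = _mm_castps_si128(f1);
    r2 = _mm_castps_si128(f2);
    r3 = _mm_castps_si128(f3);
}
```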
The code compiles, but the B2[4][4] output contains garbage values. Where am I wrong?
#include "stdafx.h"
#include "emmintrin.h"
#include <iostream>
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{
int B1[4][4]={1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};// matrix to be transposed
int B2[4][4];// transposed matrix
int n=0;
for (int i=0;i<4;i++)
for (int j=0;j<4;j++)
{
B1
}
__asm{
movq mm1, B1
movq mm2, B1+8
movq mm3, B1+12
movq mm4, B1+16
//step one
punpcklwd mm1, mm2
punpcklwd mm3, mm4
movq mm5, mm1// copy mm1 into mm5
punpckldq mm1, mm3
punpckhdq mm5, mm3
// Move result to B2
movq B2, mm1
movq B2+8, mm0
//step two
punpckhwd mm1, mm2
punpckhwd mm3, mm4
movq mm5, mm1// copy mm1 into mm5
punpckldq mm1, mm3
punpckhdq mm5, mm3
// move result to B2
movq B2+12, mm1
movq B2+16, mm0
emms
}
for(int i = 0; i<4; i++){
for(int j = 0; j<4; j++) cout << B2[i][j] << " ";
cout << endl;
}
return 0;
}
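A few things in the inline assembly above are likely causes of the garbage output. It uses MMX (mm registers and movq), not SSE2, and an MMX register is only 64 bits wide, so each movq moves just two of a row's four ints. In MSVC inline assembly, offsets such as B1+8 are byte offsets, so B1+8 is B1[0][2] rather than the second row (which starts at B1+16), and the loads overlap. punpcklwd/punpckhwd interleave 16-bit words, while int elements are 32 bits. The results are stored from mm0, which is never written, while mm5 is computed but never stored; step two also reuses mm1 and mm3 after step one has already overwritten them. Finally, the four stores cover only bytes 0 to 23 of B2, so most of B2 is never written and prints whatever was on the stack. A sketch of the same transpose with SSE2 intrinsics, assuming 32-bit int (transpose_int4x4 and print_int4x4 are hypothetical helper names):

```cpp
#include <emmintrin.h>  // SSE2 integer intrinsics
#include <iostream>

// Transposes a 4x4 int matrix: dst[i][j] = src[j][i].
// Each row is one full 128-bit load (four ints), unlike a 64-bit MMX movq.
void transpose_int4x4(const int src[4][4], int dst[4][4])
{
    __m128i r0 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src[0]));
    __m128i r1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src[1]));
    __m128i r2 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src[2]));
    __m128i r3 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src[3]));

    // 32-bit interleave (punpckldq / punpckhdq), not the 16-bit wd forms.
    __m128i t0 = _mm_unpacklo_epi32(r0, r1);
    __m128i t1 = _mm_unpacklo_epi32(r2, r3);
    __m128i t2 = _mm_unpackhi_epi32(r0, r1);
    __m128i t3 = _mm_unpackhi_epi32(r2, r3);

    // 64-bit combine (punpcklqdq / punpckhqdq); every row of dst is written.
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst[0]), _mm_unpacklo_epi64(t0, t1));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst[1]), _mm_unpackhi_epi64(t0, t1));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst[2]), _mm_unpacklo_epi64(t2, t3));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst[3]), _mm_unpackhi_epi64(t2, t3));
}

// Prints the matrix one row per line, space-separated.
void print_int4x4(const int m[4][4])
{
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j)
            std::cout << m[i][j] << ' ';
        std::cout << '\n';
    }
}
```

In the program above, the whole __asm block could then be replaced by transpose_int4x4(B1, B2); and emms is no longer needed because no MMX registers are touched. With B1 holding 1 to 16 row by row, as in its initializer, the first row of B2 would be 1 5 9 13.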
