Can someone show me how to multiply two 4x4 matrices using SSE instructions? The code should be compatible with a C++ compiler.
7 Replies
Quoting - Smart Lubobya
Can someone show me how to multiply two 4x4 matrices using SSE instructions? The code should be compatible with a C++ compiler.
http://cache-www.intel.com/cd/00/00/29/37/293749_293749.pdf
includes such a description.
Another way would be to use the SSE transpose code you can find with the search button above; it may then be possible to persuade the compiler to optimize the rest:
temp[][] = transpose(a[][])
for()
for()
c = inner_product(temp[...],b[...],0.f)
possibly by writing out the 4 inner_product instances, each with vectors declared aligned(16).
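The transpose-then-inner_product idea above can be sketched roughly as follows. This is only an illustration, not the code from the linked PDF: the function name `mul4x4_transpose` is hypothetical, and it assumes the matrices are stored column-major as 16 contiguous floats (element (row, col) at index 4*col + row), which is the layout where transposing the first operand makes both inner_product inputs contiguous. `_MM_TRANSPOSE4_PS` is the transpose macro from `<xmmintrin.h>`.

```cpp
#include <numeric>      // std::inner_product
#include <xmmintrin.h>  // __m128, _mm_loadu_ps, _mm_store_ps, _MM_TRANSPOSE4_PS

// Hypothetical helper: c = a * b for 4x4 float matrices.
// Assumption: column-major storage, element (row, col) lives at [4*col + row].
void mul4x4_transpose(const float* a, const float* b, float* c) {
    // Load the four columns of a (unaligned loads, so callers need no alignment).
    __m128 r0 = _mm_loadu_ps(a + 0);
    __m128 r1 = _mm_loadu_ps(a + 4);
    __m128 r2 = _mm_loadu_ps(a + 8);
    __m128 r3 = _mm_loadu_ps(a + 12);

    // Transpose in registers: r0..r3 now hold the rows of a.
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);

    // Local 16-byte aligned copy, so aligned stores are safe here.
    alignas(16) float at[16];
    _mm_store_ps(at + 0, r0);
    _mm_store_ps(at + 4, r1);
    _mm_store_ps(at + 8, r2);
    _mm_store_ps(at + 12, r3);

    // Each result element is a dot product of a row of a (contiguous in at)
    // with a column of b (already contiguous in column-major storage).
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            c[4 * col + row] =
                std::inner_product(at + 4 * row, at + 4 * row + 4, b + 4 * col, 0.f);
}
```

Whether the compiler actually turns the inner products into SSE code depends on the compiler and its options, so it is worth checking the generated assembly.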
Quoting - tim18
http://cache-www.intel.com/cd/00/00/29/37/293749_293749.pdf
includes such a description.
Another way would be to use the SSE transpose code you can find by the search button above, and then it may be possible to persuade the compiler to optimize the rest:
temp[][] = transpose(a[][])
for()
for()
c = inner_product(temp[...],b[...],0.f)
possibly by writing out the 4 inner_product instances each with declared aligned(16) vectors.
Quoting - Smart Lubobya
Can someone show me how to multiply two 4x4 matrices using SSE instructions? The code should be compatible with a C++ compiler.
I have read the PDF quoted but still cannot understand how SSE is used. More light, please.
Quoting - Smart Lubobya
I have read the PDF quoted but still cannot understand how SSE is used. More light, please.
SSE instructions can be executed by using SIMD intrinsics or inline assembly.
This application note describes the multiplication of two matrices using Streaming SIMD Extensions:
AP-929 Streaming SIMD Extensions - Matrix Multiplication
In Section 4.3 you can find a ready-to-run example for 4x4 matrix multiplication.
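For anyone who wants to see what the intrinsics route looks like, here is a minimal sketch. It is not the code from AP-929; the function name `mul4x4_sse` is made up for illustration, and it assumes row-major matrices stored as 16 contiguous floats. Unaligned loads and stores are used so the caller does not need 16-byte aligned arrays. Each row of the result is built as a sum of the rows of b, scaled by the elements of the matching row of a.

```cpp
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_set1_ps, _mm_mul_ps, _mm_add_ps, _mm_storeu_ps

// Hypothetical helper: c = a * b for 4x4 float matrices.
// Assumption: row-major storage, element (row, col) lives at [4*row + col].
void mul4x4_sse(const float* a, const float* b, float* c) {
    // Load the four rows of b once; each __m128 holds four floats.
    const __m128 b0 = _mm_loadu_ps(b + 0);
    const __m128 b1 = _mm_loadu_ps(b + 4);
    const __m128 b2 = _mm_loadu_ps(b + 8);
    const __m128 b3 = _mm_loadu_ps(b + 12);

    for (int i = 0; i < 4; ++i) {
        // Row i of c = a[i][0]*b0 + a[i][1]*b1 + a[i][2]*b2 + a[i][3]*b3.
        // _mm_set1_ps broadcasts one scalar of a into all four lanes.
        __m128 row = _mm_mul_ps(_mm_set1_ps(a[4 * i + 0]), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * i + 1]), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * i + 2]), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * i + 3]), b3));
        _mm_storeu_ps(c + 4 * i, row);  // write four results at once
    }
}
```

Call it as `mul4x4_sse(a, b, c)` with three `float[16]` arrays. If the arrays are known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` can replace the unaligned variants.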
Quoting - hydroxyprolin
SSE instructions can be executed by using SIMD intrinsics or inline assembly.
Unfortunately, the vector code generated by ICL is dead, as it has a minimum trip count of 8 while these loops run only 4 iterations.
[cpp]#include <numeric> // std::inner_product
using namespace std;
// Assumes [column][row] indexing, i.e. c[j][i] = sum over k of a[k][i] * b[j][k].
void mm44(float **a,float **b,float **c){
__declspec(align(16)) float at[4][4],bl[4][4];
for(int j=0; j<4; ++j)
for(int i=0; i<4; ++i){
at[i][j]= a[j][i] ; //transpose 1st operand
bl[j][i]= b[j][i] ; //local aligned non aliased
}
for (int j = 0; j < 4; ++j) //return matrix multiply 4x4
for (int i=0; i < 4; ++i)
c[j][i]= inner_product(&at[i][0],&at[i][0]+4,&bl[j][0],0.f);
}[/cpp]
There's some code at the bottom of this pdf for general matrix-matrix multiplication via inline intrinsics:
http://people.redhat.com/drepper/cpumemory.pdf
Great paper.
Can someone advise on the best compiler that I can buy for C++ and SSE use?
Send quotes to my email: cslubobya@yahoo.com