Smart_Lubobya
Beginner
655 Views

MATRIX MULTIPLICATION USING SSE

Can someone show me how to multiply two 4x4 matrices using SSE instructions? The code should be compatible with a C++ compiler.
7 Replies
TimP
Black Belt
655 Views

http://cache-www.intel.com/cd/00/00/29/37/293749_293749.pdf
includes such a description.
Another way would be to use the SSE transpose code you can find by the search button above, and then it may be possible to persuade the compiler to optimize the rest:
temp[][] = transpose(a[][])
for()
  for()
    c = inner_product(temp[...],b[...],0.f)

possibly by writing out the 4 inner_product instances each with declared aligned(16) vectors.
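
A minimal sketch of the transpose-then-inner_product idea follows. It is not the code from the linked PDF. It assumes row-major C arrays, so it transposes the second operand to make both inputs of each dot product contiguous; with column-major storage the first operand would be transposed instead. The name mm44_transpose is hypothetical, and _MM_TRANSPOSE4_PS is the transpose macro from xmmintrin.h.

```cpp
#include <xmmintrin.h>  // SSE intrinsics and the _MM_TRANSPOSE4_PS macro
#include <numeric>      // std::inner_product

// Hypothetical helper: c = a * b for row-major 4x4 float matrices.
void mm44_transpose(const float a[4][4], const float b[4][4], float c[4][4]) {
    // Load the four rows of b; unaligned loads, so b needs no special alignment.
    __m128 r0 = _mm_loadu_ps(b[0]);
    __m128 r1 = _mm_loadu_ps(b[1]);
    __m128 r2 = _mm_loadu_ps(b[2]);
    __m128 r3 = _mm_loadu_ps(b[3]);
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);  // r0..r3 now hold the columns of b

    // Local 16-byte aligned copy of the transposed matrix.
    alignas(16) float bt[4][4];
    _mm_store_ps(bt[0], r0);
    _mm_store_ps(bt[1], r1);
    _mm_store_ps(bt[2], r2);
    _mm_store_ps(bt[3], r3);

    // Each element of c is a dot product of two contiguous 4-float rows,
    // which gives the compiler a chance to vectorize it.
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            c[i][j] = std::inner_product(a[i], a[i] + 4, bt[j], 0.f);
}
```

Whether the compiler actually emits SSE for the dot products depends on the compiler and its options, so it is worth checking the generated assembly.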
Smart_Lubobya
Beginner
655 Views

I have read the PDF quoted but still cannot understand how SSE is used. More light, please.
hydroxyprolin
Beginner
655 Views

SSE instructions can be executed by using SIMD intrinsics or inline assembly.
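
As a minimal illustration of the intrinsics route (the helper name add4 is made up for this sketch): each intrinsic from xmmintrin.h corresponds closely to one SSE instruction, and a __m128 value holds four floats.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: out[k] = x[k] + y[k] for k = 0..3.
void add4(const float* x, const float* y, float* out) {
    __m128 vx = _mm_loadu_ps(x);             // load 4 floats, no alignment required
    __m128 vy = _mm_loadu_ps(y);
    _mm_storeu_ps(out, _mm_add_ps(vx, vy));  // ADDPS: four additions at once
}
```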

This application note describes the multiplication of two matrices using Streaming SIMD Extensions:
AP-929 Streaming SIMD Extensions - Matrix Multiplication

In Section 4.3 you can find a ready-to-run example for 4x4 matrix multiplication.
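
For orientation, here is a minimal sketch of a 4x4 multiply written with intrinsics. It is not the AP-929 listing. It assumes row-major matrices stored as 16 consecutive floats, and the name mm44_sse is hypothetical. Each output row is built as a sum of the rows of b, each scaled by one element of the matching row of a.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: c = a * b, all 4x4, row-major, 16 floats each.
void mm44_sse(const float* a, const float* b, float* c) {
    // Keep the four rows of b in registers (unaligned loads for generality).
    const __m128 b0 = _mm_loadu_ps(b + 0);
    const __m128 b1 = _mm_loadu_ps(b + 4);
    const __m128 b2 = _mm_loadu_ps(b + 8);
    const __m128 b3 = _mm_loadu_ps(b + 12);

    for (int i = 0; i < 4; ++i) {
        // Broadcast a[i][k] to all four lanes and scale row k of b by it.
        __m128 row = _mm_mul_ps(_mm_set1_ps(a[4 * i + 0]), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * i + 1]), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * i + 2]), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * i + 3]), b3));
        _mm_storeu_ps(c + 4 * i, row);  // row i of the result
    }
}
```

This form needs no transpose because it never computes a horizontal sum. If all three arrays are known to be 16-byte aligned, _mm_load_ps and _mm_store_ps can replace the unaligned variants.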
TimP
Black Belt
655 Views

Quoting - hydroxyprolin

SSE instructions can be executed by using SIMD intrinsics or inline assembly.

Not to mention by writing standard Fortran, or nearly standard C or C++ code, and using one of the usual compilers. Schematically:

[cpp]#include <numeric>  // std::inner_product

using namespace std;

// Assumes column-major storage: m[j][i] holds row i, column j.
void mm44(float **a, float **b, float **c){
    alignas(16) float at[4][4], bl[4][4];
    for(int j=0; j<4; ++j)
        for(int i=0; i<4; ++i){
            at[j][i] = a[i][j]; // transpose 1st operand
            bl[j][i] = b[j][i]; // local aligned non-aliased copy
        }
    for (int j = 0; j < 4; ++j) // return matrix multiply 4x4
        for (int i = 0; i < 4; ++i)
            c[j][i] = inner_product(at[i], at[i] + 4, bl[j], 0.f);
}[/cpp]
Unfortunately, the vector code generated by ICL is dead code here: its vectorized loop needs a minimum trip count of 8, and these loops run only 4 iterations, so it never executes.
jeff_keasler
Beginner
655 Views


There's some code at the bottom of this pdf for general matrix-matrix multiplication via inline intrinsics:

http://people.redhat.com/drepper/cpumemory.pdf

Great paper.
Smart_Lubobya
Beginner
655 Views

Can someone advise on the best compiler I can buy for C++ and SSE use?

Send quotes to my email: cslubobya@yahoo.com
