Beginner
655 Views

## MATRIX MULTIPLICATION USING SSE

Can someone show me how to multiply two 4x4 matrices using SSE instructions? The code should be compatible with a C++ compiler.
7 Replies
Black Belt
655 Views
Quoting - Smart Lubobya
Can someone show me how to multiply two 4x4 matrices using SSE instructions? The code should be compatible with a C++ compiler.
http://cache-www.intel.com/cd/00/00/29/37/293749_293749.pdf
includes such a description.
Another way would be to use the SSE transpose code you can find by the search button above, and then it may be possible to persuade the compiler to optimize the rest:
temp[][] = transpose(a[][])
for()
for()
c = inner_product(temp[...],b[...],0.f)

possibly by writing out the 4 inner_product instances each with declared aligned(16) vectors.
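
A minimal sketch of that transpose-then-inner-product idea, under a few assumptions of mine: the matrices are row-major `float[4][4]` arrays, so it is the second operand whose columns are strided and the sketch transposes `b` rather than `a`; the function name `mul4x4_transposed` is made up for illustration; and `alignas` needs a C++11 compiler. The transpose uses the `_MM_TRANSPOSE4_PS` macro from `xmmintrin.h`, and the rest is left to `std::inner_product` for the compiler to optimize.

```cpp
#include <numeric>      // std::inner_product
#include <xmmintrin.h>  // __m128, _mm_loadu_ps, _mm_store_ps, _MM_TRANSPOSE4_PS

// Hypothetical helper: c = a * b for row-major 4x4 float matrices.
void mul4x4_transposed(const float a[4][4], const float b[4][4], float c[4][4])
{
    // Local, 16-byte aligned copy of b transposed, so each column of b
    // becomes a contiguous row of bt.
    alignas(16) float bt[4][4];

    // Unaligned loads: the caller's b is not assumed to be aligned.
    __m128 r0 = _mm_loadu_ps(b[0]);
    __m128 r1 = _mm_loadu_ps(b[1]);
    __m128 r2 = _mm_loadu_ps(b[2]);
    __m128 r3 = _mm_loadu_ps(b[3]);

    // In-place 4x4 transpose of the four registers.
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);

    // Aligned stores are safe here because bt is declared alignas(16).
    _mm_store_ps(bt[0], r0);
    _mm_store_ps(bt[1], r1);
    _mm_store_ps(bt[2], r2);
    _mm_store_ps(bt[3], r3);

    // c[i][j] = (row i of a) . (column j of b) = (row i of a) . (row j of bt)
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            c[i][j] = std::inner_product(a[i], a[i] + 4, bt[j], 0.f);
}
```

Whether the compiler actually turns the inner products into SSE code depends on the compiler and its options, so it is worth checking the generated assembly.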
Beginner
655 Views
Quoting - Smart Lubobya
Can someone show me how to multiply two 4x4 matrices using SSE instructions? The code should be compatible with a C++ compiler.

I have read the PDF quoted above but still cannot understand how the SSE instructions are used. More explanation, please.
Beginner
655 Views
Quoting - Smart Lubobya

I have read the PDF quoted above but still cannot understand how the SSE instructions are used. More explanation, please.

SSE instructions can be executed by using SIMD intrinsics or inline assembly.

This application note describes the multiplication of two matrices using Streaming SIMD Extensions:
AP-929 Streaming SIMD Extensions - Matrix Multiplication

In Section 4.3 you can find a ready-to-run example for 4x4 matrix multiplication.
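
To give a feel for the intrinsics route, here is a minimal sketch of a 4x4 multiply. It is not the code from the application note; it is one common formulation, assuming row-major `float[4][4]` arrays, and the function name `mul4x4_sse` is made up for illustration. Each row of the result is built as a sum of the four rows of `b`, each scaled by one element of the matching row of `a`, so no transpose is needed.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_set1_ps, ...

// Hypothetical helper: c = a * b for row-major 4x4 float matrices.
// Row i of c = a[i][0]*b.row0 + a[i][1]*b.row1 + a[i][2]*b.row2 + a[i][3]*b.row3
void mul4x4_sse(const float a[4][4], const float b[4][4], float c[4][4])
{
    // Load the four rows of b once. Unaligned loads are used so the
    // caller's arrays do not have to be 16-byte aligned.
    const __m128 b0 = _mm_loadu_ps(b[0]);
    const __m128 b1 = _mm_loadu_ps(b[1]);
    const __m128 b2 = _mm_loadu_ps(b[2]);
    const __m128 b3 = _mm_loadu_ps(b[3]);

    for (int i = 0; i < 4; ++i) {
        // _mm_set1_ps broadcasts one scalar of a into all four lanes.
        __m128 row = _mm_mul_ps(_mm_set1_ps(a[i][0]), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[i][1]), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[i][2]), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[i][3]), b3));

        // Store the finished row of c (four floats at once).
        _mm_storeu_ps(c[i], row);
    }
}
```

If the arrays are known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` can replace the unaligned variants.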
Black Belt
655 Views
Quoting - hydroxyprolin

SSE instructions can be executed by using SIMD intrinsics or inline assembly.

Not to mention by writing standard Fortran, or nearly standard C or C++ code, and using one of the usual compilers. Schematically:

```cpp
#include <numeric>  // std::inner_product
using namespace std;

// Indices assume a[j][i] is column j, row i (column-major, as in Fortran).
void mm44(float **a, float **b, float **c) {
    __declspec(align(16)) float at[4][4], bl[4][4];
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i) {
            at[j][i] = a[i][j];  // transpose 1st operand
            bl[j][i] = b[j][i];  // local aligned non aliased
        }
    for (int j = 0; j < 4; ++j)  // return matrix multiply 4x4
        for (int i = 0; i < 4; ++i)
            c[j][i] = inner_product(&at[i][0], &at[i][4], &bl[j][0], 0.f);
}
```
Unfortunately, the vector code that ICL generates for this is dead: the vectorized loop requires a minimum trip count of 8, and these inner products run over only 4 elements.
Beginner
655 Views

There's some code at the bottom of this pdf for general matrix-matrix multiplication via inline intrinsics:

http://people.redhat.com/drepper/cpumemory.pdf

Great paper.
Beginner
655 Views

Can someone advise on the best compiler I can buy for C++ and SSE use?

Send quotes to my email: cslubobya@yahoo.com