Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

MATRIX MULTIPLICATION USING SSE

Smart_Lubobya
Beginner
2,486 Views
Can someone show me how to multiply two 4x4 matrices using SSE instructions? The code should be compatible with a C++ compiler.
7 Replies
TimP
Honored Contributor III
http://cache-www.intel.com/cd/00/00/29/37/293749_293749.pdf
includes such a description.
Another way would be to use the SSE transpose code you can find with the search button above; it may then be possible to persuade the compiler to optimize the rest. For row-major C arrays it is the second operand whose columns need to become contiguous:
temp[][] = transpose(b[][])
for (i = 0; i < 4; ++i)
  for (j = 0; j < 4; ++j)
    c[i][j] = inner_product(a[i][...], temp[j][...], 0.f)

possibly by writing out the 4 inner_product instances, each with vectors declared aligned(16).
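
The transpose-then-inner-product idea can be sketched as follows. This is only an illustration under stated assumptions: the function name `mm44_transpose` is hypothetical, each matrix is assumed to be 16 contiguous floats in row-major order, and the transpose uses the `_MM_TRANSPOSE4_PS` macro from `<xmmintrin.h>` rather than the forum code referred to above. With row-major storage the operand transposed here is `b`, so that each of its columns becomes a contiguous run of 4 floats.

```cpp
#include <numeric>      // std::inner_product
#include <xmmintrin.h>  // __m128, _mm_loadu_ps, _mm_store_ps, _MM_TRANSPOSE4_PS

// Hypothetical helper (not from the thread): c = a * b for 4x4 matrices,
// each stored as 16 contiguous floats in row-major order (assumption).
void mm44_transpose(const float *a, const float *b, float *c)
{
    // Load the four rows of b; unaligned loads place no requirement on b.
    __m128 r0 = _mm_loadu_ps(b + 0);
    __m128 r1 = _mm_loadu_ps(b + 4);
    __m128 r2 = _mm_loadu_ps(b + 8);
    __m128 r3 = _mm_loadu_ps(b + 12);

    // In-place 4x4 transpose: r0..r3 now hold the columns of b.
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);

    // Local, 16-byte aligned copy of the transposed operand.
    alignas(16) float bt[16];
    _mm_store_ps(bt + 0,  r0);
    _mm_store_ps(bt + 4,  r1);
    _mm_store_ps(bt + 8,  r2);
    _mm_store_ps(bt + 12, r3);

    // c[i][j] = (row i of a) . (column j of b), both now contiguous.
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            c[4 * i + j] = std::inner_product(a + 4 * i, a + 4 * i + 4,
                                              bt + 4 * j, 0.f);
}
```

Whether the compiler turns the 16 inner products into SSE code depends on the compiler and its options; only the transpose is guaranteed to use SSE here.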
Smart_Lubobya
Beginner
I have read the PDF quoted above but still cannot understand how SSE is used. Could you shed more light, please?
hydroxyprolin
Beginner
SSE instructions can be executed by using SIMD intrinsics or inline assembly.

This application note describes the multiplication of two matrices using Streaming SIMD Extensions:
AP-929 Streaming SIMD Extensions - Matrix Multiplication

In Section 4.3 you can find a ready-to-run example for 4x4 matrix multiplication.
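
To give a flavour of the intrinsics route, here is a minimal sketch of a 4x4 single-precision multiply. It is not the code from AP-929: the function name `mm44_sse` and the flat row-major layout (16 contiguous floats per matrix) are assumptions made for this illustration. It relies on the fact that row i of the product is a combination of the rows of `b`, weighted by the four elements of row i of `a`, so no transpose is needed.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_set1_ps, ...

// Hypothetical helper (name and layout are assumptions, not from AP-929):
// c = a * b for 4x4 row-major matrices stored as 16 contiguous floats.
void mm44_sse(const float *a, const float *b, float *c)
{
    // Load the four rows of b once. Unaligned loads are used so that the
    // caller's arrays need no particular alignment.
    const __m128 b0 = _mm_loadu_ps(b + 0);
    const __m128 b1 = _mm_loadu_ps(b + 4);
    const __m128 b2 = _mm_loadu_ps(b + 8);
    const __m128 b3 = _mm_loadu_ps(b + 12);

    for (int i = 0; i < 4; ++i) {
        // Broadcast each element of row i of a across a register and
        // accumulate the correspondingly scaled rows of b.
        __m128 row = _mm_mul_ps(_mm_set1_ps(a[4 * i + 0]), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * i + 1]), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * i + 2]), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a[4 * i + 3]), b3));
        _mm_storeu_ps(c + 4 * i, row);  // write row i of the result
    }
}
```

A caller would pass three `float[16]` arrays, for example `mm44_sse(a, b, c);`. If the arrays are known to be 16-byte aligned, the aligned variants `_mm_load_ps` and `_mm_store_ps` can be used instead.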
TimP
Honored Contributor III
Quoting - hydroxyprolin

SSE instructions can be executed by using SIMD intrinsics or inline assembly.

Not to mention by writing standard Fortran, or nearly standard C or C++ code, and using one of the usual compilers. Schematically:

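[cpp]#include <numeric>   // std::inner_product

using namespace std;

// c = a * b for 4x4 matrices passed as arrays of row pointers.
// Assumes row-major indexing: a[row][col].
void mm44(float **a, float **b, float **c){
    __declspec(align(16)) float al[4][4], bt[4][4];
    for(int j=0; j<4; ++j)
        for(int i=0; i<4; ++i){
            al[j][i] = a[j][i]; // local aligned non-aliased copy of 1st operand
            bt[j][i] = b[i][j]; // transpose 2nd operand so its columns are contiguous
        }
    for (int j = 0; j < 4; ++j) // matrix multiply 4x4, result returned in c
        for (int i = 0; i < 4; ++i)
            c[j][i] = inner_product(al[j], al[j] + 4, bt[i], 0.f);
}[/cpp]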
Unfortunately, the vector code that ICL generates for this is dead code: it has a minimum trip count of 8, and each inner product here has only 4 elements.
jeff_keasler
Beginner

There's some code at the bottom of this pdf for general matrix-matrix multiplication via inline intrinsics:

http://people.redhat.com/drepper/cpumemory.pdf

Great paper.
Smart_Lubobya
Beginner

Can someone advise on the best compiler I can buy for C++ and SSE use?

Please send quotes to my email: cslubobya@yahoo.com
