Hi,
I ran MKL's dgemm function for matrix-matrix multiplication in Visual Studio 2008 (Release build). I compared its run time with that of the uBLAS function for matrix-matrix multiplication and with that of my own code, which multiplies the matrices element by element.
The C++ code is below:
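#include <ctime>
#include <vector>
#include <boost/numeric/ublas/matrix.hpp>
#include <mkl.h>

using std::vector;

int main()
{
    // 50 by 50 case from the timings below (NCol1 must equal NRow2).
    int NRow1 = 50, NCol1 = 50, NRow2 = 50, NCol2 = 50;
    // NIter: hypothetical name; the iteration bound was lost from the original post.
    int NIter = 100;

    // Template arguments reconstructed from how each object is used below.
    vector<vector<double> > M1(NRow1, vector<double>(NCol1));   // manual version
    vector<vector<double> > M2(NRow2, vector<double>(NCol2));
    vector<vector<double> > M3(NRow1, vector<double>(NCol2));
    vector<double> G1(NRow1 * NCol1);                           // MKL version (column-major)
    vector<double> G2(NRow2 * NCol2);
    vector<double> G3(NRow1 * NCol2);
    boost::numeric::ublas::matrix<double> U1(NRow1, NCol1), U2(NRow2, NCol2), U3(NRow1, NCol2);

    int i, j, k, m;
    double tp;
    clock_t t1Start, t1End, t2Start, t2End, t3Start, t3End, t4Start, t4End;
    double tick_per_sec(CLOCKS_PER_SEC);
    double t1, Dt1(0.0), t2, Dt2(0.0), t3, Dt3(0.0), t4, Dt4(0.0), totCGIter(0.0);

    for (i = 0; i < NRow1; ++i)
        for (j = 0; j < NCol1; ++j)
        {
            M1[i][j] = 1;
            G1[i + j*NRow1] = 1;   // BLAS expects column-major storage: (i, j) -> i + j*lda
            U1(i, j) = 1;
        }
    for (i = 0; i < NRow2; ++i)
        for (j = 0; j < NCol2; ++j)
        {
            M2[i][j] = 2;
            G2[i + j*NRow2] = 2;   // column-major, lda = NRow2
            U2(i, j) = 2;
        }

    char transa1 = 'N';
    char transa2 = 'N';
    char transb1 = 'N';
    char transb2 = 'N';
    double alpha = 1.0;
    double beta = 0.0;

    // MKL: G3 = alpha*G1*G2 + beta*G3, repeated NIter times
    t1Start = clock();
    for (i = 1; i <= NIter; ++i)
        dgemm(&transa1, &transb1, &NRow1, &NCol2, &NCol1, &alpha, &G1[0], &NRow1, &G2[0], &NCol1, &beta, &G3[0], &NRow1);
    t1End = clock();
    t1 = (t1End - t1Start) / tick_per_sec;

    // Manual element-by-element product, repeated NIter times
    t2Start = clock();
    for (m = 1; m <= NIter; ++m)
        for (i = 0; i < NRow1; ++i)
            for (j = 0; j < NCol2; ++j)
            {
                tp = 0.0;
                for (k = 0; k < NCol1; ++k)
                    tp += M1[i][k] * M2[k][j];
                M3[i][j] = tp;
            }
    t2End = clock();
    t2 = (t2End - t2Start) / tick_per_sec;

    // uBLAS product, repeated NIter times
    t3Start = clock();
    for (m = 1; m <= NIter; ++m)
        U3 = prod(U1, U2);
    t3End = clock();
    t3 = (t3End - t3Start) / tick_per_sec;

    return 0;
}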
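BLAS dgemm assumes column-major storage: with leading dimension ld, element (i, j) of a matrix sits at index i + j*ld, which is why the call above passes NRow1, NCol1 and NRow1 as lda, ldb and ldc. To check MKL's output against the other two methods, a plain reference routine with the same argument conventions can help. A minimal sketch for the 'N','N' case only; ref_dgemm_nn is a hypothetical name, not an MKL function, and it takes values rather than the pointers dgemm takes:

```cpp
#include <cassert>

// Hypothetical reference routine (NOT part of MKL): computes
// C = alpha*A*B + beta*C for the no-transpose case ('N','N'),
// using the column-major layout and leading-dimension rules of BLAS dgemm.
//   A is m x k with leading dimension lda (>= m)
//   B is k x n with leading dimension ldb (>= k)
//   C is m x n with leading dimension ldc (>= m)
// Element (i, j) of a column-major matrix X is stored at X[i + j*ldX].
void ref_dgemm_nn(int m, int n, int k, double alpha,
                  const double* a, int lda,
                  const double* b, int ldb,
                  double beta, double* c, int ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += a[i + p * lda] * b[p + j * ldb];
            // Like BLAS, do not read C when beta is zero (C may be uninitialized).
            double& cij = c[i + j * ldc];
            cij = (beta == 0.0) ? alpha * sum : alpha * sum + beta * cij;
        }
    }
}
```

With the post's variables it would be called as ref_dgemm_nn(NRow1, NCol2, NCol1, alpha, &G1[0], NRow1, &G2[0], NCol1, beta, &G3[0], NRow1), mirroring the dgemm arguments.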
The table below shows the time per multiplication, in seconds, for 50 by 50 matrices. Each value is the total time divided by the number of iterations:
Iterations   100       200       400       600       800       1000      1200      1400
MKL          0.01766   0.000     0.01957   0.0021    0.033     0.02964   0.0188    0.01557
Manual       0.00219   0.00257   0.00215   0.0021    0.00205   0.002     0.0021    0.0021
uBLAS        0.00031   0.00031   0.000275  0.000287  0.000274  0.00028   0.00026   0.000268
For uBLAS and the manual loop, the time is roughly stable. As I expected, uBLAS is about 10 times faster than the manual loop; MKL, however, is much slower.
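The MKL row varies widely (from 0.000 to 0.033 s per multiplication), while the other two rows are stable. One common precaution when timing a library routine is to make one untimed warm-up call first, so that one-off costs such as loading and initializing the library are not averaged into the result, and to time all three methods with the same helper. A minimal sketch, written in C++98 so it also builds with Visual Studio 2008; seconds_per_call is a hypothetical name:

```cpp
#include <cassert>
#include <ctime>

// Hypothetical helper (not from the original post): returns the average time
// per call of f, in seconds. std::clock() reports elapsed wall-clock time with
// the Microsoft C runtime but processor time on POSIX systems.
// One untimed warm-up call absorbs one-off costs (first-touch page faults,
// library initialization) so they are not charged to the timed loop.
template <typename F>
double seconds_per_call(F f, int iterations)
{
    assert(iterations > 0);
    f();                                   // warm-up, not timed
    std::clock_t start = std::clock();
    for (int n = 0; n < iterations; ++n)
        f();
    std::clock_t end = std::clock();
    return static_cast<double>(end - start) / CLOCKS_PER_SEC / iterations;
}
```

Wrapping each of the three multiplications in a small function object and passing it to the same helper keeps the measurement identical across MKL, uBLAS and the manual loop.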
Does anybody have any idea why MKL is slower?
Thank you.
Erasmo.
You don't take any evident precautions to ensure that the compiler treats the redundant repetition loop the same way in each case.
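Each timing loop recomputes the same product from the same inputs. For code the compiler can see (the manual loop, and uBLAS, which is header-only templates), an optimizer may in principle treat the repeated passes as redundant, which it cannot do for an opaque call into a separately compiled library such as MKL, so the comparison is not like for like. One common precaution is to make every repetition differ and to consume its result. A minimal sketch with hypothetical function names, using a flat row-major layout for brevity:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive row-major multiply C = A * B for square n x n matrices.
void naive_multiply(const std::vector<double>& a, const std::vector<double>& b,
                    std::vector<double>& c, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// Hypothetical benchmark driver: repeats the multiply `reps` times, but
//   - changes one input element on each pass, so the passes are not identical, and
//   - folds one output element into a checksum returned to the caller,
// so every pass has a distinct, used result.
double repeated_multiply_checksum(std::vector<double>& a, const std::vector<double>& b,
                                  std::vector<double>& c, std::size_t n, int reps)
{
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    double checksum = 0.0;
    for (int r = 0; r < reps; ++r) {
        a[0] = static_cast<double>(r);   // perturb the input
        naive_multiply(a, b, c, n);
        checksum += c[0];                // consume the output
    }
    return checksum;
}
```

The caller should print or otherwise use the returned checksum; the same pattern (perturb, multiply, accumulate) can wrap the dgemm and prod calls so all three loops do equivalent observable work.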
If you are using MSVC, it seems unlikely that your hand-written, mixed-stride dot products will be optimized.
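In the manual loop, M1[i][k] is read along a row (unit stride) while M2[k][j] steps down a column, moving to a different row vector on every k; that is the mixed stride. A standard remedy is loop interchange to i-k-j order, so the innermost loop reads M2 and writes M3 with unit stride. A minimal sketch using the same vector-of-vectors layout as M1, M2 and M3; multiply_ikj is a hypothetical name, and this is only the simplest fix, not how MKL computes the product:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;  // row-major, like M1/M2/M3

// C = A * B in i-k-j loop order. The innermost loop walks b[k][*] and
// c[i][*] with unit stride instead of stepping down a column of b.
void multiply_ikj(const Matrix& a, const Matrix& b, Matrix& c)
{
    const std::size_t rows = a.size();
    const std::size_t inner = b.size();
    const std::size_t cols = inner ? b[0].size() : 0;
    assert(rows == 0 || a[0].size() == inner);
    c.assign(rows, std::vector<double>(cols, 0.0));   // accumulate into zeros
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t k = 0; k < inner; ++k) {
            const double aik = a[i][k];
            const std::vector<double>& brow = b[k];
            std::vector<double>& crow = c[i];
            for (std::size_t j = 0; j < cols; ++j)
                crow[j] += aik * brow[j];
        }
}
```

Each element of C receives the same products as in the dot-product form, only summed in a different order, so results can differ in the last bits for general data.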
After linking the MKL library statically, I get reasonable results; before, I was linking it dynamically.
In Visual Studio 2008, in Release mode, the MKL function now runs at least 30 times faster than my simple element-by-element matrix multiplication. I tested matrix sizes from 50 by 50 to 1000 by 1000: for 50 by 50 the speedup is 32-fold, and for 1000 by 1000 it is 107-fold.
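To relate these speedups to hardware throughput, a measured time per multiplication can be converted to a rate. An n by n product computed the textbook way needs n^3 multiplications and n^2(n-1) additions, about 2n^3 operations: roughly 2.5 × 10^5 for n = 50 and 2 × 10^9 for n = 1000. A minimal sketch with hypothetical helper names:

```cpp
#include <cassert>

// Floating-point operations in a dense n x n by n x n product computed the
// textbook way: n^3 multiplications plus n^2 * (n - 1) additions (~2 n^3).
double matmul_flops(double n)
{
    return n * n * n + n * n * (n - 1.0);
}

// Achieved rate in GFLOP/s for a measured time per multiplication (seconds).
double gflops(double n, double seconds)
{
    assert(seconds > 0.0);
    return matmul_flops(n) / seconds / 1e9;
}

// Speedup is a ratio of times for the same n, so the GFLOP/s figures of
// the two methods differ by exactly the same factor.
double speedup(double t_baseline, double t_fast)
{
    assert(t_fast > 0.0);
    return t_baseline / t_fast;
}
```

Comparing the MKL rate with the processor's peak floating-point rate shows how much headroom remains, which a bare speedup ratio does not.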
Do you have any comments on the above?
Thank you.
Erasmo.