Hi,
I ran the MKL function dgemm to do matrix-matrix multiplication in Visual Studio 2008, in a Release build. I compared its run time with that of the UBlas matrix-matrix product and with that of code I wrote that multiplies the matrices element by element.
The C++ code is as below:
vector
vector
vector
vector
vector
vector
boost::numeric::ublas::matrix
int i, j, k, m;
double tp;
clock_t t1Start, t1End, t2Start, t2End, t3Start, t3End, t4Start, t4End;
double tick_per_sec(CLOCKS_PER_SEC);
double t1, Dt1(0.0), t2, Dt2(0.0), t3, Dt3(0.0), t4, Dt4(0.0), totCGIter(0.0);
for (i = 0; i < NRow1; ++i)
for (j = 0; j < NCol1; ++j)
{
M1
G1[i*NRow1+j] = 1;
U1(i, j) = 1;
}
for (i = 0; i < NRow2; ++i)
for (j = 0; j < NCol2; ++j)
{
M2
G2[i*NRow2+j] = 2;
U2(i, j) = 2;
}
char transa1 = 'N';
char transa2 = 'N';
char transb1 = 'N';
char transb2 = 'N';
double alpha = 1.0;
double beta = 0.0;
t1Start = clock();
for(i=1; i
dgemm(&transa1, &transb1, &NRow1, &NCol2, &NCol1, &alpha, &G1[0], &NRow1, &G2[0], &NCol1, &beta, &G3[0], &NRow1);
t1End = clock();
t1 = (t1End - t1Start) / tick_per_sec;
t2Start = clock();
for(m=1; m
for(i=0; i
for(j=0; j
{
tp=0.0;
for(k=0; k
tp += M1
M3
}
t2End = clock();
t2 = (t2End - t2Start) / tick_per_sec;
t3Start = clock();
for(m=1; m
U3 = prod(U1, U2);
t3End = clock();
t3 = (t3End - t3Start) / tick_per_sec;
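One thing worth checking in a listing like this is the storage order of the flat arrays. When dgemm is called through the Fortran-style interface with a leading dimension equal to the row count, it reads the arrays as column-major, so element (i, j) sits at `i + j*lda` rather than `i*lda + j`. With square matrices filled with a constant the two layouts give the same result, but for non-square or non-constant data they do not. The following is a minimal sketch of that indexing with a plain reference multiply that can be used to validate a dgemm result; `colMajor` and `referenceGemm` are hypothetical helper names, not MKL routines.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Offset of element (i, j) in a column-major array with leading dimension ld.
inline std::size_t colMajor(int i, int j, int ld) {
    return static_cast<std::size_t>(i) + static_cast<std::size_t>(j) * ld;
}

// Reference C = A * B with all matrices stored column-major.
// A is m x k, B is k x n, C is m x n. Hypothetical helper for checking results.
std::vector<double> referenceGemm(const std::vector<double>& A,
                                  const std::vector<double>& B,
                                  int m, int n, int k) {
    assert(A.size() == static_cast<std::size_t>(m) * k);
    assert(B.size() == static_cast<std::size_t>(k) * n);
    std::vector<double> C(static_cast<std::size_t>(m) * n, 0.0);
    for (int j = 0; j < n; ++j) {
        for (int p = 0; p < k; ++p) {
            const double b = B[colMajor(p, j, k)];
            // The innermost loop walks down a column, so accesses are contiguous.
            for (int i = 0; i < m; ++i) {
                C[colMajor(i, j, m)] += A[colMajor(i, p, m)] * b;
            }
        }
    }
    return C;
}
```

Comparing the output of `referenceGemm` against the array filled by dgemm for a small non-symmetric input is a quick way to confirm that the layout and leading dimensions are what the call assumes.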
The time per run of each function is shown below for a 50 by 50 matrix. It is computed as the total time in seconds divided by the number of iterations:
iter     100      200      400       600       800       1000     1200     1400
MKL      0.01766  0.000    0.01957   0.0021    0.033     0.02964  0.0188   0.01557
Manual   0.00219  0.00257  0.00215   0.0021    0.00205   0.002    0.0021   0.0021
UBlas    0.00031  0.00031  0.000275  0.000287  0.000274  0.00028  0.00026  0.000268
For UBlas and Manual the time is roughly stable. As I expected, UBlas is much faster than Manual, by a factor of about 10. MKL, however, is much slower, and its timings vary widely from one run to the next.
Does anybody have any idea why MKL is slower?
Thank you.
Erasmo.
You don't take any evident precautions to ensure that the compiler treats the redundant looping the same way in each case.
If you are using MSVC, it does appear unlikely that your mixed-stride, written-out dot products will be optimized.
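As an illustration of such precautions, here is a minimal sketch of a timing harness, under the following assumptions: one untimed warm-up call is made first, since the first call into a library may include one-time costs, and a value from the output is read after every repetition, which makes it harder (though not impossible) for an optimizer to discard repeated work. `TimingResult` and `timeKernel` are hypothetical names. The sketch uses `std::chrono` and lambdas from C++11, which Visual Studio 2008 does not provide; with that compiler, `clock()` or `QueryPerformanceCounter` would have to take their place.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Result of a timing run: average time per call and a value derived
// from the output, returned so the computed data is actually used.
struct TimingResult {
    double secondsPerCall;
    double checksum;
};

// Times `kernel(out)` over `repetitions` calls after one untimed warm-up call.
// `Kernel` is any callable taking std::vector<double>& (hypothetical interface).
template <typename Kernel>
TimingResult timeKernel(Kernel kernel, std::vector<double>& out, int repetitions) {
    assert(repetitions > 0);
    assert(!out.empty());

    kernel(out);  // warm-up call, excluded from the measurement

    double checksum = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repetitions; ++r) {
        kernel(out);
        // Read one output element per repetition so the result is consumed.
        checksum += out[static_cast<std::size_t>(r) % out.size()];
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double> elapsed = stop - start;
    return TimingResult{elapsed.count() / repetitions, checksum};
}
```

Running all three variants (dgemm, the hand-written loops, and the uBLAS product) through the same harness keeps the loop structure around each of them identical, so differences in the timings are more likely to reflect the kernels themselves.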
After statically linking the MKL library I get reasonable results. Previously I was linking it dynamically.
In Visual Studio 2008 in Release mode, I now get at least a 30-fold speedup when I compare the MKL function with simple code that multiplies the matrices element by element. I tested matrix sizes from 50 by 50 to 1000 by 1000. For 50 by 50 the speedup is 32-fold, and for 1000 by 1000 it is 107-fold.
Do you have any comments on the above?
Thank you.
Erasmo.
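To put fold-increase figures like these on a common scale across matrix sizes, timings are often converted to a floating-point rate. The sketch below assumes the usual operation count of 2*m*n*k for a dense product of an m by k matrix with a k by n matrix (one multiply and one add per inner step); the function names are hypothetical and no measured values are built in.

```cpp
#include <cassert>

// Floating-point operations in C = A * B for A (m x k) and B (k x n),
// counting one multiply and one add per inner-product step.
double gemmFlops(int m, int n, int k) {
    return 2.0 * m * n * k;
}

// Achieved rate in GFLOP/s for one multiplication taking secondsPerCall.
double gflops(int m, int n, int k, double secondsPerCall) {
    assert(secondsPerCall > 0.0);
    return gemmFlops(m, n, k) / secondsPerCall / 1e9;
}

// Speedup of a candidate over a baseline, both given as seconds per call.
double speedup(double baselineSeconds, double candidateSeconds) {
    assert(candidateSeconds > 0.0);
    return baselineSeconds / candidateSeconds;
}
```

Passing the per-call time of each variant to `gflops` gives a rate that can be compared directly between the 50 by 50 and 1000 by 1000 cases, while `speedup` reproduces the fold-increase figure from two per-call times.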