Sparse blas Matrix-vector vs simple implementation in Intel® oneAPI Math Kernel Library

tae — Mon, 27 Aug 2007 19:10:10 GMT

Hello,
I'm testing the performance of MKL 9.1 on an AMD Athlon 2200 processor.
I'm comparing implementations of sparse matrix-vector multiplication.
When I compare

mkl_dcsrgemv

against
my simple function shown below

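...
// A has n rows and is stored in one-based CSR format (a, ia, ja);
// the loop computes x = A*v.
for (int i = 0; i < n; i++)
{
    x[i] = 0;

    // nonzeros of row i occupy one-based positions ia[i] .. ia[i+1]-1
    for (int j = ia[i]; j < ia[i+1]; j++)
    {
        int col = ja[j-1] - 1;   // convert one-based column index to zero-based
        x[i] += v[col] * a[j-1];
    }
}
...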

I get no significant speedup from mkl_dcsrgemv over my code compiled with gcc's -O2 optimization flag: only about 2-3%.

Is this normal behavior, or should I expect a much larger speedup from the Sparse BLAS routines?

Thanks

Re: Sparse blas Matrix-vector vs simple implementation

Sergey_K_Intel1 — Tue, 18 Sep 2007 09:33:04 GMT

Hello

The performance of the routine you mentioned depends on the structure of the input sparse matrix, because the distribution of the nonzero elements determines the memory access pattern (in particular, which elements of the input vector are read, and in what order). So performance depends strongly on the input sparse matrix as well as on its dimensions.

The numbers you report are probably normal, but I would need to look at the input data to be sure.

By the way, the routine is parallelized with OpenMP. Have you tested it in parallel mode by setting the OMP_NUM_THREADS environment variable?

All the best

Sergey