Intel® oneAPI Math Kernel Library

Sparse blas Matrix-vector vs simple implementation

tae
Beginner
Hello,
I'm testing the performance of MKL 9.1 on an AMD Athlon 2200 processor.
I'm comparing sparse matrix-vector multiplication.
When I compare

mkl_dcsrgemv

against
my simple function shown below

...
for (int i = 0; i < n; i++)
{
    double sum = 0.0;

    /* row i's nonzeros occupy positions ia[i]..ia[i+1]-1 (one-based CSR) */
    for (int j = ia[i]; j < ia[i+1]; j++)
    {
        int col = ja[j-1] - 1;    /* convert one-based column index to zero-based */
        sum += v[col] * a[j-1];
    }
    x[i] = sum;
}
...
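For reference, the MKL call I'm timing against this loop is roughly the following (a minimal sketch, not my exact code; it assumes the same one-based CSR arrays a, ia, ja, the input vector v, the output vector x, and the row count n):

#include "mkl.h"

/* Sketch: x := A * v for a one-based CSR matrix, matching the loop above. */
void csr_matvec_mkl(int n, double *a, int *ia, int *ja, double *v, double *x)
{
    char transa = 'n';                          /* 'n' = no transpose */
    mkl_dcsrgemv(&transa, &n, a, ia, ja, v, x);
}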

I'm not getting any significant speedup once my code is compiled with the optimization flag in gcc (-O2); mkl_dcsrgemv is only about 2-3% faster.

Is this normal behavior, or should I expect a lot more speedup from the Sparse BLAS routines?

Thanks


Sergey_K_Intel1
Employee

Hello

The performance of the routine you mention depends on the structure of the input sparse matrix, since the distribution of the nonzero elements determines the memory access pattern. So the performance depends greatly on the input sparse matrix as well as on its dimension.

The numbers you report are probably normal; I would need to look at the input data to say more.

By the way, the routine is parallelized with OpenMP. Have you tested it in parallel mode by setting the OMP_NUM_THREADS environment variable?
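For example (a minimal sketch, assuming your compiler's OpenMP runtime; calling omp_set_num_threads should have the same effect as exporting OMP_NUM_THREADS before running):

#include <omp.h>
#include "mkl.h"

int main(void)
{
    /* Request 2 OpenMP threads; MKL's threaded routines should pick this
       up. Equivalent to setting OMP_NUM_THREADS=2 in the environment.   */
    omp_set_num_threads(2);

    /* ... build a, ia, ja, v, x and call mkl_dcsrgemv as above ... */
    return 0;
}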

All the best

Sergey
