mkl_dcsrgemv doesn't yield any significant speedup (Intel® oneAPI Math Kernel Library forum)

agnonchik — Thu, 30 Jul 2009 13:46:33 GMT

Hi,

Is it expected that mkl_dcsrgemv() doesn't outperform a hand-coded routine compiled with VS8 (Visual Studio 2005)?
On my matrix, MKL takes 45.011 ms, while the hand-coded routine takes 46.04 ms.
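For context, a hand-coded CSR matrix-vector product of the kind being compared is usually just two nested loops. The following is a minimal sketch, not the poster's actual code. The name `csr_spmv`, zero-based indexing and 32-bit `int` indices are assumptions (as far as I know, `mkl_dcsrgemv` itself expects one-based `ia`/`ja` arrays).

```c
/* Hypothetical hand-coded CSR sparse matrix-vector product, y = A*x.
 * A is n x n in zero-based CSR form:
 *   nonzeros of row i are at positions row_ptr[i] .. row_ptr[i+1]-1,
 *   col_idx[k] is the column of the k-th nonzero, val[k] its value. */
void csr_spmv(int n, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        /* Each nonzero costs one multiply and one add (2 flops),
         * but requires loading val[k], col_idx[k] and x[col_idx[k]]. */
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
```

For example, the 3x3 matrix with rows (1 0 2), (0 3 0), (4 0 5) is stored as row_ptr = {0, 2, 3, 5}, col_idx = {0, 2, 1, 0, 2}, val = {1, 2, 3, 4, 5}; with x = (1, 1, 2) the routine produces y = (5, 3, 14).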

Another observation is that two threads are only 10% faster than a single thread.

The matrix size is 2Mx2M, nnz=15M.
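A back-of-envelope traffic estimate from these figures helps explain both observations. The sketch below assumes 8-byte values, 4-byte column indices and row pointers, and that `x` and `y` each cross the memory bus only once. Real traffic is higher (scattered `x` accesses, write-allocate on `y`), so the result is a lower bound. The function names are hypothetical.

```c
/* Lower bound on bytes moved by one CSR SpMV y = A*x, assuming 8-byte
 * values, 4-byte indices, and x read / y written exactly once. */
double spmv_bytes(double n, double nnz)
{
    double values  = nnz * 8.0;        /* val[]             */
    double indices = nnz * 4.0;        /* col_idx[]         */
    double rowptr  = (n + 1.0) * 4.0;  /* row_ptr[]         */
    double vectors = n * 8.0 * 2.0;    /* read x, write y   */
    return values + indices + rowptr + vectors;
}

/* Arithmetic intensity in flops per byte: 2 flops per nonzero. */
double spmv_intensity(double n, double nnz)
{
    return 2.0 * nnz / spmv_bytes(n, nnz);
}

/* Effective bandwidth (GB/s) implied by a measured run time. */
double effective_gbps(double n, double nnz, double seconds)
{
    return spmv_bytes(n, nnz) / seconds / 1e9;
}
```

For n = 2M and nnz = 15M this gives about 220 MB per product and roughly 0.14 flop/byte. At the reported 45.011 ms that is at least about 4.9 GB/s of sustained memory traffic, but only about 0.67 GFLOP/s. If that is already close to what the machine's memory system can sustain, a second thread has little spare bandwidth to use, which would be consistent with the 10% gain.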

Thanks,
Agnonchik.

Re: mkl_dcsrgemv doesn't yield any significant speedup

Sergey_K_Intel1 — Fri, 31 Jul 2009 05:00:06 GMT

Hi,

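The routine you mentioned is a Level 2 routine. The performance behaviour you describe is typical of dense Level 2 BLAS and Level 2 Sparse BLAS routines when the data size exceeds the cache size. For large data sizes, the performance of dense Level 2 BLAS, as well as of Sparse BLAS Level 2, depends mainly on memory bandwidth, because these routines perform only a small number of arithmetic operations per byte of data loaded (a CSR matrix-vector product does one multiply and one add per nonzero, yet must load that nonzero's value and column index from memory). Once the memory bus is saturated, additional threads cannot help much. Restructuring the algorithm to use Level 3 operations (for example, multiplying the matrix by a block of several vectors at once) can improve performance, because Level 3 routines can reuse data held in cache.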
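To illustrate why the Level 3 suggestion helps, here is a minimal sketch of a sparse-matrix-times-dense-block kernel. The name `csr_spmm`, zero-based indexing and row-major storage of the dense blocks are assumptions; MKL's own Sparse BLAS Level 3 routine for CSR is, if I recall correctly, `mkl_dcsrmm`. Each nonzero is loaded once and reused for all `k` columns, so the roughly 12 bytes per nonzero are amortized over 2k flops instead of 2. This only pays off if the application really has several vectors to multiply by the same matrix, such as several right-hand sides.

```c
#include <stddef.h>

/* Hypothetical Level 3 style kernel: Y = A * X, where A is m x n in
 * zero-based CSR form and X (n x k), Y (m x k) are dense, row-major. */
void csr_spmm(int m, int k, const int *row_ptr, const int *col_idx,
              const double *val, const double *X, double *Y)
{
    for (int i = 0; i < m; ++i) {
        double *yi = Y + (size_t)i * k;      /* row i of Y */
        for (int j = 0; j < k; ++j)
            yi[j] = 0.0;
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            const double a = val[p];         /* loaded once...     */
            const double *xr = X + (size_t)col_idx[p] * k;
            for (int j = 0; j < k; ++j)      /* ...used k times    */
                yi[j] += a * xr[j];
        }
    }
}
```

With the 3x3 matrix rows (1 0 2), (0 3 0), (4 0 5) and X = [[1, 0], [1, 1], [2, 3]], the kernel yields Y = [[5, 6], [3, 3], [14, 15]].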

Unlike dense BLAS, the performance of Sparse BLAS also depends on the matrix structure, i.e. the sparsity pattern.
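One rough way to see the effect of structure is to count how many distinct cache lines of `x` each row touches. The sketch below is a hypothetical metric, not an MKL feature. It assumes 64-byte cache lines (8 doubles) and column indices sorted within each row, and it ignores reuse across rows. A total close to nnz means almost every `x` access lands on a new line, while a total close to nnz/8 means the columns come in contiguous runs.

```c
/* Hypothetical locality metric: distinct cache lines of x touched,
 * summed over rows. Assumes 64-byte lines (8 doubles) and sorted
 * column indices within each row; reuse across rows is not counted. */
long x_lines_touched(int n, const int *row_ptr, const int *col_idx)
{
    const int doubles_per_line = 8;
    long lines = 0;
    for (int i = 0; i < n; ++i) {
        int last_line = -1;
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            int line = col_idx[p] / doubles_per_line;
            if (line != last_line) {  /* new line; relies on sorted col_idx */
                ++lines;
                last_line = line;
            }
        }
    }
    return lines;
}
```

For a row with columns 0 to 7 it counts one line; for a row with columns 0, 8 and 16 it counts three, even though the second row has fewer nonzeros.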

All the best
Sergey

Re: mkl_dcsrgemv doesn't yield any significant speedup

agnonchik — Fri, 31 Jul 2009 09:05:14 GMT

Thanks for your explanation!
Agnonchik.