Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

mkl_dcsrgemv doesn't yield any significant speedup

agnonchik
Beginner
Hi,

Is it expected that mkl_dcsrgemv() doesn't outperform a hand-coded routine compiled with VS8?
On my matrix, MKL takes 45.011 ms, while the hand-coded routine takes 46.04 ms.

Another observation is that two threads are only about 10% faster than a single thread.

The matrix size is 2Mx2M, nnz=15M.

Thanks,
Agnonchik.
Sergey_K_Intel1
Employee
Quoting - agnonchik

Hi,

The routine you mentioned is a Level 2 routine. The performance behaviour you describe is typical for dense Level 2 BLAS and Sparse BLAS Level 2 routines when the size of the data exceeds the size of the cache. For large data sizes, the performance of dense Level 2 BLAS, as well as Sparse BLAS Level 2, depends mainly on memory bandwidth, because these routines perform only a small number of arithmetic operations per memory access. Restructuring your algorithm to use Level 3 routines can therefore improve performance, since Level 3 routines can reuse data in cache.

Unlike dense BLAS, performance of Sparse BLAS depends on matrix structure.

All the best
Sergey
agnonchik
Beginner
Thanks for your explanation!
Agnonchik.