I am working with the mkl_sparse_s_mm routine to perform:
C = A * B
where A is a sparse matrix and B and C are dense matrices. I would like some details about the algorithm for sparse-dense matrix multiplication implemented by mkl_sparse_s_mm, i.e., whether it uses cache-aware strategies, specific micro-kernel implementations to fully leverage the CPU registers, etc.
Just to provide an example, high-performance dense matrix multiplication (cblas_gemm) is usually implemented with a blocked, packing-based algorithm.
As Gennady said, this information is not public. The general answer is yes: we apply different (including hardware-aware) optimizations at different levels (multithreading, vectorization, and assembly) in order to achieve better performance, as we do throughout MKL.
You can find published articles on the different algorithms that can be used for sparse-dense matrix multiplication.
The sparsity of one of the inputs means that some optimization techniques used for dense gemm make much less sense, or none at all, while some additional ideas become applicable. It also helps to consider whether the functionality is compute-bound or memory-bound.