- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I am working with the mkl_sparse_s_mm routine to perform:
C = A * B
where A is a sparse matrix and C,B are dense matrices. I would like to have some details about the algorithm for sparse-dense matrix multiplication implemented by mkl_sparse_s_mm, i.e., if it uses cache aware strategies, specific micro-kernel implementations to fully leverage CPU registers etc..
Just to provide an example, high performance dense matrix multiplication (cblas_gemm) is usually implemented following the block pack algorithm.
Thank you
Cosimo Rullli
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
it is the proprietary info, but we will ask sparse blas developers to look at this thread and they probably will share something, but I am not sure.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
As Gennady said, it is a closed information. A general answer is yes, we try to do different (incl. hardware-aware) optimizations on different levels (multi-tthreading, vectorization and assembly) in order to achieve better performance, as we do in general in MKL.
You can find articles about different algorithms which can be used for sparse-dense matrix multiplication.
The sparsity of one of the inputs implies that some optimization techniques used for dense gemm make much less or no sense at all and at the same time, some additional ideas are used. It also helps to think whether the functionality is compute or memory bound.
Best,
Kirill

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page