Solved: Using sparse matrix formats for Sparse Matrix Dense Vector Product

Nikhil_T · ‎01-23-2021

I am trying to store a matrix in sparse format matrix vector product. In the documentation, I can only see sparse CSR format using set_crs_data function and using the gemv to compute the product.

Is there any other format supported by oneMKL to store sparse matrices and to compute the product of sparse matrix and dense vector product?

RahulV_intel · ‎02-02-2021

Hi @Nikhil_T ,

Could you please confirm if your query is resolved?

Thanks,

Rahul

View solution in original post

mecej4 · ‎01-23-2021

When it comes to performing operations on matrices and vectors with different representations, the number of combinations can be large, and making routines to perform such operations as general as possible can make the interfaces complicated and error prone.

The following code will calculate b = M v.

!compute b = M.v, where M is m X n, CSR-represented with ir(1:m+1), jc(1:nnz), val(1:nnz)
!                       v is n X 1,
!                       b is m X 1
do i = 1, m
   s = 0
   do k=ir(i),ir(i+1)-1
      s = s+val(k)*v(jc(k))
   end do
   b(i) = s
end do

Nikhil_T · ‎01-23-2021

@mecej4 I am dealing with a matrix of size one billion by one billion here. This method seems to perfrom poor. I was thinking to use one of the optimized functions of oneMKL.

jimdempseyatthecove · ‎01-24-2021

mecej4's, code would be just as efficient as would be anything contained within a library. It is conceivably no different from:

!compute b = M.v, where M is m X n, CSR-represented with ir(1:m+1), jc(1:nnz), val(1:nnz)
!                       v is n X 1,
!                       b is m X 1
do i = 1, m
   j = ir(i)
   k = ir(i+1)-1
   b(i) = dotproduct(val(j:k)*v(jc(j:k))
end do

At issue with the above code (assuming that is your issue) is the indexing of v(arraySliceOfIndexes).

Depending on the targeted instruction set, this may generate vector gather instructions, and the efficiency/inefficiency of the execution would depend on the vector-wide compactness (or lack there of) the indices. IOW how many cache lines are involved in gathering the 8 or 16 groupings of v(jc(j:k)).

Jim Dempsey

Kirill_V_Intel · ‎01-24-2021

Hello!

With all respect to @mecej4 and @jimdempseyatthecove, I believe they answered a different question than the one contained in the original post. There are many cases when writing an own version of sparse matrix - dense vector product is not the most efficient approach and decreases performance of the overall application significantly.

oneMKL offers several ways to compute sparse matrix - dense vector product using optimized routines:

1) DPC++ interfaces: https://docs.oneapi.com/versions/latest/onemkl/mkl-sparse-gemv.html
Currently, only CSR format is supported.

2) C/Fortran interfaces with Inspector-Executor API (IE Sparse BLAS):
https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-fortran/top/blas-and-sparse-blas-routines/inspector-executor-sparse-blas-routines.html
This API has been developed to enhance the NIST-like Sparse BLAS API (see below) with a higher-level approach.
API for spmv, https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-fortran/top/blas-and-sparse-blas-routines/inspector-executor-sparse-blas-routines/inspector-executor-sparse-blas-execution-routines/mkl-sparse-mv.html supports through an opaque handle the formats: CSR, CSC, COO, BSR.
Plus, additionally, by calling the mkl_sparse_optimize() routine with approriate hints the user can enable several internal formats (which are stored as optimized data inside the matrix handle) which are tuned for performance and work very well for appropriate sparsity patterns.

3) C/Fortran interfaces with NIST-like Sparse BLAS API: https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-fortran/top/blas-and-sparse-blas-routines/sparse-blas-level-2-and-level-3-routines/sparse-blas-level-2-and-level-3-routines-1.html
These are also present in oneMKL as oneMKL has everything what MKL had (plus, additionally, new DPC++ stuff).
As you might see, it is low-level API with different APIs for different formats: CSR, CSC, COO, BSR, DIA.
This API is declared deprecated but is still supported. But I recomment using IE Sparse BLAS API described above.

4) C interface for GraphBLAS-like API (experimental):
https://software.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/graph-routines/graph-functionality/graph-operations/mkl-graph-mxv.html

Provides additional flexibility (if you're familiar with GraphBLAS project) to the performed operation (allows semirings and masks, for example), integer/boolean valus and sparse input/output vectors.
Supports CSR and CSC formats.

I hope this helps.

Best,
Kirill

Nikhil_T · ‎01-25-2021

Thanks for the reply @Kirill_V_Intel. The second solution seems to work for me. However I do have other doubts having read the implementations of the same:

1) Are these libraries accessible in the DPC++?

2) If yes, do these function and other function of oneMKL work in parallel mode? If they can be, what is the procedure for executing these function (including the ones mentioned in the 2 point of your answer) in parallel mode for VS2019?

3) Is there a way to have repeated entries of col and row data but different values (so that the compiler automatically sums up the values that have the same col and row values; much like it is done in scipy). Or does it need to be implemented by the programmer?

4) Parts of my algorithm requires to a have the diagonal entries of CSR matrix as a separate matrix and the strictly upper and strictly lower parts as another separate matrix. Does oneMKL supports any such feature that can split one CSR matrix into diagonal and the rest as two matrices? Or any method that can give one of them as a separate matrix?

Kirill_V_Intel · ‎01-25-2021

@Nikhil_T,

1) Yes. oneMKL is in this sense an extension of MKL, with an addition of the new DPC++ interfaces (which can work on GPU) for selected functionality. So you can use any of the functionality which was available with C/Fortran interfaces of MKL. Of course, MKL only supported CPUs so if you want to execute your code on GPU using C/Fortran interfaces, you need to use OpenMP offload.

2) If you want to use the C/Fortran interfaces ...
a) ... on CPU, basically, you will just call a C function from MKL in your DPC++ code. If you link your application with a threaded version of MKL libraries, parallelization will happen inside MKL and you don't need to do anything specific. That's the way MKL works.

b) ... on GPU, you will need to use special OpenMP offload #pragmas.

3) Such duplicates accumulation will not be done by compiler, it is something to be done in runtime. It is possible but currently is not supported in sparse functionality of MKL/oneMKL.

4) Currently, MKL cannot do any sparse operation on such matrix decomposition via a single call. Neither it can create such a splitting. It is a bit too custom an operation and also difficult to optimize in a general setting.

I hope other experts here can add more to my answer and provide more links to various doc pages and KB articles about how to properly use oneMKL functionality.

Best,
Kirill

Nikhil_T · ‎01-26-2021

Thanks for reply. for the point no. 2, how do I link my application with the threaded version of mkl library. Also to mention, I am a complete beginner in this field and dont have much experience with CLI interface based compilation. Could you please tell how to do this considering that I am a starter in this domain?

Thank you very much!

Nikhil_T · ‎01-26-2021

@Kirill_V_Intel I have another doubt as well. The mkl_sparse_spmm accepts the handles for the matrices. Does it accepts the handle that has been set by init_matrix_handle of Sparse BLAS routine or it only accepts the handles set up by sparse_matrix_t??

Kirill_V_Intel · ‎01-28-2021

Hi again,

1) About the handles, the answer is no. Better think of C/Fortran interfaces and DPC++ interfaces as separate sets of API. You cannot currently pass a matrix handle from IE SpBLAS into DPC++ routines like the one created by init_matrix_handle.

The only way you can exchange information between handles created by different APIs is through explicitly exporting data (e.g., raw CSR pointers) from one of them and putting the data into the (newly created) handle of the other type.

2) As for linking MKL: I think you've opened another question [I'd recommend to do that anyway], so folks there will help you with that.

Best,
Kirill

RahulV_intel · ‎02-02-2021

Hi @Nikhil_T ,

Could you please confirm if your query is resolved?

Thanks,

Rahul

RahulV_intel · ‎02-02-2021

Hi,

Thanks for the confirmation. Intel will no longer monitor this thread. Further discussions on this thread will be considered community only.

Regards,

Rahul