Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Parallel algorithm used by mkl_?csrmultcsr

kadir
Beginner
211 Views

I am wondering which algorithm and parallelization strategy is utilized in mkl_?csrmultcsr? Does this function scale well on Intel Xeon Phi architecture?

0 Kudos
3 Replies
VipinKumar_E_Intel
211 Views

 

Csrmultcsr utilizes simple parallelization strategy (row-wise for the first matrix).

Scaling is heavily depend on the matrix structures and sizes, and normally, the scalability is bound by the memory subsystem bandwidth.

We would expect to see better performance on Xeon Phi vs. Xeon, but scalability would normally be quite modest as memory bandwidth is exhausted pretty quickly.

--Vipin

 

 

0 Kudos
kadir
Beginner
211 Views

I suppose that a chunk of rows of the first matrix is assigned to a thread. Is it possible to set chunk size and OpenMP's scheduling policy?  Are there any other user-supplied parameters to reduce run time of mkl_?csrmultcsr on MIC? (Parameters other than OMP_NUM_THREADS and KMP_AFFINITY)

0 Kudos
VipinKumar_E_Intel
211 Views

 

mkl_?csrmultcsr has the following parallelization strategy for non-transposed multiplication of two CSR matrices:first matrix is divided on chunks with more or less equal number of rows, and every chunk is assigned to a thread. Since the number of chunks is equal to number of threads, the chunk size can’t be set by the user outside MKL.

Could you please provide us with the use case of this function? Matrix sizes, sparsity pattern, input parameters, etc?

We can take a look at the testcase and then suggest possible steps in further tuning.

--Vipin

 

0 Kudos
Reply