Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Parallel algorithm used by mkl_?csrmultcsr

kadir
Beginner

I am wondering which algorithm and parallelization strategy are used in mkl_?csrmultcsr. Does this function scale well on the Intel Xeon Phi architecture?

3 Replies
VipinKumar_E_Intel

mkl_?csrmultcsr uses a simple parallelization strategy: row-wise over the first matrix.

Scaling depends heavily on the matrix structures and sizes; normally, scalability is bound by memory-subsystem bandwidth.

We would expect better performance on Xeon Phi than on Xeon, but scalability is normally quite modest because memory bandwidth is exhausted fairly quickly.
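To make the row-wise strategy concrete, here is a minimal sketch (not MKL's actual implementation; the function name and 0-based indexing are my own choices, whereas the real routine uses 1-based CSR arrays). Each output row of C = A*B depends only on one row of A, so the rows are independent tasks that a runtime can hand out to threads:

```python
# Hypothetical sketch of row-wise CSR x CSR multiplication (Gustavson-style).
# Each iteration of the outer loop is independent, which is what makes
# row-wise parallelization over the first matrix possible.
def csr_multiply_rowwise(m, ia, ja, a, ib, jb, b):
    """Multiply two CSR matrices with 0-based indexing; A has m rows.

    (ia, ja, a) describe A; (ib, jb, b) describe B.
    Returns the CSR arrays (ic, jc, c) of C = A * B.
    """
    ic, jc, c = [0], [], []
    for i in range(m):                         # independent per-row task
        acc = {}                               # sparse accumulator for row i of C
        for p in range(ia[i], ia[i + 1]):      # nonzeros of row i of A
            k, av = ja[p], a[p]
            for q in range(ib[k], ib[k + 1]):  # scale and add row k of B
                j = jb[q]
                acc[j] = acc.get(j, 0.0) + av * b[q]
        for j in sorted(acc):                  # emit row i in column order
            jc.append(j)
            c.append(acc[j])
        ic.append(len(jc))
    return ic, jc, c
```

Because the result row lengths are not known in advance, a parallel implementation typically needs a symbolic pass (or per-thread buffers) before filling the output arrays; the sketch above sidesteps that by using Python lists.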

--Vipin

kadir
Beginner

I suppose that a chunk of rows of the first matrix is assigned to each thread. Is it possible to set the chunk size and OpenMP's scheduling policy? Are there any other user-supplied parameters that could reduce the run time of mkl_?csrmultcsr on MIC (other than OMP_NUM_THREADS and KMP_AFFINITY)?

VipinKumar_E_Intel

mkl_?csrmultcsr uses the following parallelization strategy for non-transposed multiplication of two CSR matrices: the first matrix is divided into chunks with a roughly equal number of rows, and each chunk is assigned to a thread. Since the number of chunks equals the number of threads, the chunk size cannot be set by the user outside MKL.
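The static partitioning described above can be illustrated with a small sketch (purely hypothetical code, not MKL source; the function name is mine). With nthreads chunks of nearly equal row counts, the chunk size follows directly from the matrix size and thread count, which is why there is no separate user knob for it:

```python
# Hypothetical illustration: split nrows into nthreads contiguous chunks
# of nearly equal size, one chunk per thread. When nthreads does not
# divide nrows evenly, the first few chunks get one extra row.
def row_chunks(nrows, nthreads):
    base, extra = divmod(nrows, nthreads)
    bounds, start = [], 0
    for t in range(nthreads):
        size = base + (1 if t < extra else 0)
        bounds.append((start, start + size))  # half-open [start, end) row range
        start += size
    return bounds
```

With such a static, one-chunk-per-thread split, the only practical lever left to the user is the thread count itself (e.g. via OMP_NUM_THREADS), not the chunk size or the OpenMP schedule.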

Could you please provide us with your use case for this function: matrix sizes, sparsity pattern, input parameters, etc.?

We can take a look at the test case and then suggest possible steps for further tuning.

--Vipin

 
