Intel® oneAPI Math Kernel Library

mkl_sparse_s_mm slower for BSR format than for CSR

bozavlado
Beginner

I am testing sparse matrix multiplication (mkl_sparse_s_mm) with the BSR format and found that it is about 3x slower than with the CSR format (e.g., for 256x256 matrices where the sparse matrix has block size 4 and 4096 nonzero entries). I expected the BSR format to be faster than CSR for the same number of nonzero entries.
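
For context, here is a minimal sketch of what a sparse-times-dense product C = A*B does with each format. These are plain reference loops, not MKL's kernels, and they say nothing about how mkl_sparse_s_mm is actually implemented; the `Csr`/`Bsr` structs, zero-based indexing, row-major dense matrices and row-major storage inside each block are all assumptions made for illustration. The BSR loop works on whole `bs`x`bs` blocks (the usual source of any BSR advantage), but it also multiplies every explicit zero stored inside those blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical zero-based CSR matrix (rows x cols).
struct Csr {
    int rows, cols;
    std::vector<int> row_ptr;   // size rows + 1
    std::vector<int> col_idx;   // size nnz
    std::vector<float> val;     // size nnz
};

// Hypothetical zero-based BSR matrix with square bs x bs blocks, each block row-major.
struct Bsr {
    int block_rows, block_cols, bs;
    std::vector<int> row_ptr;   // size block_rows + 1
    std::vector<int> col_idx;   // block-column index of each stored block
    std::vector<float> val;     // bs * bs values per stored block
};

// C (rows x n, row-major) = A * B, where B is cols x n, row-major. C is overwritten.
void csr_mm(const Csr& A, const std::vector<float>& B, int n, std::vector<float>& C) {
    assert(B.size() >= static_cast<std::size_t>(A.cols) * n);
    C.assign(static_cast<std::size_t>(A.rows) * n, 0.0f);
    for (int i = 0; i < A.rows; ++i)
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
            const float a = A.val[k];
            const float* b = &B[static_cast<std::size_t>(A.col_idx[k]) * n];
            float* c = &C[static_cast<std::size_t>(i) * n];
            for (int j = 0; j < n; ++j) c[j] += a * b[j];
        }
}

// Same product for BSR: each stored block updates bs rows of C from bs rows of B.
void bsr_mm(const Bsr& A, const std::vector<float>& B, int n, std::vector<float>& C) {
    const int bs = A.bs;
    assert(B.size() >= static_cast<std::size_t>(A.block_cols) * bs * n);
    C.assign(static_cast<std::size_t>(A.block_rows) * bs * n, 0.0f);
    for (int bi = 0; bi < A.block_rows; ++bi)
        for (int k = A.row_ptr[bi]; k < A.row_ptr[bi + 1]; ++k) {
            const float* blk = &A.val[static_cast<std::size_t>(k) * bs * bs];
            const int bj = A.col_idx[k];
            for (int r = 0; r < bs; ++r) {
                float* c = &C[(static_cast<std::size_t>(bi) * bs + r) * n];
                for (int s = 0; s < bs; ++s) {
                    const float a = blk[r * bs + s];  // explicit zeros are multiplied too
                    const float* b = &B[(static_cast<std::size_t>(bj) * bs + s) * n];
                    for (int j = 0; j < n; ++j) c[j] += a * b[j];
                }
            }
        }
}
```

For example, the 4x4 matrix with rows [1 2 0 0], [3 4 0 0], [0 0 0 5], [0 0 6 0] stored with `bs = 2` needs two blocks, and the second block carries two explicit zeros; both functions produce the same C for it.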

I compile the code with the following command (I tried icpx with the same results):

`g++ -o sparse_bsr_simp sparse_bsr_simp.cpp -O3 -march=native -DMKL_LP64 -m64 -I/opt/intel/oneapi/mkl/2021.1.1//include  -Wl,--start-group /opt/intel/oneapi/mkl/2021.1.1//lib/intel64/libmkl_intel_lp64.a /opt/intel/oneapi/mkl/2021.1.1//lib/intel64/libmkl_sequential.a /opt/intel/oneapi/mkl/2021.1.1//lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl`

I run it with:

`./sparse_bsr_simp 256 256 4096 4`

With the BSR format the benchmark runs in 0.13 s; with the CSR format it runs in 0.044 s. (The format can be switched by uncommenting the corresponding convert function in the attached code.)
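
The attached benchmark code is not reproduced in this thread, so how these timings were taken is not visible here. When comparing two formats it usually helps to exclude a warm-up call and report the median of several repetitions, so that one-time costs such as first-touch page faults or lazy initialization do not dominate a single short measurement. A minimal sketch of such a harness, with `median_seconds` as a hypothetical name and `std::chrono::steady_clock` as the timer:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls fn() once as an untimed warm-up, then times `reps`
// further calls and returns the median wall-clock time in seconds
// (the upper median when reps is even).
template <class F>
double median_seconds(F&& fn, int reps) {
    assert(reps >= 1);
    using Clock = std::chrono::steady_clock;
    fn();  // warm-up: first-touch page faults, cache fill, lazy initialization
    std::vector<double> times;
    times.reserve(static_cast<std::size_t>(reps));
    for (int i = 0; i < reps; ++i) {
        const auto start = Clock::now();
        fn();
        const std::chrono::duration<double> elapsed = Clock::now() - start;
        times.push_back(elapsed.count());
    }
    // A partial sort is enough to place the middle element.
    const auto mid = times.begin() + static_cast<std::ptrdiff_t>(times.size() / 2);
    std::nth_element(times.begin(), mid, times.end());
    return *mid;
}
```

The callable would wrap whichever multiplication is being measured, for example a lambda around the mkl_sparse_s_mm call for one format, so both formats are timed the same way.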

What am I doing wrong?

Gennady_F_Intel
Moderator

What is the CPU type?

bozavlado
Beginner

Sorry, I forgot to include that, and I can no longer add it to the original post:

My CPU is: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz (this has AVX2)

The same thing also happens on: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz (also has AVX2)

And also on Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz (this has AVX-512, but only MKL 2020.1 is installed on that machine).

Are there any public benchmarks or guidelines for BSR matrix multiplication? For example, what block_size and matrix sparsity are needed to get any improvement over CSR?
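
The thread does not answer this directly, but one quantity that bears on it, and is easy to compute, is how much explicit zero fill a CSR-to-BSR conversion introduces for a given block size: BSR stores every touched block_size x block_size block in full, so a pattern whose nonzeros do not cluster into blocks makes BSR store and multiply more values than CSR. The thread does not say how the test matrix's nonzeros are arranged. If the 4096 nonzeros form 256 fully dense 4x4 blocks (4096 / 16 = 256), the ratio below is exactly 1.0; in the worst case, where every nonzero lands in its own 4x4 block, it is 16. A hedged sketch (helper names are hypothetical, zero-based CSR indexing assumed); it measures storage only and does not predict MKL's actual speed.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper: number of bs x bs blocks touched by a zero-based CSR
// sparsity pattern, i.e. how many blocks a CSR-to-BSR conversion would store.
std::size_t bsr_block_count(int rows, const std::vector<int>& row_ptr,
                            const std::vector<int>& col_idx, int bs) {
    assert(bs > 0 && static_cast<int>(row_ptr.size()) == rows + 1);
    std::set<std::pair<int, int>> blocks;  // distinct (block row, block column) pairs
    for (int i = 0; i < rows; ++i)
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            blocks.emplace(i / bs, col_idx[k] / bs);
    return blocks.size();
}

// Stored values (bs * bs per block, explicit zeros included) divided by true nonzeros.
// 1.0 means every stored block is dense; larger values mean wasted storage and flops.
double bsr_fill_ratio(int rows, const std::vector<int>& row_ptr,
                      const std::vector<int>& col_idx, int bs) {
    const std::size_t nnz = col_idx.size();
    assert(nnz > 0);
    const std::size_t stored = bsr_block_count(rows, row_ptr, col_idx, bs) *
                               static_cast<std::size_t>(bs) * bs;
    return static_cast<double>(stored) / static_cast<double>(nnz);
}
```

Evaluating `bsr_fill_ratio` for several candidate block sizes on the real sparsity pattern gives a quick first filter before benchmarking; a block size with a large ratio is unlikely to help.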

Gennady_F_Intel
Moderator

Thanks Vladimir, we will check.


Gennady_F_Intel
Moderator

I see similar numbers on my end:


$ icc -std=c++11 -mkl sparse_bsr_simp.cpp -o bsr.x
$ icc -std=c++11 -mkl sparse_csr_simp.cpp -o csr.x

$ echo $MKLROOT
/opt/intel/compilers_and_libraries_2020.4.304/linux/mkl

$ export KMP_AFFINITY=granularity=fine,compact,1,0
$ ./csr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0131839 -5539.07
$ ./csr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0132945 -5539.07
$ ./csr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0133272 -5539.07

$ ./bsr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0332802 -5539.07
$ ./bsr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0327158 -5539.07
$ ./bsr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0337939 -5539.07


Model name:      Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz

We will investigate the problem and keep this thread updated.


-Gennady


Gennady_F_Intel
Moderator

There is a performance gap as well when the AVX-512 code path is chosen:

$ ./csr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.00720631 -5539.07
$ ./csr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.00816685 -5539.07
$ ./csr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.00833476 -5539.07

$ ./bsr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.015087 -5539.07
$ ./bsr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0148415 -5539.07
$ ./bsr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0127416 -5539.07


CPU:   4 x Platinum 8286 2.9GHz
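
To put the moderator's measurements side by side, a small helper can average the runs for each format and take the BSR/CSR ratio. Using the arithmetic mean of the three runs shown is a choice made here for illustration; the helper names are hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of run times in seconds.
double mean_seconds(const std::vector<double>& runs) {
    assert(!runs.empty());
    return std::accumulate(runs.begin(), runs.end(), 0.0) / static_cast<double>(runs.size());
}

// Ratio of mean BSR time to mean CSR time; above 1.0 means BSR was slower.
double bsr_slowdown(const std::vector<double>& csr_runs, const std::vector<double>& bsr_runs) {
    return mean_seconds(bsr_runs) / mean_seconds(csr_runs);
}
```

With the timings reported in this thread, this gives about 2.5 for the E5-2699 v4 runs and about 1.8 for the Platinum 8286 runs, compared with about 2.95 (0.13 s vs. 0.044 s) in the original report.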



Gennady_F_Intel
Moderator

Vladimir,

some improvements were made in MKL 2021.4, which is available for download.


Gennady_F_Intel
Moderator

The thread is closing and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

