Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

mkl_sparse_s_mm slower for BSR format than for CSR

bozavlado
Beginner

I am testing sparse matrix multiplication with the BSR format and found that it is about 3x slower than with the CSR format (e.g. for 256x256 matrices, where the sparse matrix has block size 4 and 4096 nonzero entries). I expected BSR to be faster than CSR for the same number of nonzero entries.
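To make the comparison concrete, here is a minimal plain C++ sketch of what the two layouts compute in a sparse-times-dense product. It is only a reference for the storage formats, not MKL's implementation and not the attached benchmark. The names `Csr`, `Bsr`, `csr_mm` and `bsr_mm` are hypothetical, and the sketch assumes zero-based indices, row-major dense matrices and row-major storage inside each block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CSR: one entry per nonzero value (hypothetical helper type).
struct Csr {
    int rows, cols;
    std::vector<int> row_ptr;    // size rows + 1
    std::vector<int> col_idx;    // size nnz
    std::vector<float> values;   // size nnz
};

// BSR: one entry per b x b block; zeros inside a stored block are kept.
struct Bsr {
    int block_rows, block_cols, b;
    std::vector<int> row_ptr;    // size block_rows + 1
    std::vector<int> col_idx;    // block-column index of each stored block
    std::vector<float> values;   // b * b floats per stored block, row-major
};

// C = A * B for CSR A; B is (a.cols x n), C is (a.rows x n), both row-major.
std::vector<float> csr_mm(const Csr& a, const std::vector<float>& dense, int n) {
    assert(dense.size() == static_cast<std::size_t>(a.cols) * n);
    std::vector<float> c(static_cast<std::size_t>(a.rows) * n, 0.0f);
    for (int i = 0; i < a.rows; ++i) {
        float* crow = c.data() + static_cast<std::size_t>(i) * n;
        for (int p = a.row_ptr[i]; p < a.row_ptr[i + 1]; ++p) {
            const float v = a.values[p];
            const float* brow = dense.data() + static_cast<std::size_t>(a.col_idx[p]) * n;
            for (int j = 0; j < n; ++j) crow[j] += v * brow[j];
        }
    }
    return c;
}

// C = A * B for BSR A; every stored block does b * b multiply-adds per column of B.
std::vector<float> bsr_mm(const Bsr& a, const std::vector<float>& dense, int n) {
    const int b = a.b;
    assert(dense.size() == static_cast<std::size_t>(a.block_cols) * b * n);
    std::vector<float> c(static_cast<std::size_t>(a.block_rows) * b * n, 0.0f);
    for (int bi = 0; bi < a.block_rows; ++bi) {
        for (int p = a.row_ptr[bi]; p < a.row_ptr[bi + 1]; ++p) {
            const float* blk = a.values.data() + static_cast<std::size_t>(p) * b * b;
            const int bj = a.col_idx[p];
            for (int r = 0; r < b; ++r) {
                float* crow = c.data() + static_cast<std::size_t>(bi * b + r) * n;
                for (int k = 0; k < b; ++k) {
                    const float v = blk[r * b + k];
                    const float* brow = dense.data() + static_cast<std::size_t>(bj * b + k) * n;
                    for (int j = 0; j < n; ++j) crow[j] += v * brow[j];
                }
            }
        }
    }
    return c;
}
```

Both functions should return the same matrix when the `Bsr` object holds the same entries as the `Csr` object; the BSR version simply also multiplies any explicit zeros stored inside its blocks.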

 

I am compiling the code with g++ (icpx gives the same results):

`g++ -o sparse_bsr_simp sparse_bsr_simp.cpp -O3 -march=native -DMKL_LP64 -m64 -I/opt/intel/oneapi/mkl/2021.1.1//include -Wl,--start-group /opt/intel/oneapi/mkl/2021.1.1//lib/intel64/libmkl_intel_lp64.a /opt/intel/oneapi/mkl/2021.1.1//lib/intel64/libmkl_sequential.a /opt/intel/oneapi/mkl/2021.1.1//lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl`

And running via:

`./sparse_bsr_simp 256 256 4096 4`

With the BSR format the benchmark runs in 0.13 s; with the CSR format it runs in 0.044 s.
(The format can be switched by uncommenting the corresponding convert function in the attached code.)

 

What am I doing wrong?

Gennady_F_Intel
Moderator

What is the CPU type?

bozavlado
Beginner

Sorry, I forgot to include that and cannot edit the original post:

My CPU is: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz (this has AVX2)

Also same thing happens on: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz (also has AVX2)

And also on Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz (this has AVX512, but I have only 2020.1 MKL on that machine).

 

Are there any public benchmarks or guidelines for BSR matrix multiplication? For example, which block sizes and sparsity levels are needed for BSR to show an improvement over CSR?
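One quantity that is cheap to check before benchmarking is how densely the nonzeros fill the blocks that BSR would have to store. The following is a minimal sketch, assuming a zero-based CSR pattern and square blocks of size `b`; `BlockStats` and `bsr_block_stats` are hypothetical names, not part of MKL.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Summary of what a BSR conversion with block size b would have to store.
struct BlockStats {
    std::size_t nnz;     // nonzeros in the CSR pattern
    std::size_t blocks;  // distinct b x b blocks containing at least one nonzero
    std::size_t stored;  // values BSR stores: blocks * b * b
    double fill;         // nnz / stored; 1.0 means every stored block is full
};

// Count the blocks touched by a zero-based CSR sparsity pattern.
BlockStats bsr_block_stats(const std::vector<int>& row_ptr,
                           const std::vector<int>& col_idx, int b) {
    assert(b > 0 && !row_ptr.empty());
    std::set<std::pair<int, int>> touched;
    const int rows = static_cast<int>(row_ptr.size()) - 1;
    for (int i = 0; i < rows; ++i) {
        for (int p = row_ptr[i]; p < row_ptr[i + 1]; ++p) {
            touched.insert({i / b, col_idx[p] / b});  // block coordinates
        }
    }
    BlockStats s;
    s.nnz = col_idx.size();
    s.blocks = touched.size();
    s.stored = s.blocks * static_cast<std::size_t>(b) * static_cast<std::size_t>(b);
    s.fill = s.stored ? static_cast<double>(s.nnz) / static_cast<double>(s.stored) : 0.0;
    return s;
}
```

For a 256x256 matrix with block size 4 there are at most 64 x 64 = 4096 blocks. If the 4096 nonzeros form 256 completely full blocks, BSR stores 4096 values, the same as CSR. If each nonzero falls into a different block, BSR stores 4096 x 16 = 65536 values, which is the whole matrix. The thread does not say which of these layouts the benchmark generates, so this is only something to check.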

 

 

Gennady_F_Intel
Moderator

Thanks Vladimir, we will check.


Gennady_F_Intel
Moderator

I see similar numbers on my end:


$ icc -std=c++11 -mkl sparse_bsr_simp.cpp -o bsr.x

$ icc -std=c++11 -mkl sparse_csr_simp.cpp -o csr.x


$ echo $MKLROOT

/opt/intel/compilers_and_libraries_2020.4.304/linux/mkl


$ export KMP_AFFINITY=granularity=fine,compact,1,0

$ ./csr.x 256 256 4096 4

blocksparse 4 256 256 4096 0.0131839 -5539.07

$ ./csr.x 256 256 4096 4

blocksparse 4 256 256 4096 0.0132945 -5539.07

$ ./csr.x 256 256 4096 4

blocksparse 4 256 256 4096 0.0133272 -5539.07


$ ./bsr.x 256 256 4096 4

blocksparse 4 256 256 4096 0.0332802 -5539.07

$ ./bsr.x 256 256 4096 4

blocksparse 4 256 256 4096 0.0327158 -5539.07

$ ./bsr.x 256 256 4096 4

blocksparse 4 256 256 4096 0.0337939 -5539.07


Model name:      Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz

We will check the problem and keep this thread informed.


-Gennady


Gennady_F_Intel
Moderator

There is also a performance gap when the AVX-512 code branch is chosen:

$ ./csr.x 256 256 4096 4

blocksparse 4 256 256 4096 0.00720631 -5539.07

$ ./csr.x 256 256 4096 4

blocksparse 4 256 256 4096 0.00816685 -5539.07

$ ./csr.x 256 256 4096 4

blocksparse 4 256 256 4096 0.00833476 -5539.07


$ ./bsr.x 256 256 4096 4

blocksparse 4 256 256 4096 0.015087 -5539.07

$ ./bsr.x 256 256 4096 4

blocksparse 4 256 256 4096 0.0148415 -5539.07

$ ./bsr.x 256 256 4096 4

blocksparse 4 256 256 4096 0.0127416 -5539.07


CPU:   4 x Platinum 8286 2.9GHz



Gennady_F_Intel
Moderator

Vladimir,

Some improvements were made in MKL 2021.4, which is available for download.


Gennady_F_Intel
Moderator

This thread is now closed and we will no longer respond to it. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

