I am testing sparse matrix multiplication with the BSR format and found that it is 3x slower than using the CSR format (e.g. for 256x256 matrices with a block size of 4 and 4096 nonzero entries). I expected the BSR format to be faster than CSR for the same number of nonzero entries.
I am compiling the code with the following command (I also tried icpx, with the same results):
`g++ -o sparse_bsr_simp sparse_bsr_simp.cpp -O3 -march=native -DMKL_LP64 -m64 -I/opt/intel/oneapi/mkl/2021.1.1//include -Wl,--start-group /opt/intel/oneapi/mkl/2021.1.1//lib/intel64/libmkl_intel_lp64.a /opt/intel/oneapi/mkl/2021.1.1//lib/intel64/libmkl_sequential.a /opt/intel/oneapi/mkl/2021.1.1//lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl`
And running it via:
`./sparse_bsr_simp 256 256 4096 4`
With the BSR format the benchmark runs in 0.13 s; with the CSR format it runs in 0.044 s.
(The format can be switched by uncommenting the corresponding convert function in the attached code.)
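For context, the benchmark is roughly along the following lines. This is only a simplified sketch, not the attached file itself: the block pattern is synthetic, the checksum is omitted, and it assumes the sparse-times-dense product `mkl_sparse_d_mm` with 100 repetitions; the attached code is the authoritative version.

```cpp
#include <mkl.h>

#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    // Problem size matching "./sparse_bsr_simp 256 256 4096 4".
    const MKL_INT rows = 256, cols = 256, nnz = 4096, block = 4;
    const MKL_INT bdim = rows / block;                            // 64 block rows/columns
    const MKL_INT blocks_per_row = nnz / (block * block) / bdim;  // 4 blocks per block row

    // Build a synthetic CSR matrix whose nonzeros form dense 4x4 blocks.
    std::vector<MKL_INT> row_ptr(rows + 1, 0), col_idx;
    std::vector<double> vals;
    col_idx.reserve(nnz);
    vals.reserve(nnz);
    for (MKL_INT br = 0; br < bdim; ++br) {
        const MKL_INT bc0 = (br * blocks_per_row) % bdim;  // first block column of this block row
        for (MKL_INT r = 0; r < block; ++r) {              // scalar rows inside the block row
            for (MKL_INT b = 0; b < blocks_per_row; ++b)
                for (MKL_INT c = 0; c < block; ++c) {
                    col_idx.push_back((bc0 + b) * block + c);
                    vals.push_back(1.0);
                }
            row_ptr[br * block + r + 1] = (MKL_INT)col_idx.size();
        }
    }

    sparse_matrix_t A_csr = nullptr, A = nullptr;
    mkl_sparse_d_create_csr(&A_csr, SPARSE_INDEX_BASE_ZERO, rows, cols,
                            row_ptr.data(), row_ptr.data() + 1,
                            col_idx.data(), vals.data());

    // The CSR/BSR switch: uncomment one of the two conversions.
    // mkl_sparse_convert_csr(A_csr, SPARSE_OPERATION_NON_TRANSPOSE, &A);
    mkl_sparse_convert_bsr(A_csr, block, SPARSE_LAYOUT_ROW_MAJOR,
                           SPARSE_OPERATION_NON_TRANSPOSE, &A);
    mkl_sparse_optimize(A);

    matrix_descr descr{};
    descr.type = SPARSE_MATRIX_TYPE_GENERAL;

    // Dense operand and result for C = A * B.
    std::vector<double> B(static_cast<size_t>(cols) * cols, 1.0);
    std::vector<double> C(static_cast<size_t>(rows) * cols, 0.0);

    const int repeats = 100;  // repeat to get a stable timing
    const auto t0 = std::chrono::steady_clock::now();
    for (int it = 0; it < repeats; ++it)
        mkl_sparse_d_mm(SPARSE_OPERATION_NON_TRANSPOSE, 1.0, A, descr,
                        SPARSE_LAYOUT_ROW_MAJOR, B.data(), cols, cols,
                        0.0, C.data(), cols);
    const auto t1 = std::chrono::steady_clock::now();

    std::printf("blocksparse %lld %lld %lld %lld %g\n",
                (long long)block, (long long)rows, (long long)cols, (long long)nnz,
                std::chrono::duration<double>(t1 - t0).count());

    mkl_sparse_destroy(A);
    mkl_sparse_destroy(A_csr);
    return 0;
}
```

The only difference between the two measurements is which `mkl_sparse_convert_*` call is active; everything else (matrix data, descriptor, multiply call) stays the same.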
What am I doing wrong?
What is the CPU type?
Sorry, I forgot to include that and can no longer edit the original post:
My CPU is: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz (this has AVX2)
The same thing also happens on: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz (also has AVX2)
And also on an Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz (this has AVX-512, but I only have MKL 2020.1 on that machine).
Are there any public benchmarks or guidelines for BSR matrix multiplication, e.g. what block_size and matrix sparsity are needed to see an improvement over CSR?
Thanks Vladimir, we will check.
I see roughly similar numbers on my end:
$ icc -std=c++11 -mkl sparse_bsr_simp.cpp -o bsr.x
$ icc -std=c++11 -mkl sparse_csr_simp.cpp -o csr.x
$ echo $MKLROOT
/opt/intel/compilers_and_libraries_2020.4.304/linux/mkl
$ export KMP_AFFINITY=granularity=fine,compact,1,0
$ ./csr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0131839 -5539.07
$ ./csr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0132945 -5539.07
$ ./csr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0133272 -5539.07
$ ./bsr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0332802 -5539.07
$ ./bsr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0327158 -5539.07
$ ./bsr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0337939 -5539.07
Model name: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
We will check the problem and keep this thread informed.
-Gennady
There is still some performance gap when the AVX-512 code branch is chosen:
$ ./csr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.00720631 -5539.07
$ ./csr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.00816685 -5539.07
$ ./csr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.00833476 -5539.07
$ ./bsr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.015087 -5539.07
$ ./bsr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0148415 -5539.07
$ ./bsr.x 256 256 4096 4
blocksparse 4 256 256 4096 0.0127416 -5539.07
CPU: 4 x Platinum 8286 2.9GHz
Vladimir,
some improvements were made in MKL 2021.4, which is available for download.
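If several MKL installations are present on the machine, it may be worth confirming at run time which release the benchmark actually links against after updating. A minimal check (a generic sketch, not part of the benchmark itself):

```cpp
#include <mkl.h>
#include <cstdio>

int main() {
    // Print the version string of the MKL library this binary is linked
    // against, to confirm the 2021.4 update is the one actually in use.
    char version[256];
    mkl_get_version_string(version, (int)sizeof(version));
    std::printf("%s\n", version);
    return 0;
}
```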
This thread is closing and we will no longer respond to it. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.