<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: mkl_sparse_s_mm slower for BSR format than for CSR in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1237555#M30503</link>
    <description>&lt;P&gt;What is the CPU type?&lt;/P&gt;</description>
    <pubDate>Tue, 15 Dec 2020 18:45:08 GMT</pubDate>
    <dc:creator>Gennady_F_Intel</dc:creator>
    <dc:date>2020-12-15T18:45:08Z</dc:date>
    <item>
      <title>mkl_sparse_s_mm slower for BSR format than for CSR</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1237472#M30502</link>
      <description>&lt;P&gt;I am testing sparse matrix multiplication with the BSR format and found that it is 3x slower than with the CSR format (e.g. for 256x256 matrices, with a sparse matrix of block size 4 and 4096 nonzero entries).&amp;nbsp;I expected BSR to be faster than CSR for the same number of nonzero entries.&lt;/P&gt;
&lt;P&gt;I compile the code with g++ (icpx gives the same results):&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;`g++ -o sparse_bsr_simp sparse_bsr_simp.cpp -O3 -march=native -DMKL_LP64 -m64 -I/opt/intel/oneapi/mkl/2021.1.1//include -Wl,--start-group /opt/intel/oneapi/mkl/2021.1.1//lib/intel64/libmkl_intel_lp64.a /opt/intel/oneapi/mkl/2021.1.1//lib/intel64/libmkl_sequential.a /opt/intel/oneapi/mkl/2021.1.1//lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl`&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;And running via:&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;`./sparse_bsr_simp 256 256 4096 4`&lt;BR /&gt;&lt;BR /&gt;With the BSR format the benchmark runs in 0.13s; with the CSR format it runs in 0.044s&lt;BR /&gt;(the format can be switched by uncommenting the corresponding convert function in the attached code).&lt;/SPAN&gt;&lt;/P&gt;
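&lt;P&gt;For reference, the arithmetic behind this run can be sketched in Python. This is only a rough illustration: it assumes the four command-line arguments are the two matrix dimensions, the number of nonzeros, and the block size, and that every nonzero lies in a completely filled block. Neither assumption is confirmed by the attached code.&lt;/P&gt;
&lt;PRE&gt;
```python
# Arithmetic for the run `./sparse_bsr_simp 256 256 4096 4`.
# Assumption: the arguments are rows, columns, nonzeros and block size,
# and every nonzero sits in a completely filled block.
rows, cols, nnz, block = 256, 256, 4096, 4

density = nnz / (rows * cols)                    # fraction of entries that are nonzero
block_slots = (rows // block) * (cols // block)  # possible block positions in the matrix
full_blocks = nnz // (block * block)             # dense blocks needed to hold all nonzeros

bsr_seconds, csr_seconds = 0.13, 0.044           # timings reported in the post
slowdown = bsr_seconds / csr_seconds             # how many times slower BSR ran

print(f"density: {density:.2%}")
print(f"block slots: {block_slots}, full blocks: {full_blocks}")
print(f"BSR/CSR time ratio: {slowdown:.2f}")
```
&lt;/PRE&gt;
&lt;P&gt;Under these assumptions the matrix is 6.25% dense, and the BSR run takes roughly three times as long as the CSR run, which matches the figure quoted above.&lt;/P&gt;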
&lt;P&gt;&lt;SPAN&gt;What am I doing wrong?&lt;/SPAN&gt;&lt;/P&gt;
</description>
      <pubDate>Tue, 15 Dec 2020 12:46:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1237472#M30502</guid>
      <dc:creator>bozavlado</dc:creator>
      <dc:date>2020-12-15T12:46:30Z</dc:date>
    </item>
    <item>
      <title>Re: mkl_sparse_s_mm slower for BSR format than for CSR</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1237555#M30503</link>
      <description>&lt;P&gt;What is the CPU type?&lt;/P&gt;</description>
      <pubDate>Tue, 15 Dec 2020 18:45:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1237555#M30503</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-12-15T18:45:08Z</dc:date>
    </item>
    <item>
      <title>Re: mkl_sparse_s_mm slower for BSR format than for CSR</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1237571#M30504</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Sorry, I forgot to include that and cannot edit the original post:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;My CPU is: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz (this has AVX2)&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The same thing also happens on:&amp;nbsp;Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz (also has AVX2)&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;And also on&amp;nbsp;&lt;SPAN&gt;Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz (this has AVX-512, but I only have MKL 2020.1 on that machine).&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Are there any public benchmarks or guidelines for BSR matrix multiplication? For example, which block size and matrix sparsity are needed to see any improvement over CSR?&lt;/SPAN&gt;&lt;/P&gt;
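&lt;P&gt;One generic way to reason about block size, offered as a sketch rather than as MKL guidance, is the block fill ratio: BSR stores every touched block densely, so partially filled blocks carry explicit zeros. The helper name bsr_fill_ratio and the two synthetic patterns below are hypothetical and are not taken from the benchmark.&lt;/P&gt;
&lt;PRE&gt;
```python
import random

def bsr_fill_ratio(coords, block):
    """Return true nonzeros divided by the number of values BSR would store.

    coords: iterable of (row, col) positions of nonzeros.
    block:  block edge length; every touched block is stored densely.
    """
    coords = set(coords)
    touched = {(r // block, c // block) for r, c in coords}
    return len(coords) / (len(touched) * block * block)

rng = random.Random(0)

# Block-aligned pattern: 256 fully dense 4x4 blocks, 4096 nonzeros in total.
slots = rng.sample(range(64 * 64), 256)
aligned = [((s // 64) * 4 + i, (s % 64) * 4 + j)
           for s in slots for i in range(4) for j in range(4)]

# Scattered pattern: 4096 nonzeros at random positions of a 256x256 matrix.
scattered = [(p // 256, p % 256) for p in rng.sample(range(256 * 256), 4096)]

print(bsr_fill_ratio(aligned, 4))    # prints 1.0: no padding is stored
print(bsr_fill_ratio(scattered, 4))  # expected to be far below 1: mostly padding zeros
```
&lt;/PRE&gt;
&lt;P&gt;A ratio near 1 means BSR stores almost no padding. A low ratio means the kernel multiplies many explicit zeros, which is one plausible reason for BSR to lose to CSR on a given matrix.&lt;/P&gt;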
</description>
      <pubDate>Tue, 15 Dec 2020 19:23:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1237571#M30504</guid>
      <dc:creator>bozavlado</dc:creator>
      <dc:date>2020-12-15T19:23:47Z</dc:date>
    </item>
    <item>
      <title>Re: mkl_sparse_s_mm slower for BSR format than for CSR</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1237730#M30506</link>
      <description>&lt;P&gt;Thanks Vladimir, we will check.&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 16 Dec 2020 03:18:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1237730#M30506</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-12-16T03:18:21Z</dc:date>
    </item>
    <item>
      <title>Re: mkl_sparse_s_mm slower for BSR format than for CSR</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1237820#M30510</link>
      <description>&lt;P&gt;I see roughly similar numbers on my end:&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;$ icc -std=c++11 -mkl sparse_bsr_simp.cpp -o bsr.x&lt;/P&gt;&lt;P&gt;$ icc -std=c++11 -mkl sparse_csr_simp.cpp -o csr.x&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;$ echo $MKLROOT&lt;/P&gt;&lt;P&gt;/opt/intel/compilers_and_libraries_2020.4.304/linux/mkl&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;$ export KMP_AFFINITY=granularity=fine,compact,1,0&lt;/P&gt;&lt;P&gt;$ ./csr.x 256 256 4096 4&lt;/P&gt;&lt;P&gt;blocksparse 4 256 256 4096 &lt;B&gt;0.0131839&lt;/B&gt; -5539.07&lt;/P&gt;&lt;P&gt;$ ./csr.x 256 256 4096 4&lt;/P&gt;&lt;P&gt;blocksparse 4 256 256 4096 &lt;B&gt;0.0132945&lt;/B&gt; -5539.07&lt;/P&gt;&lt;P&gt;$ ./csr.x 256 256 4096 4&lt;/P&gt;&lt;P&gt;blocksparse 4 256 256 4096 &lt;B&gt;0.0133272&lt;/B&gt; -5539.07&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;$ ./bsr.x 256 256 4096 4&lt;/P&gt;&lt;P&gt;blocksparse 4 256 256 4096 &lt;B&gt;0.0332802&lt;/B&gt; -5539.07&lt;/P&gt;&lt;P&gt;$ ./bsr.x 256 256 4096 4&lt;/P&gt;&lt;P&gt;blocksparse 4 256 256 4096 &lt;B&gt;0.0327158&lt;/B&gt; -5539.07&lt;/P&gt;&lt;P&gt;$ ./bsr.x 256 256 4096 4&lt;/P&gt;&lt;P&gt;blocksparse 4 256 256 4096 &lt;B&gt;0.0337939&lt;/B&gt; -5539.07&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Model name: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz&lt;/P&gt;&lt;P&gt;We will investigate the problem and keep this thread updated.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;-Gennady&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 16 Dec 2020 08:07:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1237820#M30510</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-12-16T08:07:33Z</dc:date>
    </item>
    <item>
      <title>Re: mkl_sparse_s_mm slower for BSR format than for CSR</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1237868#M30511</link>
      <description>&lt;P&gt;There is also a performance gap when the AVX-512 code branch is chosen:&lt;/P&gt;&lt;P&gt;$ ./csr.x 256 256 4096 4&lt;/P&gt;&lt;P&gt;blocksparse 4 256 256 4096 &lt;B&gt;0.00720631&lt;/B&gt; -5539.07&lt;/P&gt;&lt;P&gt;$ ./csr.x 256 256 4096 4&lt;/P&gt;&lt;P&gt;blocksparse 4 256 256 4096 &lt;B&gt;0.00816685&lt;/B&gt; -5539.07&lt;/P&gt;&lt;P&gt;$ ./csr.x 256 256 4096 4&lt;/P&gt;&lt;P&gt;blocksparse 4 256 256 4096 &lt;B&gt;0.00833476&lt;/B&gt; -5539.07&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;$ ./bsr.x 256 256 4096 4&lt;/P&gt;&lt;P&gt;blocksparse 4 256 256 4096 &lt;B&gt;0.015087&lt;/B&gt; -5539.07&lt;/P&gt;&lt;P&gt;$ ./bsr.x 256 256 4096 4&lt;/P&gt;&lt;P&gt;blocksparse 4 256 256 4096 &lt;B&gt;0.0148415&lt;/B&gt; -5539.07&lt;/P&gt;&lt;P&gt;$ ./bsr.x 256 256 4096 4&lt;/P&gt;&lt;P&gt;blocksparse 4 256 256 4096 &lt;B&gt;0.0127416&lt;/B&gt; -5539.07&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;CPU: 4 x Platinum 8286 2.9GHz&lt;/P&gt;</description>
      <pubDate>Wed, 16 Dec 2020 11:20:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1237868#M30511</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-12-16T11:20:49Z</dc:date>
    </item>
    <item>
      <title>Re: mkl_sparse_s_mm slower for BSR format than for CSR</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1319277#M32144</link>
      <description>&lt;P&gt;Vladimir,&lt;/P&gt;&lt;P&gt;Some improvements were made in MKL 2021.4, which is available for download.&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 05 Oct 2021 09:24:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1319277#M32144</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2021-10-05T09:24:01Z</dc:date>
    </item>
    <item>
      <title>Re: mkl_sparse_s_mm slower for BSR format than for CSR</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1320612#M32171</link>
      <description>&lt;P&gt;This thread is being closed and we will no longer respond to it.&amp;nbsp;If you require additional assistance from Intel, please start a new thread.&amp;nbsp;Any further interaction in this thread will be considered community only.&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Sun, 10 Oct 2021 05:59:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/mkl-sparse-s-mm-slower-for-BSR-format-than-for-CSR/m-p/1320612#M32171</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2021-10-10T05:59:47Z</dc:date>
    </item>
  </channel>
</rss>

