<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re:cblas_sgemm performance bug with AVX512 in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-performance-bug-with-AVX512/m-p/1208649#M30060</link>
    <description>&lt;P&gt;Ok, I see, thanks.&lt;/P&gt;&lt;P&gt;I think that for such tall and skinny matrices there is no opportunity to use the wide (512-bit) registers for vectorization.&lt;/P&gt;&lt;P&gt;As nk gets larger, the performance of the AVX-512 code branch grows and will exceed that of the AVX2 code.&lt;/P&gt;</description>
    <pubDate>Fri, 11 Sep 2020 11:36:33 GMT</pubDate>
    <dc:creator>Gennady_F_Intel</dc:creator>
    <dc:date>2020-09-11T11:36:33Z</dc:date>
    <item>
      <title>cblas_sgemm performance bug with AVX512</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-performance-bug-with-AVX512/m-p/1207658#M30032</link>
      <description>&lt;P&gt;&lt;SPAN class="value case-description" style="word-wrap: break-word;"&gt;Hello,&lt;BR /&gt;&lt;BR /&gt;I believe there is a performance bug in cblas_sgemm in MKL 2020 v2 and v3 on Intel AVX512 processors.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
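&lt;LI-CODE lang="cpp"&gt;#include &amp;lt;cstddef&amp;gt;
#include "mkl.h"
int main() {
  size_t m = 250000;
  size_t nk = 6;  // n == k
  // Value-initialize so sgemm never reads indeterminate floats.
  float* data = new float[m * nk]();    // A: stored nk x m, column-major (used as A^T)
  float* other = new float[nk * nk]();  // B: stored nk x nk (used as B^T)
  float* dest = new float[m * nk]();    // C: m x nk, column-major
  // C = 1.0 * A^T * B^T + 0.0 * C
  cblas_sgemm(CblasColMajor, CblasTrans, CblasTrans, m, nk, nk, 1.0f, data, nk, other, nk, 0.0f, dest, m);
  delete[] data;
  delete[] other;
  delete[] dest;
}&lt;/LI-CODE&gt;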
&lt;P&gt;&lt;SPAN class="value case-description" style="word-wrap: break-word;"&gt;&lt;BR /&gt;Run on a Xeon Cascade Lake:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN class="value case-description" style="word-wrap: break-word;"&gt;with MKL_ENABLE_INSTRUCTIONS=AVX2: 398 ops/sec&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN class="value case-description" style="word-wrap: break-word;"&gt;with MKL_ENABLE_INSTRUCTIONS=AVX512: 41 ops/sec - &lt;STRONG&gt;&lt;EM&gt;this should be &amp;gt;= AVX2&lt;/EM&gt;&lt;/STRONG&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN class="value case-description" style="word-wrap: break-word;"&gt;with default dispatching: 41 ops/sec&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN class="value case-description" style="word-wrap: break-word;"&gt;Run on an AMD EPYC Rome:&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN class="value case-description" style="word-wrap: break-word;"&gt;with default dispatching: 243 ops/sec&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN class="value case-description" style="word-wrap: break-word;"&gt;The defect only manifests for nk &amp;lt; 8.&lt;BR /&gt;&lt;BR /&gt;Thank you,&lt;BR /&gt;Zach&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 08 Sep 2020 17:48:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-performance-bug-with-AVX512/m-p/1207658#M30032</guid>
      <dc:creator>zbjornson</dc:creator>
      <dc:date>2020-09-08T17:48:18Z</dc:date>
    </item>
    <item>
      <title>Re:cblas_sgemm performance bug with AVX512</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-performance-bug-with-AVX512/m-p/1207842#M30034</link>
      <description>&lt;P&gt;Is this on Linux?&lt;/P&gt;&lt;P&gt;Have you tried MKL 2020.0?&lt;/P&gt;</description>
      <pubDate>Wed, 09 Sep 2020 07:32:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-performance-bug-with-AVX512/m-p/1207842#M30034</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-09-09T07:32:33Z</dc:date>
    </item>
    <item>
      <title>Re: Re:cblas_sgemm performance bug with AVX512</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-performance-bug-with-AVX512/m-p/1208378#M30047</link>
      <description>&lt;P&gt;This is on Linux, yes.&lt;/P&gt;
&lt;P&gt;The same issue happens with 2020.0.&lt;/P&gt;
&lt;P&gt;Here's the full build line I'm using:&lt;/P&gt;
&lt;LI-CODE lang="none"&gt;g++ -I/opt/intel/mkl/include/ -DMKL_ILP64 -L/opt/intel/mkl/lib/intel64 -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl test.cpp -o test.o&lt;/LI-CODE&gt;
&lt;P&gt;Output:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;$ MKL_VERBOSE=1 MKL_ENABLE_INSTRUCTIONS=AVX2 time ./test.o
MKL_VERBOSE Intel(R) MKL 2020.0 Product build 20191122 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.80GHz ilp64 sequential
MKL_VERBOSE SGEMM(T,T,250000,6,6,0x7fff9a6baba8,0x7ffb82e42010,6,0x556212ef8f20,6,0x7fff9a6babb0,0x7ffb82889010,250000) 6.34ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
0.10user 0.00system 0:00.10elapsed 100%CPU (0avgtext+0avgdata 16884maxresident)k
0inputs+0outputs (0major+3368minor)pagefaults 0swaps&lt;/LI-CODE&gt;&lt;LI-CODE lang="markup"&gt;$ MKL_VERBOSE=1 MKL_ENABLE_INSTRUCTIONS=AVX512 time ./test.o
MKL_VERBOSE Intel(R) MKL 2020.0 Product build 20191122 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.80GHz ilp64 sequential
MKL_VERBOSE SGEMM(T,T,250000,6,6,0x7ffd14db70e8,0x7f398b0ec010,6,0x560b61254f20,6,0x7ffd14db70f0,0x7f398ab33010,250000) 27.96ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:1
0.63user 0.00system 0:00.63elapsed 100%CPU (0avgtext+0avgdata 16960maxresident)k
0inputs+0outputs (0major+3368minor)pagefaults 0swaps&lt;/LI-CODE&gt;</description>
      <pubDate>Thu, 10 Sep 2020 17:29:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-performance-bug-with-AVX512/m-p/1208378#M30047</guid>
      <dc:creator>zbjornson</dc:creator>
      <dc:date>2020-09-10T17:29:16Z</dc:date>
    </item>
    <item>
      <title>Re:cblas_sgemm performance bug with AVX512</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-performance-bug-with-AVX512/m-p/1208649#M30060</link>
      <description>&lt;P&gt;Ok, I see, thanks.&lt;/P&gt;&lt;P&gt;I think that for such tall and skinny matrices there is no opportunity to use the wide (512-bit) registers for vectorization.&lt;/P&gt;&lt;P&gt;As nk gets larger, the performance of the AVX-512 code branch grows and will exceed that of the AVX2 code.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Sep 2020 11:36:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-performance-bug-with-AVX512/m-p/1208649#M30060</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-09-11T11:36:33Z</dc:date>
    </item>
    <item>
      <title>Re:cblas_sgemm performance bug with AVX512</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-performance-bug-with-AVX512/m-p/1212155#M30103</link>
      <description>&lt;P&gt;This issue is being closed and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.&lt;/P&gt;</description>
      <pubDate>Thu, 24 Sep 2020 10:08:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/cblas-sgemm-performance-bug-with-AVX512/m-p/1212155#M30103</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-09-24T10:08:27Z</dc:date>
    </item>
  </channel>
</rss>

