<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Thanks Ruqiu. I have tested in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-is-slower-than-AVX2-when-running-CGESDD-SGESDD-on-Xeon/m-p/1129486#M25502</link>
    <description>&lt;P&gt;Thanks Ruqiu. I have tested MKL 2019.0 as well and it has the same problem.&lt;/P&gt;&lt;P&gt;Best,&lt;/P&gt;&lt;P&gt;Jian&lt;/P&gt;</description>
    <pubDate>Tue, 25 Feb 2020 08:37:43 GMT</pubDate>
    <dc:creator>Ding__Jian</dc:creator>
    <dc:date>2020-02-25T08:37:43Z</dc:date>
    <item>
      <title>AVX512 is slower than AVX2 when running CGESDD/SGESDD on Xeon Gold 6130</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-is-slower-than-AVX2-when-running-CGESDD-SGESDD-on-Xeon/m-p/1129484#M25500</link>
      <description>&lt;P&gt;I am evaluating the performance of Intel MKL on Xeon Gold 6130 processors, which have two AVX512 FMA units. I see a performance improvement with AVX512 for matrix multiplication and FFT. However, for matrix inversion via SVD, AVX512 is slower than AVX2. I tested complex float (CGESDD) and float (SGESDD).&lt;/P&gt;&lt;P&gt;My question is: what causes the AVX512 slowdown for CGESDD/SGESDD? Is it because these functions are not optimized for AVX512, or am I doing something wrong?&lt;/P&gt;&lt;P&gt;Below is the output when MKL_VERBOSE is enabled:&lt;/P&gt;
&lt;PRE class="brush:; class-name:dark;"&gt;MKL_VERBOSE Intel(R) MKL 2020.0 Product build 20191122 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.10GHz lp64 sequential
&lt;/PRE&gt;

&lt;P&gt;To compare the two code paths, I set MKL_ENABLE_INSTRUCTIONS to AVX2 or AVX512 and used the sequential (single-threaded) version of the library.&lt;/P&gt;
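&lt;P&gt;Below is a minimal sketch of such a comparison. It assumes SciPy is built against MKL (scipy.show_config() reports which BLAS/LAPACK it links). The script name gesdd_bench.py and both helper functions are hypothetical. SciPy picks the LAPACK prefix from the dtype, so float32 input goes to SGESDD and complex64 input goes to CGESDD. NumPy's np.linalg.svd is not used because it typically computes single-precision input in double precision. The matrix comes from a fixed seed, so every run factors exactly the same data, and the median of several repeats damps timer noise:&lt;/P&gt;
&lt;PRE class="brush:; class-name:dark;"&gt;# Hypothetical benchmark script (gesdd_bench.py); assumes SciPy is linked against MKL.
import time

import numpy as np
import scipy.linalg


def make_test_matrix(n, complex_input=False, seed=2020):
    # Entries come from NumPy's seeded PCG64 generator rather than from any
    # BLAS/LAPACK call, so the matrix is bit-identical whichever code path
    # MKL_ENABLE_INSTRUCTIONS selects.
    rng = np.random.default_rng(seed)
    re = rng.standard_normal((n, n), dtype=np.float32)
    if not complex_input:
        return re  # float32 -> SGESDD
    im = rng.standard_normal((n, n), dtype=np.float32)
    return (re + 1j * im).astype(np.complex64)  # complex64 -> CGESDD


def time_gesdd(a, repeats=10):
    # Median wall time of a full SVD through the ?GESDD driver.
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        scipy.linalg.svd(a, full_matrices=True, check_finite=False,
                         lapack_driver="gesdd")
        times.append(time.perf_counter() - t0)
    return float(np.median(times))


if __name__ == "__main__":
    for n in (64, 1000):
        for cplx, name in ((False, "SGESDD"), (True, "CGESDD")):
            a = make_test_matrix(n, complex_input=cplx)
            print(f"{name} {n}x{n}: {time_gesdd(a, repeats=3) * 1e3:.2f} ms")
&lt;/PRE&gt;
&lt;P&gt;Run the script once per code path, for example MKL_ENABLE_INSTRUCTIONS=AVX2 MKL_NUM_THREADS=1 python gesdd_bench.py, then repeat with AVX512. Both variables must be set before MKL is loaded. Adding MKL_VERBOSE=1 prints a line for each MKL call, which helps confirm which code path was actually dispatched.&lt;/P&gt;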
&lt;P&gt;-----------------------------------------------------------------&lt;/P&gt;
&lt;P&gt;For SGESDD/CGESDD, AVX2 outperforms AVX512 in most cases:&lt;/P&gt;
&lt;P&gt;64x64 matrix:&lt;/P&gt;
&lt;UL&gt;&lt;LI&gt;SGESDD: AVX2: 536.91us, AVX512: 703.39us&lt;/LI&gt;&lt;LI&gt;CGESDD: AVX2: 766.52us, AVX512: 861.09us&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;1000x1000 matrix:&lt;/P&gt;
&lt;UL&gt;&lt;LI&gt;SGESDD: AVX2: 305.60ms, AVX512: 360.65ms&lt;/LI&gt;&lt;LI&gt;CGESDD: AVX2: 744.38ms, AVX512: 696.96ms (AVX512 is slightly better)&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;-----------------------------------------------------------------&lt;/P&gt;
&lt;P&gt;For SGEMM/CGEMM, AVX512 outperforms AVX2:&lt;/P&gt;
&lt;P&gt;64x64 matrix:&lt;/P&gt;
&lt;UL&gt;&lt;LI&gt;SGEMM: AVX2: 8.58us, AVX512: 7.08us&lt;/LI&gt;&lt;LI&gt;CGEMM: AVX2: 43.55us, AVX512: 23.06us&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;1000x1000 matrix:&lt;/P&gt;
&lt;UL&gt;&lt;LI&gt;SGEMM: AVX2: 27.98ms, AVX512: 18.40ms&lt;/LI&gt;&lt;LI&gt;CGEMM: AVX2: 109.17ms, AVX512: 69.49ms&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;-----------------------------------------------------------------&lt;/P&gt;
</description>
      <pubDate>Fri, 21 Feb 2020 23:41:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-is-slower-than-AVX2-when-running-CGESDD-SGESDD-on-Xeon/m-p/1129484#M25500</guid>
      <dc:creator>Ding__Jian</dc:creator>
      <dc:date>2020-02-21T23:41:39Z</dc:date>
    </item>
    <item>
      <title>Hello Ding, Jian,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-is-slower-than-AVX2-when-running-CGESDD-SGESDD-on-Xeon/m-p/1129485#M25501</link>
      <description>&lt;P&gt;Hello Ding, Jian,&lt;/P&gt;&lt;P&gt;Thank you for raising the topic! We will investigate the problem and get back to you here once there is an update.&lt;/P&gt;&lt;P&gt;One quick question: based on your tests, does the performance issue occur only in MKL 2020.0, or do other versions have the same problem?&lt;/P&gt;&lt;P&gt;Best Regards,&lt;/P&gt;&lt;P&gt;Ruqiu&lt;/P&gt;</description>
      <pubDate>Tue, 25 Feb 2020 08:24:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-is-slower-than-AVX2-when-running-CGESDD-SGESDD-on-Xeon/m-p/1129485#M25501</guid>
      <dc:creator>Ruqiu_C_Intel</dc:creator>
      <dc:date>2020-02-25T08:24:27Z</dc:date>
    </item>
    <item>
      <title>Thanks Ruqiu. I have tested</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-is-slower-than-AVX2-when-running-CGESDD-SGESDD-on-Xeon/m-p/1129486#M25502</link>
      <description>&lt;P&gt;Thanks Ruqiu. I have tested MKL 2019.0 as well and it has the same problem.&lt;/P&gt;&lt;P&gt;Best,&lt;/P&gt;&lt;P&gt;Jian&lt;/P&gt;</description>
      <pubDate>Tue, 25 Feb 2020 08:37:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-is-slower-than-AVX2-when-running-CGESDD-SGESDD-on-Xeon/m-p/1129486#M25502</guid>
      <dc:creator>Ding__Jian</dc:creator>
      <dc:date>2020-02-25T08:37:43Z</dc:date>
    </item>
    <item>
      <title>Jian, yes, we see the same</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-is-slower-than-AVX2-when-running-CGESDD-SGESDD-on-Xeon/m-p/1129487#M25503</link>
      <description>&lt;P&gt;Jian, yes, we see the same behavior with version 2020, and it looks like the code branch for this particular function is not well optimized. If this regression is important to you, I recommend submitting the problem against the MKL product via the official support channel, the &lt;A href="https://supporttickets.intel.com/"&gt;Intel Online Service Center&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Thu, 05 Mar 2020 05:00:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-is-slower-than-AVX2-when-running-CGESDD-SGESDD-on-Xeon/m-p/1129487#M25503</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-03-05T05:00:39Z</dc:date>
    </item>
    <item>
      <title>Jian, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-is-slower-than-AVX2-when-running-CGESDD-SGESDD-on-Xeon/m-p/1129488#M25504</link>
      <description>&lt;P&gt;Jian,&lt;/P&gt;&lt;P&gt;The time spent in ?GESDD depends strongly on the distribution of singular values. Could you recheck the results using exactly the same input matrix for the ?GESDD calls on AVX2 and AVX512?&lt;/P&gt;</description>
      <pubDate>Sat, 04 Apr 2020 06:07:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-is-slower-than-AVX2-when-running-CGESDD-SGESDD-on-Xeon/m-p/1129488#M25504</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-04-04T06:07:22Z</dc:date>
    </item>
    <item>
      <title>Jian, have you tried to check</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-is-slower-than-AVX2-when-running-CGESDD-SGESDD-on-Xeon/m-p/1129489#M25505</link>
      <description>&lt;P&gt;Jian, have you tried to check the problem with the same inputs?&lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2020 04:54:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-is-slower-than-AVX2-when-running-CGESDD-SGESDD-on-Xeon/m-p/1129489#M25505</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-04-22T04:54:18Z</dc:date>
    </item>
  </channel>
</rss>

