<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Here is the result: in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170682#M28560</link>
    <description>&lt;P&gt;Here is the result:&lt;/P&gt;

&lt;P&gt;MKL_VERBOSE Intel(R) MKL 2018.0 Update 2 Product build 20180127 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.20GHz lp64 sequential&lt;BR /&gt;
	MKL_VERBOSE DGEMM(n,n,3000,3000,3000,0x7ffe29d0bda8,0x7f8b5bfe1620,3000,0x7f8b6048b820,3000,0x7ffe29d0bdb0,0x7f8b64935a20,3000) 2.14s CNR:OFF Dyn:1 FastMM:1 TID:0&amp;nbsp; NThr:1&lt;/P&gt;

</description>
    <pubDate>Fri, 20 Apr 2018 12:37:02 GMT</pubDate>
    <dc:creator>tomasz_j_2</dc:creator>
    <dc:date>2018-04-20T12:37:02Z</dc:date>
    <item>
      <title>AVX512 slower than AVX2? What am I doing wrong?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170678#M28556</link>
      <description>&lt;P&gt;Hello All,&lt;/P&gt;


&lt;P&gt;I was excited to test the new Intel Xeon Silver 4114 CPU, only to find that with AVX512 enabled, matrix multiplication performs the same as with legacy SSE4. If I restrict the MKL library to AVX2 only, the computation is twice as fast. What am I doing wrong here? The library seems to respond OK to the following call (here in FORTRAN):&lt;/P&gt;


&lt;P&gt;stat=mkl_cbwr_set (MKL_CBWR_AVX512)&lt;/P&gt;


&lt;P&gt;with stat == 0. But the computation slows down by a factor of two compared to the speed I get after setting the environment variable MKL_ENABLE_INSTRUCTIONS to AVX2. Is it possible that this is what I should expect for this particular CPU? The MKL version is 2018.2.199.&lt;/P&gt;

&lt;P&gt;Thanks!&lt;/P&gt;

&lt;P&gt;Tomasz&lt;/P&gt;</description>
      <pubDate>Tue, 17 Apr 2018 20:03:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170678#M28556</guid>
      <dc:creator>tomasz_j_2</dc:creator>
      <dc:date>2018-04-17T20:03:30Z</dc:date>
    </item>
    <item>
      <title>Tomasz, what input size do</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170679#M28557</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 12px;"&gt;Tomasz, at what input size do you observe this gap? We will check. Is this on a Linux OS?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Apr 2018 02:46:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170679#M28557</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2018-04-18T02:46:58Z</dc:date>
    </item>
    <item>
      <title>Yes, it is Linux, kernel 4.9</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170680#M28558</link>
      <description>&lt;P&gt;Yes, it is Linux, kernel 4.9.0-0, Debian OS.&lt;/P&gt;

&lt;P&gt;I am testing on 3000x3000 double-precision matrices. I did some research and I suspect that this is what I should get. Intel's website says that the Silver 4114 has one AVX512-capable FPU per core. If this is true, then the efficiency gain from AVX512 is offset by having fewer FPUs available on the chip (I suspect there are 2 FPUs capable of AVX2). The numbers I get for 3000x3000 matrices are as follows:&lt;/P&gt;

&lt;P&gt;24.668 Gflop/s AVX512&lt;/P&gt;

&lt;P&gt;40.540 Gflop/s AVX2&lt;/P&gt;

&lt;P&gt;21.739 Gflop/s AVX&lt;/P&gt;

&lt;P&gt;11.668 Gflop/s SSE4_2&lt;/P&gt;

&lt;P&gt;If my suspicion about the number of FPUs is correct then MKL should fall back to AVX2 on Xeon Silver to get the max throughput.&lt;/P&gt;

&lt;P&gt;I hope my guess is not correct; otherwise, what would be the goal of retrofitting the CPU with a crippled AVX512 capability?&lt;/P&gt;
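
&lt;P&gt;To make the FPU-count guess above concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes that each FPU issues one fused multiply-add per cycle on every vector lane, and it uses the unit counts guessed in this post (one AVX512 unit, two AVX2 units per core). The function name and parameters are hypothetical, and none of this is a measured or vendor-confirmed figure.&lt;/P&gt;

```python
# Back-of-the-envelope peak double-precision flops per cycle per core.
# Assumption: every FPU issues one fused multiply-add (FMA) per cycle
# on each vector lane. The unit counts below are the guesses from this
# post, not confirmed hardware specifications.

def peak_flops_per_cycle(fma_units, vector_bits, element_bits=64):
    """Return peak flops per cycle for one core under the FMA assumption."""
    lanes = vector_bits // element_bits  # doubles held in one vector register
    return fma_units * lanes * 2         # one FMA counts as 2 flops per lane

# Hypothetical configurations matching the guess in this post
avx2 = peak_flops_per_cycle(fma_units=2, vector_bits=256)
avx512 = peak_flops_per_cycle(fma_units=1, vector_bits=512)
print(avx2, avx512)  # prints "16 16"
```

&lt;P&gt;Under these assumptions both instruction sets have the same per-cycle peak, so AVX512 would bring no gain on such a core, and any difference in clock speed between the two modes would decide which one is faster.&lt;/P&gt;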


&lt;P&gt;Thanks!&lt;/P&gt;

&lt;P&gt;Tomasz&lt;/P&gt;</description>
      <pubDate>Wed, 18 Apr 2018 12:26:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170680#M28558</guid>
      <dc:creator>tomasz_j_2</dc:creator>
      <dc:date>2018-04-18T12:26:42Z</dc:date>
    </item>
    <item>
      <title>Could you please set MKL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170681#M28559</link>
      <description>&lt;P&gt;Could you please set the MKL_VERBOSE=1 environment variable to check whether the AVX-512 branch of the MKL code is being executed?&lt;/P&gt;</description>
      <pubDate>Fri, 20 Apr 2018 04:23:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170681#M28559</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2018-04-20T04:23:08Z</dc:date>
    </item>
    <item>
      <title>Here is the result:</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170682#M28560</link>
      <description>&lt;P&gt;Here is the result:&lt;/P&gt;

&lt;P&gt;MKL_VERBOSE Intel(R) MKL 2018.0 Update 2 Product build 20180127 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) enabled processors, Lnx 2.20GHz lp64 sequential&lt;BR /&gt;
	MKL_VERBOSE DGEMM(n,n,3000,3000,3000,0x7ffe29d0bda8,0x7f8b5bfe1620,3000,0x7f8b6048b820,3000,0x7ffe29d0bdb0,0x7f8b64935a20,3000) 2.14s CNR:OFF Dyn:1 FastMM:1 TID:0&amp;nbsp; NThr:1&lt;/P&gt;
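
&lt;P&gt;As a rough cross-check, the time in the DGEMM line above can be converted to Gflop/s. This is a minimal sketch that assumes the conventional 2*m*n*k flop count for a matrix product; the function name is hypothetical and only the dimensions and the 2.14 s time come from the log.&lt;/P&gt;

```python
# Convert the MKL_VERBOSE timing into an achieved Gflop/s figure.
# Assumption: a matrix product of an (m x k) by a (k x n) matrix
# costs 2*m*n*k floating-point operations.

def dgemm_gflops(m, n, k, seconds):
    """Return achieved Gflop/s for one DGEMM call."""
    flops = 2.0 * m * n * k  # one multiply and one add per inner-product term
    return flops / seconds / 1e9

# Dimensions and elapsed time copied from the DGEMM(...) line above
achieved = dgemm_gflops(3000, 3000, 3000, 2.14)
print(f"{achieved:.2f} Gflop/s")  # prints "25.23 Gflop/s"
```

&lt;P&gt;That is close to the 24.668 Gflop/s I reported for AVX512 earlier in this thread.&lt;/P&gt;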

</description>
      <pubDate>Fri, 20 Apr 2018 12:37:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170682#M28560</guid>
      <dc:creator>tomasz_j_2</dc:creator>
      <dc:date>2018-04-20T12:37:02Z</dc:date>
    </item>
    <item>
      <title>So, can I conclude that this</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170683#M28561</link>
      <description>&lt;P&gt;So, can I conclude that this is what I should expect from this processor, and that AVX512 is, in fact, slower than legacy AVX2 on Xeon Silver? I see that the "Gold" series has two FPUs per core.&lt;/P&gt;

&lt;P&gt;I still hope that the answer is negative and something can be done.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Apr 2018 20:29:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170683#M28561</guid>
      <dc:creator>tomasz_j_2</dc:creator>
      <dc:date>2018-04-24T20:29:36Z</dc:date>
    </item>
    <item>
      <title>PMU unit of low-end "Silver"</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170684#M28562</link>
      <description>&lt;P&gt;The PMU of low-end "Silver" processors will probably be more eager to lower the clock frequency of cores that execute AVX512 code.&lt;/P&gt;&lt;P&gt;You should invest in a Gold SKU or maybe in an HEDT Skylake-X processor.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Dec 2018 17:06:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/AVX512-slower-than-AVX2-What-I-am-doing-wrong/m-p/1170684#M28562</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2018-12-12T17:06:06Z</dc:date>
    </item>
  </channel>
</rss>

