<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Intel Math Kernel Library Cblas int8 gemm and dnnl int8 gemm in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-Math-Kernel-Library-Cblas-int8-gemm-and-dnnl-int8-gemm/m-p/1151801#M27186</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have some questions on cblas_gemm_s8u8s32.&lt;/P&gt;&lt;P&gt;1. What is the reasoning behind requiring one operand to be signed and the other unsigned?&lt;/P&gt;&lt;P&gt;2. When I do matrix multiplication with cblas_gemm_s8u8s32, I find that with column-major layout, when values in the second operand (the unsigned int8 matrix) exceed 128, the result is wrong. What is the reason? And how do I compute the product of two signed int8 matrices?&lt;/P&gt;&lt;P&gt;3. I tried MKL-DNN's (DNNL's) dnnl_gemm_s8s8s32, but unfortunately it was much slower than MKL's cblas_sgemm at some problem sizes.&lt;/P&gt;&lt;P&gt;4. I tested the performance of int8 GEMM (using cblas_gemm_s8u8s32) and float GEMM on my machine and found that int8 GEMM is only about as fast as float GEMM. Why? Do you have performance results for the two interfaces?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Jingjing Wang&lt;/P&gt;</description>
    <pubDate>Thu, 05 Dec 2019 07:59:45 GMT</pubDate>
    <dc:creator>jingjing__wang</dc:creator>
    <dc:date>2019-12-05T07:59:45Z</dc:date>
    <item>
      <title>Intel Math Kernel Library Cblas int8 gemm and dnnl int8 gemm</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-Math-Kernel-Library-Cblas-int8-gemm-and-dnnl-int8-gemm/m-p/1151801#M27186</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I have some questions on cblas_gemm_s8u8s32.&lt;/P&gt;&lt;P&gt;1. What is the reasoning behind requiring one operand to be signed and the other unsigned?&lt;/P&gt;&lt;P&gt;2. When I do matrix multiplication with cblas_gemm_s8u8s32, I find that with column-major layout, when values in the second operand (the unsigned int8 matrix) exceed 128, the result is wrong. What is the reason? And how do I compute the product of two signed int8 matrices?&lt;/P&gt;&lt;P&gt;3. I tried MKL-DNN's (DNNL's) dnnl_gemm_s8s8s32, but unfortunately it was much slower than MKL's cblas_sgemm at some problem sizes.&lt;/P&gt;&lt;P&gt;4. I tested the performance of int8 GEMM (using cblas_gemm_s8u8s32) and float GEMM on my machine and found that int8 GEMM is only about as fast as float GEMM. Why? Do you have performance results for the two interfaces?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Jingjing Wang&lt;/P&gt;</description>
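      <!-- Editor's note: a minimal, hedged sketch (not from the thread) of one way
           to answer question 2: multiplying two signed int8 matrices with
           cblas_gemm_s8u8s32, which computes
           C = alpha*(op(A)+ao)*(op(B)+bo) + beta*C + oc and requires the second
           operand to be unsigned. The idea is to bias the signed B into uint8
           range by adding 128 and pass bo = -128 so the library undoes the bias
           internally. Function and buffer names here are illustrative.

      #include <stdint.h>
      #include <mkl.h>

      /* C = A * B for signed int8 A (m x k) and B (k x n), row-major. */
      void gemm_s8s8(int m, int n, int k,
                     const int8_t *A, const int8_t *B_s8, int32_t *C,
                     uint8_t *B_u8 /* caller-provided k*n scratch buffer */)
      {
          for (long i = 0; i < (long)k * n; ++i)
              B_u8[i] = (uint8_t)(B_s8[i] + 128);   /* shift into [0,255] */

          const int32_t oc = 0;                     /* no offset added to C */
          cblas_gemm_s8u8s32(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                             CblasFixOffset, m, n, k, 1.0f,
                             A,    k, 0,     /* ao = 0: A stays signed         */
                             B_u8, n, -128,  /* bo = -128 undoes the +128 bias */
                             0.0f, C, n, &oc);
      }
      -->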
      <pubDate>Thu, 05 Dec 2019 07:59:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-Math-Kernel-Library-Cblas-int8-gemm-and-dnnl-int8-gemm/m-p/1151801#M27186</guid>
      <dc:creator>jingjing__wang</dc:creator>
      <dc:date>2019-12-05T07:59:45Z</dc:date>
    </item>
    <item>
      <title>Hello Jingjing,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-Math-Kernel-Library-Cblas-int8-gemm-and-dnnl-int8-gemm/m-p/1151802#M27187</link>
      <description>&lt;P&gt;Hello Jingjing,&lt;/P&gt;&lt;P&gt;The signed/unsigned requirement has to do with the AVX-512 VNNI hardware instruction set underneath the software interface: for example, using &lt;A href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3146,2195,2198,2210,2205&amp;amp;techs=AVX2&amp;amp;avx512techs=AVX512_VNNI&amp;amp;text=vpdpbusd"&gt;vpdpbusd&lt;/A&gt; [1] instead of &lt;A href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3146,2195,2198,2210,2205,2201,97,100,98,3536&amp;amp;text=vpmaddubsw"&gt;vpmaddubsw&lt;/A&gt;, &lt;A href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3146,2195,2198,2210,2205,2201&amp;amp;text=vpmaddwd"&gt;vpmaddwd&lt;/A&gt;, and &lt;A href="https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3146,2195,2198,2210,2205,2201,97,100,98&amp;amp;text=vpaddd"&gt;vpaddd&lt;/A&gt;.&lt;/P&gt;&lt;P&gt;Could you provide more information about the particular matrix sizes you are interested in testing?&lt;/P&gt;&lt;P&gt;Even better, it would help expedite things if you could provide a concise reproducer (application source code with minimal dependencies) for each of issues 2, 3, and 4.&lt;/P&gt;&lt;P&gt;Thank you for your good questions about cblas_gemm_s8u8s32!&lt;/P&gt;&lt;P&gt;Aaron&lt;/P&gt;&lt;P&gt;[1] https://www.intel.ai/vnni-enables-inference/&lt;/P&gt;</description>
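      <!-- Editor's note: a hedged illustration (not from the thread) of the point
           above. The AVX-512 VNNI instruction vpdpbusd multiplies Unsigned bytes
           by Signed bytes and accumulates into int32 lanes; no s8*s8 variant
           exists, which is why the API fixes one operand as unsigned. Assumes a
           compiler and CPU with AVX512_VNNI support.

      #include <immintrin.h>

      /* For each of the 16 int32 lanes: acc += sum of four u8(a) * s8(b)
         products. This is the inner kernel that s8u8s32 GEMM builds on. */
      static inline __m512i dot_u8s8(__m512i acc, __m512i a_u8, __m512i b_s8)
      {
          return _mm512_dpbusd_epi32(acc, a_u8, b_s8);
      }
      -->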
      <pubDate>Fri, 06 Dec 2019 18:00:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-Math-Kernel-Library-Cblas-int8-gemm-and-dnnl-int8-gemm/m-p/1151802#M27187</guid>
      <dc:creator>Aaron_J_Intel2</dc:creator>
      <dc:date>2019-12-06T18:00:00Z</dc:date>
    </item>
    <item>
      <title>Here are two discussions that</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-Math-Kernel-Library-Cblas-int8-gemm-and-dnnl-int8-gemm/m-p/1151803#M27188</link>
      <description>&lt;P&gt;Here are two discussions that may shed light on your questions.&lt;/P&gt;&lt;P&gt;Incorrect result of s8s8s32 gemm? &lt;A href="https://github.com/intel/mkl-dnn/issues/476" target="_blank"&gt;https://github.com/intel/mkl-dnn/issues/476&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Best instruction set for s8s8s32 gemm? https://github.com/intel/mkl-dnn/issues/532&lt;/P&gt;&lt;P&gt;Let me know if you have further questions or a reproducer.&lt;/P&gt;&lt;P&gt;Aaron&lt;/P&gt;</description>
      <pubDate>Fri, 06 Dec 2019 18:47:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-Math-Kernel-Library-Cblas-int8-gemm-and-dnnl-int8-gemm/m-p/1151803#M27188</guid>
      <dc:creator>Aaron_J_Intel2</dc:creator>
      <dc:date>2019-12-06T18:47:53Z</dc:date>
    </item>
    <item>
      <title>Hi Jingjing,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-Math-Kernel-Library-Cblas-int8-gemm-and-dnnl-int8-gemm/m-p/1151804#M27189</link>
      <description>&lt;P&gt;Hi Jingjing,&lt;/P&gt;&lt;P&gt;For #3 and #4, can you also provide information on the CPU you used when checking performance? If you're running on an AVX2 machine, then the performance behavior you're seeing is expected.&lt;/P&gt;&lt;P&gt;Best,&lt;/P&gt;&lt;P&gt;Peter&lt;/P&gt;</description>
      <pubDate>Fri, 06 Dec 2019 19:22:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-Math-Kernel-Library-Cblas-int8-gemm-and-dnnl-int8-gemm/m-p/1151804#M27189</guid>
      <dc:creator>Peter_C_Intel</dc:creator>
      <dc:date>2019-12-06T19:22:12Z</dc:date>
    </item>
    <item>
      <title>Quote: Caday, Peter (Intel)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-Math-Kernel-Library-Cblas-int8-gemm-and-dnnl-int8-gemm/m-p/1151805#M27190</link>
      <description>&lt;BLOCKQUOTE&gt;Caday, Peter (Intel) wrote:&lt;BR /&gt;&lt;P&gt;Hi Jingjing,&lt;/P&gt;&lt;P&gt;For #3 and #4, can you also provide information on the CPU you used when checking performance? If you're running on an AVX2 machine, then the performance behavior you're seeing is expected.&lt;/P&gt;&lt;P&gt;Best,&lt;/P&gt;&lt;P&gt;Peter&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Thank you for your reply. I checked the performance on an Intel Xeon CPU E5-2667 v3 @ 3.2 GHz; it may support only AVX2.&lt;/P&gt;&lt;P&gt;That is to say, DNNL int8 GEMM will only perform better than float when the CPU supports AVX-512 or newer instruction sets?&lt;/P&gt;</description>
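      <!-- Editor's note: a hedged helper (not from the thread) for confirming what
           the CPU reports, using the GCC/Clang builtin __builtin_cpu_supports;
           the "avx512vnni" feature name assumes a reasonably recent compiler.
           The Xeon E5-2667 v3 is a Haswell part, so it should report AVX2 only.

      #include <stdio.h>

      int main(void)
      {
          printf("avx2:       %d\n", __builtin_cpu_supports("avx2") != 0);
          printf("avx512f:    %d\n", __builtin_cpu_supports("avx512f") != 0);
          printf("avx512vnni: %d\n", __builtin_cpu_supports("avx512vnni") != 0);
          return 0;
      }
      -->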
      <pubDate>Sat, 07 Dec 2019 02:11:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-Math-Kernel-Library-Cblas-int8-gemm-and-dnnl-int8-gemm/m-p/1151805#M27190</guid>
      <dc:creator>jingjing__wang</dc:creator>
      <dc:date>2019-12-07T02:11:47Z</dc:date>
    </item>
    <item>
      <title>Hi Jingjing,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-Math-Kernel-Library-Cblas-int8-gemm-and-dnnl-int8-gemm/m-p/1151806#M27191</link>
      <description>&lt;P&gt;Hi Jingjing,&lt;/P&gt;&lt;P&gt;We recently added support for AVX2 in DNNL for int8 GEMM (around the end of November; see commit &lt;A href="https://github.com/intel/mkl-dnn/commit/35b39a8dd2ad7f708f9456ed3f787ad8b9817973"&gt;35b39a8d&lt;/A&gt;). In any case, int8 performance shouldn't be much better than single precision on an AVX2 platform.&lt;/P&gt;</description>
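      <!-- Editor's note: a hedged sketch (not from the thread) of the DNNL v1.x
           C API call discussed above. Unlike cblas_gemm_s8u8s32,
           dnnl_gemm_s8s8s32 takes two signed int8 operands directly; per the
           oneDNN docs its matrices are row-major. Sizes and names here are
           illustrative.

      #include <stdint.h>
      #include <dnnl.h>

      /* C = A * B for signed int8 A (M x K) and B (K x N), row-major. */
      int gemm_s8s8_dnnl(int64_t M, int64_t N, int64_t K,
                         const int8_t *A, const int8_t *B, int32_t *C)
      {
          const int32_t co = 0;   /* 'F': one fixed offset added to all of C */
          dnnl_status_t st = dnnl_gemm_s8s8s32('N', 'N', 'F', M, N, K,
                                               1.0f, A, K, 0, B, N, 0,
                                               0.0f, C, N, &co);
          return st == dnnl_success ? 0 : 1;
      }
      -->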
      <pubDate>Tue, 17 Dec 2019 18:37:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-Math-Kernel-Library-Cblas-int8-gemm-and-dnnl-int8-gemm/m-p/1151806#M27191</guid>
      <dc:creator>Arthur_A_Intel</dc:creator>
      <dc:date>2019-12-17T18:37:25Z</dc:date>
    </item>
  </channel>
</rss>

