<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re:Performance of cblas_ddot when incx &gt; 1 in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1440862#M34046</link>
    <description>&lt;P&gt;Hi Dmitry,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thank you for your patience. The issue you raised has been fixed in version 2023.0. Please download it and let us know whether it resolves your issue.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Best Regards,&lt;/P&gt;&lt;P&gt;Shanmukh.SS&lt;/P&gt;&lt;BR /&gt;</description>
    <pubDate>Thu, 22 Dec 2022 14:08:36 GMT</pubDate>
    <dc:creator>ShanmukhS_Intel</dc:creator>
    <dc:date>2022-12-22T14:08:36Z</dc:date>
    <item>
      <title>Performance of cblas_ddot when incx &gt; 1</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1414868#M33654</link>
      <description>&lt;P&gt;We use MKL through NumPy. We noticed that the performance of cblas_ddot (running on a &lt;STRONG&gt;single&lt;/STRONG&gt; thread) depends &lt;STRONG&gt;significantly&lt;/STRONG&gt; on the values of incx and incy. We were able to write a simple C implementation that runs up to 2x faster than cblas_ddot when incx and incy &amp;gt; 1. We think there is a bug in the MKL code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Example&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;We have two 3-dimensional arrays, x_dgt and y_dgt, of shape (100, 70, 144). We measure the performance of a vectorized dot product over each of the three axes (in einsum notation):&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;t: &lt;FONT face="courier new,courier"&gt;'dgt,dgt-&amp;gt;dg'&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI&gt;g: &lt;FONT face="courier new,courier"&gt;'dgt,dgt-&amp;gt;dt'&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI&gt;d: &lt;FONT face="courier new,courier"&gt;'dgt,dgt-&amp;gt;gt'&lt;/FONT&gt;&lt;/LI&gt;
&lt;/UL&gt;
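&lt;P&gt;A minimal C sketch of how the three reductions map onto strided dot products, assuming x_dgt and y_dgt are C-ordered (NumPy's default layout), so that element (d, g, t) sits at offset d*70*144 + g*144 + t. Under that assumption, reducing over t uses incx = incy = 1 (7,000 calls of length 144), over g uses a stride of 144 (14,400 calls of length 70), and over d uses a stride of 10,080 (10,080 calls of length 100). The names dot_strided and reduce_axis are hypothetical, and the plain loop in dot_strided only stands in for cblas_ddot.&lt;/P&gt;

```c
#include "assert.h"

/* Strided dot product in the BLAS ddot convention; a plain-loop stand-in
   for cblas_ddot. This sketch only handles positive strides. */
double dot_strided(long n, const double *x, long incx,
                   const double *y, long incy)
{
    assert(incx > 0);
    assert(incy > 0);
    double acc = 0.0;
    for (; n > 0; --n, x += incx, y += incy)
        acc += *x * *y;
    return acc;
}

/* Contract two C-ordered D x G x T arrays over one axis (0 = d, 1 = g, 2 = t)
   with one strided dot per output element, like a ddot-based einsum.
   out receives the two kept axes in row-major order. */
void reduce_axis(const double *x, const double *y, double *out,
                 long D, long G, long T, int axis)
{
    const long extent[3] = { D, G, T };
    const long stride[3] = { G * T, T, 1 };   /* element strides, C order */
    int a = (axis == 0) ? 1 : 0;              /* first kept axis */
    int b = (axis == 2) ? 1 : 2;              /* second kept axis */
    long k = 0;
    for (long i = 0; i != extent[a]; ++i)
        for (long j = 0; j != extent[b]; ++j) {
            long off = i * stride[a] + j * stride[b];
            out[k++] = dot_strided(extent[axis], x + off, stride[axis],
                                   y + off, stride[axis]);
        }
}
```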
&lt;P&gt;To compute each dot product we use either cblas_ddot or a custom ddot implementation that unrolls the loop in blocks of 8 elements and relies on the compiler's -O3 optimization to turn the unrolled loop into AVX instructions. The code is attached.&lt;/P&gt;
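&lt;P&gt;The attached code is not included in this feed, so the following is only a hypothetical C sketch (dot_unroll8 is an illustrative name) of a strided dot product unrolled in blocks of 8, assuming positive strides. It spreads each block over four independent partial sums, one common reason such unrolling pays off: the additions no longer form a single dependency chain. Whether gcc then emits AVX rather than SSE2 instructions typically depends on target flags such as -mavx2 or -march=native, not on -O3 alone.&lt;/P&gt;

```c
#include "assert.h"

/* Hypothetical strided dot product unrolled by 8 (positive strides only). */
double dot_unroll8(long n, const double *x, long incx,
                   const double *y, long incy)
{
    assert(incx > 0);
    assert(incy > 0);
    /* Four independent partial sums shorten the floating-point add
       dependency chain so consecutive blocks can overlap in the CPU. */
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    for (; n >= 8; n -= 8, x += 8 * incx, y += 8 * incy) {
        s0 += x[0]        * y[0]        + x[4 * incx] * y[4 * incy];
        s1 += x[incx]     * y[incy]     + x[5 * incx] * y[5 * incy];
        s2 += x[2 * incx] * y[2 * incy] + x[6 * incx] * y[6 * incy];
        s3 += x[3 * incx] * y[3 * incy] + x[7 * incx] * y[7 * incy];
    }
    /* Scalar tail for the remaining n % 8 elements. */
    for (; n > 0; --n, x += incx, y += incy)
        s0 += *x * *y;
    return (s0 + s1) + (s2 + s3);
}
```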
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;cblas_dot over t:&amp;nbsp; 560.7 us&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; &amp;nbsp;my_dot over t:&amp;nbsp; 674.0 us&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;cblas_dot over g: 1113.4 us&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; &amp;nbsp;my_dot over g:&amp;nbsp; 562.4 us&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;cblas_dot over d: 1277.4 us&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; &amp;nbsp;my_dot over d:&amp;nbsp; 747.0 us&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As you can see, cblas_ddot is faster in the contiguous case (over t), but our simple code is roughly 2x faster over g and 1.7x faster over d, where incx, incy &amp;gt; 1.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Attached code&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;We compile the code with gcc using the following command:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;gcc mkl_dot.c -DMKL_ILP64 -m64 -I"/opt/miniconda3/include" -L/opt/miniconda3/lib -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl -O3 -o mkl_dot.o&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 15 Sep 2022 02:57:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1414868#M33654</guid>
      <dc:creator>DmitryB1</dc:creator>
      <dc:date>2022-09-15T02:57:36Z</dc:date>
    </item>
    <item>
      <title>Re:Performance of cblas_ddot when incx &gt; 1</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1415598#M33668</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thank you for posting in Intel Communities.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Could you please share your environment details, such as software versions, so that we can look into your issue further?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Best Regards,&lt;/P&gt;&lt;P&gt;Shanmukh.SS&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 19 Sep 2022 10:46:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1415598#M33668</guid>
      <dc:creator>ShanmukhS_Intel</dc:creator>
      <dc:date>2022-09-19T10:46:19Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of cblas_ddot when incx &gt; 1</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1415627#M33669</link>
      <description>&lt;P&gt;Sure, here it is&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;CentOS 7&lt;/LI&gt;
&lt;LI&gt;kernel 3.10.0-1062.18.1.el7.x86_64&lt;/LI&gt;
&lt;LI&gt;gcc&amp;nbsp;7.3.1 20180303 (Red Hat 7.3.1-5)&lt;/LI&gt;
&lt;LI&gt;MKL 2021.4.0&lt;/LI&gt;
&lt;LI&gt;CPU:&amp;nbsp;Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Is there anything else you would like to know about my environment?&lt;/P&gt;
&lt;P&gt;Best,&lt;BR /&gt;Dmitry.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Sep 2022 13:20:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1415627#M33669</guid>
      <dc:creator>DmitryB1</dc:creator>
      <dc:date>2022-09-19T13:20:13Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of cblas_ddot when incx &gt; 1</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1415628#M33670</link>
      <description>&lt;P&gt;We run the code on a single thread. To do so, we set "export OMP_NUM_THREADS=1" and "export MKL_NUM_THREADS=1" in the terminal.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 19 Sep 2022 13:21:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1415628#M33670</guid>
      <dc:creator>DmitryB1</dc:creator>
      <dc:date>2022-09-19T13:21:57Z</dc:date>
    </item>
    <item>
      <title>Re: Performance of cblas_ddot when incx &gt; 1</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1415974#M33677</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please note that performance can vary with usage, configuration, and other factors. Relevant differences between MKL and the reproducer include:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1. Instruction set: MKL uses AVX2/AVX-512, while the reproducer uses SSE2.&lt;/P&gt;
&lt;P&gt;2. Fused instructions: MKL uses FMA (fused multiply-add), while the reproducer uses separate MUL and ADD instructions. MKL may also use instructions that fuse a memory load with the floating-point operation.&lt;/P&gt;
&lt;P&gt;3. Loop unrolling strategy.&lt;/P&gt;
&lt;P&gt;4. CPU frequency.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We will get back to you soon with an update regarding the progress.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Shanmukh.SS&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 23 Sep 2022 05:23:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1415974#M33677</guid>
      <dc:creator>ShanmukhS_Intel</dc:creator>
      <dc:date>2022-09-23T05:23:37Z</dc:date>
    </item>
    <item>
      <title>Re:Performance of cblas_ddot when incx &gt; 1</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1416789#M33697</link>
      <description>&lt;P&gt;Hi Dmitry,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for reporting this issue. We were able to reproduce it and have informed the development team.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Best Regards,&lt;/P&gt;&lt;P&gt;Shanmukh.SS&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 23 Sep 2022 05:21:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1416789#M33697</guid>
      <dc:creator>ShanmukhS_Intel</dc:creator>
      <dc:date>2022-09-23T05:21:49Z</dc:date>
    </item>
    <item>
      <title>Re:Performance of cblas_ddot when incx &gt; 1</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1440862#M34046</link>
      <description>&lt;P&gt;Hi Dmitry,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thank you for your patience. The issue you raised has been fixed in version 2023.0. Please download it and let us know whether it resolves your issue.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Best Regards,&lt;/P&gt;&lt;P&gt;Shanmukh.SS&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 22 Dec 2022 14:08:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1440862#M34046</guid>
      <dc:creator>ShanmukhS_Intel</dc:creator>
      <dc:date>2022-12-22T14:08:36Z</dc:date>
    </item>
    <item>
      <title>Re:Performance of cblas_ddot when incx &gt; 1</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1442457#M34091</link>
      <description>&lt;P&gt;Hi Dmitry,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We assume that your issue is resolved. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Have a great day!&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Best Regards,&lt;/P&gt;&lt;P&gt;Shanmukh.SS&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 29 Dec 2022 08:19:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Performance-of-cblas-ddot-when-incx-gt-1/m-p/1442457#M34091</guid>
      <dc:creator>ShanmukhS_Intel</dc:creator>
      <dc:date>2022-12-29T08:19:28Z</dc:date>
    </item>
  </channel>
</rss>

