<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: bad performance with cblas_dger using AVX2 on i7 12th gen in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/bad-performance-with-cblas-dger-using-AVX2-on-i7-12th-gen/m-p/1459669#M34288</link>
    <description>&lt;P&gt;Hi Manuel,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The performance difference comes from the core architecture. Recent desktop parts use Cove cores, which have a larger cache and more memory channels than older AVX2 desktop cores. As a result, the behavior differs, and simultaneous memory accesses perform better on recent desktop parts. These results were measured on ICX. The "test" variant is based on Fortran code, and its behavior is similar to AVX. Please find the performance charts attached.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Multiple simultaneous memory accesses cause performance degradation on AVX2-based Xeon processors. MKL has no mechanism to distinguish older from newer AVX2-based architectures, so this performance improvement could not be made.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Shanmukh.SS&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 01 Mar 2023 06:06:52 GMT</pubDate>
    <dc:creator>ShanmukhS_Intel</dc:creator>
    <dc:date>2023-03-01T06:06:52Z</dc:date>
    <item>
      <title>bad performance with cblas_dger using AVX2 on i7 12th gen</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/bad-performance-with-cblas-dger-using-AVX2-on-i7-12th-gen/m-p/1450770#M34225</link>
      <description>&lt;P&gt;We are currently evaluating Intel MKL to improve the performance of our application. However, we found that on computers with an Intel i7 12th gen CPU, performance decreased significantly when using Intel MKL. Profiling the application showed that two MKL BLAS functions were taking up most of the CPU time, namely&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;[MKL BLAS]@avx2_xdaxpy&lt;/LI&gt;
&lt;LI&gt;[MKL BLAS]@avx2_dger&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;We can reproduce the issue with the attached modified MKL sample program.&lt;/P&gt;
&lt;P&gt;With this program we can see that the MKL function cblas_dger runs considerably slower on an i7 12th gen CPU when using the AVX2 instruction set with a single thread than when using the AVX instruction set with a single thread.&lt;/P&gt;
&lt;P&gt;Running the same code on an i7 10th gen CPU showed increased performance when using the AVX2 instruction set.&lt;/P&gt;
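&lt;P&gt;For readers unfamiliar with the routine, the following is a minimal sketch of what is being timed: DGER performs the rank-1 update A := alpha*x*y^T + A. This is a pure-Python stand-in, not the attached MKL sample; it does not call MKL or select an instruction set, and the names dger and time_calls as well as the 64x64 problem size are assumptions made for illustration.&lt;/P&gt;
&lt;PRE&gt;
```python
import time


def dger(alpha, x, y, a):
    """Rank-1 update in place: a[i][j] += alpha * x[i] * y[j].

    `a` is a row-major list of lists with len(x) rows and len(y) columns.
    Hypothetical reference helper, not the MKL implementation.
    """
    for i, xi in enumerate(x):
        row = a[i]
        scaled = alpha * xi  # hoist the per-row factor out of the inner loop
        for j, yj in enumerate(y):
            row[j] += scaled * yj
    return a


def time_calls(func, calls):
    """Return the wall-clock seconds spent on `calls` invocations of func()."""
    start = time.perf_counter()
    for _ in range(calls):
        func()
    return time.perf_counter() - start


if __name__ == "__main__":
    m, n = 64, 64  # assumed problem size, not taken from the thread
    x = [1.0] * m
    y = [2.0] * n
    a = [[0.0] * n for _ in range(m)]
    # Time 1000 repeated rank-1 updates, mirroring the measurement described above.
    elapsed = time_calls(lambda: dger(0.5, x, y, a), 1000)
    print(f"1000 calls took {elapsed:.4f} s")
```
&lt;/PRE&gt;
&lt;P&gt;The absolute numbers from this sketch are not comparable with MKL timings; it only illustrates the operation and the measurement loop of 1,000 calls.&lt;/P&gt;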
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;See the attached screenshot for a timing of 1,000 calls to this function on an i7-12700K.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="screen_behaviour.PNG" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/37495iC5C670CB340DDF5F/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="screen_behaviour.PNG" alt="screen_behaviour.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;oneMKL version used: oneMKL 2023.0 Product build 20221128&lt;/P&gt;</description>
      <pubDate>Fri, 27 Jan 2023 16:32:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/bad-performance-with-cblas-dger-using-AVX2-on-i7-12th-gen/m-p/1450770#M34225</guid>
      <dc:creator>Manuel7</dc:creator>
      <dc:date>2023-01-27T16:32:14Z</dc:date>
    </item>
    <item>
      <title>Re: bad performance with cblas_dger using AVX2 on i7 12th gen</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/bad-performance-with-cblas-dger-using-AVX2-on-i7-12th-gen/m-p/1452160#M34233</link>
      <description>&lt;P&gt;Hi Manuel,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for posting on Intel Communities and for sharing your feedback. We have reported the issue to the development team and will get back to you soon with an update.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Best Regards,&lt;/P&gt;&lt;P&gt;Shanmukh.SS&lt;/P&gt;</description>
      <pubDate>Wed, 01 Feb 2023 11:08:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/bad-performance-with-cblas-dger-using-AVX2-on-i7-12th-gen/m-p/1452160#M34233</guid>
      <dc:creator>ShanmukhS_Intel</dc:creator>
      <dc:date>2023-02-01T11:08:15Z</dc:date>
    </item>
    <item>
      <title>Re: bad performance with cblas_dger using AVX2 on i7 12th gen</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/bad-performance-with-cblas-dger-using-AVX2-on-i7-12th-gen/m-p/1459669#M34288</link>
      <description>&lt;P&gt;Hi Manuel,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The performance difference comes from the core architecture. Recent desktop parts use Cove cores, which have a larger cache and more memory channels than older AVX2 desktop cores. As a result, the behavior differs, and simultaneous memory accesses perform better on recent desktop parts. These results were measured on ICX. The "test" variant is based on Fortran code, and its behavior is similar to AVX. Please find the performance charts attached.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Multiple simultaneous memory accesses cause performance degradation on AVX2-based Xeon processors. MKL has no mechanism to distinguish older from newer AVX2-based architectures, so this performance improvement could not be made.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Shanmukh.SS&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 01 Mar 2023 06:06:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/bad-performance-with-cblas-dger-using-AVX2-on-i7-12th-gen/m-p/1459669#M34288</guid>
      <dc:creator>ShanmukhS_Intel</dc:creator>
      <dc:date>2023-03-01T06:06:52Z</dc:date>
    </item>
    <item>
      <title>Re: bad performance with cblas_dger using AVX2 on i7 12th gen</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/bad-performance-with-cblas-dger-using-AVX2-on-i7-12th-gen/m-p/1462752#M34318</link>
      <description>&lt;P&gt;Hi Manuel,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;A gentle reminder:&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Has the information provided helped? Could you please let us know if we could close this case at our end?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Best Regards,&lt;/P&gt;
&lt;P&gt;Shanmukh.SS&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 07 Mar 2023 16:48:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/bad-performance-with-cblas-dger-using-AVX2-on-i7-12th-gen/m-p/1462752#M34318</guid>
      <dc:creator>ShanmukhS_Intel</dc:creator>
      <dc:date>2023-03-07T16:48:27Z</dc:date>
    </item>
    <item>
      <title>Re: bad performance with cblas_dger using AVX2 on i7 12th gen</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/bad-performance-with-cblas-dger-using-AVX2-on-i7-12th-gen/m-p/1465428#M34341</link>
      <description>&lt;P&gt;Hi Manuel,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Best Regards,&lt;/P&gt;&lt;P&gt;Shanmukh.SS&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 14 Mar 2023 07:14:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/bad-performance-with-cblas-dger-using-AVX2-on-i7-12th-gen/m-p/1465428#M34341</guid>
      <dc:creator>ShanmukhS_Intel</dc:creator>
      <dc:date>2023-03-14T07:14:54Z</dc:date>
    </item>
  </channel>
</rss>

