<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to optimize in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-optimize/m-p/1579067#M35898</link>
    <description>&lt;P&gt;Hi all !&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm new using Intel MKL routines. I have the following function that multiplies two dense matrices:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;void&lt;/SPAN&gt; &lt;SPAN&gt;multiplyMatrices&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;double*&lt;/SPAN&gt; &lt;SPAN&gt;A&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;double*&lt;/SPAN&gt; &lt;SPAN&gt;B&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;double*&lt;/SPAN&gt; &lt;SPAN&gt;C&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;int&lt;/SPAN&gt; &lt;SPAN&gt;N&lt;/SPAN&gt;&lt;SPAN&gt;) {&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;double&lt;/SPAN&gt; &lt;SPAN&gt;alpha&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;1.0&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;beta&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;0.0&lt;/SPAN&gt;&lt;SPAN&gt;;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;cblas_dgemm&lt;/SPAN&gt;&lt;SPAN&gt;(CblasRowMajor, CblasNoTrans, CblasNoTrans, &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;N&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;N&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;N&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;alpha&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;A&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;N&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;B&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;N&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;beta&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;C&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;N&lt;/SPAN&gt;&lt;SPAN&gt;);&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I compile with the followin flags&lt;/P&gt;&lt;P&gt;g++ -std=c++11 -O3&amp;nbsp; -I/opt/intel/oneapi/2022/mkl/latest/include&amp;nbsp; c laicoMult.cpp -L/opt/intel/oneapi/2022/mkl/latest/lib/intel64 -Wl,-rpath,/opt/intel/oneapi/mkl/2024.0/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl -fopenmp -march=native -fopt-info-vec -ffast-math -ftree-vectorize -DARMA_DONT_USE_WRAPPER -o laicoMult&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;At the moment I'm using a Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz processor with 16GB DDR4.&lt;/P&gt;&lt;P&gt;The code is pretty fast but I'd like to know if there is something I could do to make this code faster.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can anyone help me ?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance !&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Sat, 09 Mar 2024 03:40:02 GMT</pubDate>
    <dc:creator>FredCabral</dc:creator>
    <dc:date>2024-03-09T03:40:02Z</dc:date>
    <item>
      <title>How to optimize</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-optimize/m-p/1579067#M35898</link>
      <description>&lt;P&gt;Hi all !&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm new using Intel MKL routines. I have the following function that multiplies two dense matrices:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;void&lt;/SPAN&gt; &lt;SPAN&gt;multiplyMatrices&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;double*&lt;/SPAN&gt; &lt;SPAN&gt;A&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;double*&lt;/SPAN&gt; &lt;SPAN&gt;B&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;double*&lt;/SPAN&gt; &lt;SPAN&gt;C&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;int&lt;/SPAN&gt; &lt;SPAN&gt;N&lt;/SPAN&gt;&lt;SPAN&gt;) {&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;double&lt;/SPAN&gt; &lt;SPAN&gt;alpha&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;1.0&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;beta&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;0.0&lt;/SPAN&gt;&lt;SPAN&gt;;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;cblas_dgemm&lt;/SPAN&gt;&lt;SPAN&gt;(CblasRowMajor, CblasNoTrans, CblasNoTrans, &lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;N&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;N&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;N&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;alpha&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;A&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;N&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;B&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;N&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;beta&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;C&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;N&lt;/SPAN&gt;&lt;SPAN&gt;);&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;}&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I compile with the followin flags&lt;/P&gt;&lt;P&gt;g++ -std=c++11 -O3&amp;nbsp; -I/opt/intel/oneapi/2022/mkl/latest/include&amp;nbsp; c laicoMult.cpp -L/opt/intel/oneapi/2022/mkl/latest/lib/intel64 -Wl,-rpath,/opt/intel/oneapi/mkl/2024.0/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl -fopenmp -march=native -fopt-info-vec -ffast-math -ftree-vectorize -DARMA_DONT_USE_WRAPPER -o laicoMult&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;At the moment I'm using a Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz processor with 16GB DDR4.&lt;/P&gt;&lt;P&gt;The code is pretty fast but I'd like to know if there is something I could do to make this code faster.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can anyone help me ?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance !&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 09 Mar 2024 03:40:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-optimize/m-p/1579067#M35898</guid>
      <dc:creator>FredCabral</dc:creator>
      <dc:date>2024-03-09T03:40:02Z</dc:date>
    </item>
    <item>
      <title>Re:How to optimize</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-optimize/m-p/1579360#M35901</link>
      <description>&lt;P&gt;yes, you could link against threaded version of MKL instead of sequential (&lt;SPAN style="font-size: 14px;"&gt;-lmkl_sequential&amp;nbsp;) once. Please check with MKL Linker Adviser ( &lt;/SPAN&gt;&lt;A href="https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html" target="_blank"&gt;https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html&lt;/A&gt;&lt;SPAN style="font-size: 14px;"&gt; ) how to link against OpenMP or TBB threaded runtimes. You also might check MKL Developer Guide where you could find out many of such examples.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 11 Mar 2024 09:44:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-optimize/m-p/1579360#M35901</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2024-03-11T09:44:20Z</dc:date>
    </item>
  </channel>
</rss>

