<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re:Does oneMKL axpy optimally tuned for Intel GPUs ? in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1203400#M29941</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Thanks for reaching out to us!&lt;/P&gt;&lt;P&gt;Since your issue is related to oneMKL, we are moving this query to the &lt;B&gt;Intel® oneAPI Math Kernel Library &amp;amp; Intel® Math Kernel Library&lt;/B&gt; forum for a faster response.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Goutham&lt;/P&gt;&lt;BR /&gt;</description>
    <pubDate>Mon, 24 Aug 2020 08:38:36 GMT</pubDate>
    <dc:creator>GouthamK_Intel</dc:creator>
    <dc:date>2020-08-24T08:38:36Z</dc:date>
    <item>
      <title>Does oneMKL axpy optimally tuned for Intel GPUs ?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1202382#M29940</link>
      <description>&lt;P&gt;I wonder whether I can improve the performance of the following snippet, which I would like to use to assess the memory bandwidth of Intel GPUs:&lt;/P&gt;</description>
&lt;LI-CODE lang="cpp"&gt;#include &amp;lt;CL/sycl.hpp&amp;gt;
#include &amp;lt;chrono&amp;gt;
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;vector&amp;gt;
#include "mkl_sycl.hpp"
#include "dpc_common.hpp"


using namespace cl::sycl;
using namespace std;

constexpr size_t NITER=100; // amortize device/host communication over many calls

template &amp;lt;class T&amp;gt;
void bench_axpy(size_t N){

  std::vector&amp;lt;T&amp;gt; a(N,1);
  std::vector&amp;lt;T&amp;gt; b(N,2);
  gpu_selector device_selector;
  queue q(device_selector, dpc_common::exception_handler);
  
  auto start=std::chrono::high_resolution_clock::now();
  {  // Begin buffer scope
    buffer buf_a(&amp;amp;a[0], range(N));// Create buffers using DPC++ class buffer
    buffer buf_b(&amp;amp;b[0], range(N));

    const T alpha=0.5;
    try{
        for (size_t iter=0; iter&amp;lt;NITER; iter++) {
            mkl::blas::axpy(q, N, alpha, buf_a, 1, buf_b, 1);
        }
    }
    catch(cl::sycl::exception const&amp;amp; e) {
        std::cout &amp;lt;&amp;lt; "\t\tCaught synchronous SYCL exception during AXPY:\n"
          &amp;lt;&amp;lt; e.what() &amp;lt;&amp;lt; std::endl;
    }
  }  // End buffer scope: buffer destructors block until the kernels finish and data is copied back to the host
  auto end = std::chrono::high_resolution_clock::now();
  std::chrono::duration&amp;lt;double&amp;gt; elapsed_seconds = end-start;
  double time = elapsed_seconds.count();
  double GBs=double(3*N)*sizeof(T)*NITER/(time*1.e9); // 2 reads (a,b) + 1 write (b) per element
  std::cout &amp;lt;&amp;lt;"GBs="&amp;lt;&amp;lt;GBs&amp;lt;&amp;lt;std::endl; 
}


int main(int argc, char* argv[]) {

  bench_axpy&amp;lt;float&amp;gt;(2&amp;lt;&amp;lt;27);

  return 0;
}
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I compile with:&lt;/P&gt;
&lt;P&gt;dpcpp -O3 -fsycl -std=c++17 -DMKL_ILP64 -g -DNDEBUG -lOpenCL -lsycl -lmkl_sycl -lmkl_core -lmkl_sequential -lmkl_intel_lp64 ../src/portable_main.cpp&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;and obtain:&lt;/P&gt;
&lt;P&gt;GBs=23.09 on my machine with a UHD630 (and no VRAM).&lt;/P&gt;
&lt;P&gt;Is it possible to improve this ?&lt;/P&gt;
</description>
      <pubDate>Fri, 21 Aug 2020 08:53:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1202382#M29940</guid>
      <dc:creator>LaurentPlagne</dc:creator>
      <dc:date>2020-08-21T08:53:47Z</dc:date>
    </item>
    <item>
      <title>Re:Does oneMKL axpy optimally tuned for Intel GPUs ?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1203400#M29941</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Thanks for reaching out to us!&lt;/P&gt;&lt;P&gt;Since your issue is related to oneMKL, we are moving this query to the &lt;B&gt;Intel® oneAPI Math Kernel Library &amp;amp; Intel® Math Kernel Library&lt;/B&gt; forum for a faster response.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Goutham&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 24 Aug 2020 08:38:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1203400#M29941</guid>
      <dc:creator>GouthamK_Intel</dc:creator>
      <dc:date>2020-08-24T08:38:36Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Does oneMKL axpy optimally tuned for Intel GPUs ?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1204776#M29974</link>
      <description>No hints?</description>
      <pubDate>Thu, 27 Aug 2020 19:49:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1204776#M29974</guid>
      <dc:creator>LaurentPlagne</dc:creator>
      <dc:date>2020-08-27T19:49:02Z</dc:date>
    </item>
    <item>
      <title>Re:Does oneMKL axpy optimally tuned for Intel GPUs ?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1204916#M29976</link>
      <description>&lt;P&gt;You could try to check the achievable bandwidth on this particular system by running a stream benchmark (e.g. &lt;A href="http://uob-hpc.github.io/BabelStream/" rel="noopener noreferrer" target="_blank"&gt;BabelStream&lt;/A&gt;).&lt;/P&gt;</description>
      <pubDate>Fri, 28 Aug 2020 08:12:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1204916#M29976</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-08-28T08:12:11Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Does oneMKL axpy optimally tuned for Intel GPUs ?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1204920#M29978</link>
      <description>Hi, thank you very much for your answer! I will post a stream benchmark as soon as I get my laptop back.&lt;BR /&gt;&lt;BR /&gt;I suspect that in this case the kernel actually saturates the RAM bandwidth.&lt;BR /&gt;&lt;BR /&gt;My question was more about the optimality of this kernel for performing axpy on every Intel GPU (including GPUs with VRAM).&lt;BR /&gt;&lt;BR /&gt;Thank you again.</description>
      <pubDate>Fri, 28 Aug 2020 08:30:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1204920#M29978</guid>
      <dc:creator>LaurentPlagne</dc:creator>
      <dc:date>2020-08-28T08:30:18Z</dc:date>
    </item>
    <item>
      <title>Re:Does oneMKL axpy optimally tuned for Intel GPUs ?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1204925#M29979</link>
      <description>&lt;P&gt;As we have only the Beta version of oneMKL at this moment, it is too early to speak about the “optimality of this kernel for performing axpy on every Intel GPUs…”. I think we could get back to this performance query after the release timeframe.&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 28 Aug 2020 08:44:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1204925#M29979</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2020-08-28T08:44:00Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Does oneMKL axpy optimally tuned for Intel GPUs ?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1204951#M29980</link>
      <description>Fair enough. Thank you again.</description>
      <pubDate>Fri, 28 Aug 2020 11:06:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Does-oneMKL-axpy-optimally-tuned-for-Intel-GPUs/m-p/1204951#M29980</guid>
      <dc:creator>LaurentPlagne</dc:creator>
      <dc:date>2020-08-28T11:06:15Z</dc:date>
    </item>
  </channel>
</rss>

