<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re:Vtune question for memory bound problem on GPU in Analyzers</title>
    <link>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1201528#M19301</link>
    <description>&lt;P&gt;Hi Plagne,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We are checking on this with our SME, will get back to you soon.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Adweidh&lt;/P&gt;&lt;BR /&gt;</description>
    <pubDate>Wed, 19 Aug 2020 06:33:02 GMT</pubDate>
    <dc:creator>Adweidh_Intel</dc:creator>
    <dc:date>2020-08-19T06:33:02Z</dc:date>
    <item>
      <title>Vtune question for memory bound problem on GPU</title>
      <link>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1200788#M19296</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I have a question about the VTune diagnostics for a memory-bound problem on the GPU.&lt;/P&gt;
&lt;P&gt;I measure the observed bandwidth of a vector kernel (MemoryBoundKernel.hpp) for large (2&amp;lt;&amp;lt;27 elements) vectors of floats a and b:&lt;/P&gt;
&lt;LI-CODE lang="cpp"&gt;    h.parallel_for(global_range, [=](id&amp;lt;1&amp;gt; i) {
        acc_b[i] += acc_a[i];
    });
});  // end submit&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I obtain 19 GB/s on my laptop (9300H with UHD630) which I suspect to be close to the maximal bandwidth on this machine.&lt;/P&gt;
&lt;P&gt;What I find surprising is that the VTune GPU analysis highlights EU occupancy (in red) but does not flag RAM bandwidth saturation (not in red).&lt;/P&gt;
&lt;P&gt;Am I missing something obvious?&lt;/P&gt;</description>
      <pubDate>Mon, 17 Aug 2020 08:26:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1200788#M19296</guid>
      <dc:creator>LaurentPlagne</dc:creator>
      <dc:date>2020-08-17T08:26:16Z</dc:date>
    </item>
    <item>
      <title>Re: Vtune question for memory bound problem on GPU</title>
      <link>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1201148#M19297</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;Thanks for reaching out to us!&lt;/P&gt;
&lt;P&gt;As your query is related to Vtune, we are redirecting your post to the &lt;A href="https://community.intel.com/t5/Analyzers-Intel-VTune-Profiler/bd-p/analyzers" target="_self"&gt;Vtune forum&lt;/A&gt; so that Vtune experts can guide you better.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks &amp;amp; Regards&lt;/P&gt;
&lt;P&gt;Goutham&lt;/P&gt;</description>
      <pubDate>Tue, 18 Aug 2020 05:30:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1201148#M19297</guid>
      <dc:creator>GouthamK_Intel</dc:creator>
      <dc:date>2020-08-18T05:30:59Z</dc:date>
    </item>
    <item>
      <title>Re: Vtune question for memory bound problem on GPU</title>
      <link>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1201518#M19299</link>
      <description>&lt;P&gt;Self answer : I have replaced my kernel (20 GB/s)&lt;/P&gt;
&lt;LI-CODE lang="cpp"&gt; q.submit([&amp;amp;](auto &amp;amp;h) {// Submit command group for execution
    auto acc_a = buf_a.template get_access&amp;lt;access::mode::read&amp;gt;(h);// Create accessors
    auto acc_b = buf_b.template get_access&amp;lt;access::mode::write&amp;gt;(h);

    auto global_range = range&amp;lt;1&amp;gt;(vsize);// Define local and global range

    h.parallel_for(global_range, [=](id&amp;lt;1&amp;gt; i) {
        acc_b[i]+=alpha*acc_a[i];
        });
    });  // end submit&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;by oneAPI MKL axpy (26 GB/s)&lt;/P&gt;
&lt;LI-CODE lang="cpp"&gt; mkl::blas::axpy(q, vsize, alpha, buf_a, 1, buf_b, 1);&lt;/LI-CODE&gt;
&lt;P&gt;and now VTune correctly highlights (in red) the DRAM bandwidth bound (83.5%).&lt;/P&gt;
&lt;P&gt;Although I don't know how my kernel could be improved, VTune correctly identified that there was room for improvement.&lt;/P&gt;
&lt;P&gt;BTW, what exactly does the 83.5% figure mean?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="saxpy_vtune.png" style="width: 999px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/11874iA03179DFFF5E9357/image-size/large/is-moderation-mode/true?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="saxpy_vtune.png" alt="saxpy_vtune.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Aug 2020 06:10:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1201518#M19299</guid>
      <dc:creator>Plagne__Laurent</dc:creator>
      <dc:date>2020-08-19T06:10:04Z</dc:date>
    </item>
    <item>
      <title>Re: Vtune question for memory bound problem on GPU</title>
      <link>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1201527#M19300</link>
      <description>&lt;P&gt;The metric is accurately described in the documentation (except for how the default low, medium, and high threshold values are computed):&lt;/P&gt;
&lt;P&gt;&lt;A href="https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/cpu-metrics-reference/memory-bound/dram-bound/dram-bandwidth-bound.html" target="_blank"&gt;https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/reference/cpu-metrics-reference/memory-bound/dram-bound/dram-bandwidth-bound.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 19 Aug 2020 06:27:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1201527#M19300</guid>
      <dc:creator>Plagne__Laurent</dc:creator>
      <dc:date>2020-08-19T06:27:19Z</dc:date>
    </item>
    <item>
      <title>Re:Vtune question for memory bound problem on GPU</title>
      <link>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1201528#M19301</link>
      <description>&lt;P&gt;Hi Plagne,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We are checking on this with our SME, will get back to you soon.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Adweidh&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 19 Aug 2020 06:33:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1201528#M19301</guid>
      <dc:creator>Adweidh_Intel</dc:creator>
      <dc:date>2020-08-19T06:33:02Z</dc:date>
    </item>
    <item>
      <title>Re: Vtune question for memory bound problem on GPU</title>
      <link>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1201732#M19303</link>
      <description>&lt;P&gt;The low, medium, and high thresholds are just default values. You can change them by moving the sliders at the bottom of the graph. The defaults are evenly distributed.&lt;/P&gt;</description>
      <pubDate>Wed, 19 Aug 2020 16:44:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1201732#M19303</guid>
      <dc:creator>Kevin_O_Intel1</dc:creator>
      <dc:date>2020-08-19T16:44:45Z</dc:date>
    </item>
    <item>
      <title>Re: Vtune question for memory bound problem on GPU</title>
      <link>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1201977#M19304</link>
      <description>&lt;P&gt;Thank you!&lt;/P&gt;
&lt;P&gt;BTW, do you know how the max system GPU bandwidth is evaluated?&lt;/P&gt;
&lt;P&gt;I use the GPU version of oneMKL saxpy (on arrays of size 2&lt;&lt;27) and obtain 26 GB/s on my laptop (repeated 100 times to amortize the device/host communication), while VTune sets the default max GPU bandwidth to 35 GB/s.&lt;/P&gt;
&lt;P&gt;Is it OK to assume that oneMKL saxpy should saturate the available bandwidth?&lt;/P&gt;</description>
      <pubDate>Thu, 20 Aug 2020 06:51:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Vtune-question-for-memory-bound-problem-on-GPU/m-p/1201977#M19304</guid>
      <dc:creator>LaurentPlagne</dc:creator>
      <dc:date>2020-08-20T06:51:18Z</dc:date>
    </item>
  </channel>
</rss>

