<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Example of performing kernel queries in Intel® oneAPI DPC++/C++ Compiler</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1288004#M1272</link>
    <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am going through Intel's official textbook, "Data Parallel C++":&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.apress.com/gp/book/9781484255735" target="_blank"&gt;https://www.apress.com/gp/book/9781484255735&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would like to tune the performance of an nd_range kernel by optimizing the work-group size. The two required parameters are (cf. Chapter 12):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;preferred_work_group_size
preferred_work_group_size_multiple
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But I could not figure out how to obtain them. To quote the book itself (page 366):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;The kernel preferred_work_group_size_multiple query can be used to choose an efficient work-group size. Please refer to Chapter 12 for more information on how to query properties of a device&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Unfortunately, Chapter 12 only briefly mentions the aforementioned parameters without explaining how to use them. Querying the properties of a device is not the issue here.&lt;/P&gt;
&lt;P&gt;The most I could figure out is that we need to pass a kernel object to 'parallel_for'. Do we query the kernel before or after the command submission? Please pardon my ignorance, but I am very confused coming from a CUDA background.&lt;/P&gt;
&lt;P&gt;I would be grateful if the support team could provide a simple example using the vector addition implementation to find 'preferred_work_group_size_multiple'.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards.&lt;/P&gt;</description>
    <pubDate>Tue, 08 Jun 2021 08:03:15 GMT</pubDate>
    <dc:creator>Viet-Duc</dc:creator>
    <dc:date>2021-06-08T08:03:15Z</dc:date>
    <item>
      <title>Example of performing kernel queries</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1288004#M1272</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am going through Intel's official textbook, "Data Parallel C++":&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.apress.com/gp/book/9781484255735" target="_blank"&gt;https://www.apress.com/gp/book/9781484255735&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would like to tune the performance of an nd_range kernel by optimizing the work-group size. The two required parameters are (cf. Chapter 12):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;preferred_work_group_size
preferred_work_group_size_multiple
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But I could not figure out how to obtain them. To quote the book itself (page 366):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;The kernel preferred_work_group_size_multiple query can be used to choose an efficient work-group size. Please refer to Chapter 12 for more information on how to query properties of a device&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Unfortunately, Chapter 12 only briefly mentions the aforementioned parameters without explaining how to use them. Querying the properties of a device is not the issue here.&lt;/P&gt;
&lt;P&gt;The most I could figure out is that we need to pass a kernel object to 'parallel_for'. Do we query the kernel before or after the command submission? Please pardon my ignorance, but I am very confused coming from a CUDA background.&lt;/P&gt;
&lt;P&gt;I would be grateful if the support team could provide a simple example using the vector addition implementation to find 'preferred_work_group_size_multiple'.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards.&lt;/P&gt;</description>
      <pubDate>Tue, 08 Jun 2021 08:03:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1288004#M1272</guid>
      <dc:creator>Viet-Duc</dc:creator>
      <dc:date>2021-06-08T08:03:15Z</dc:date>
    </item>
    <item>
      <title>Re: Example of performing kernel queries</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1288453#M1281</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for reaching out to us.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regarding the usage of preferred_work_group_size_multiple:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;On Linux, we can get the preferred_work_group_size_multiple value directly by using the &lt;STRONG&gt;clinfo&lt;/STRONG&gt; command.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you want a more precise base value from which to choose multiples, query the kernel's preferred_work_group_size_multiple using the following code snippet:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;const size_t preferred_multiple = kernel.get_work_group_info&amp;lt;sycl::info::kernel_work_group::preferred_work_group_size_multiple&amp;gt;(device);&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;gt;&amp;gt;can provide a simple example using the vector addition implementation to find 'preferred_work_group_size_multiple'&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;We are working on it and will get back to you soon.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Vidya.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 14 Jun 2021 07:17:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1288453#M1281</guid>
      <dc:creator>VidyalathaB_Intel</dc:creator>
      <dc:date>2021-06-14T07:17:57Z</dc:date>
    </item>
    <item>
      <title>Re: Example of performing kernel queries</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1288598#M1283</link>
      <description>&lt;P&gt;Dear Vidya,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for suggesting clinfo. It is indeed a very useful tool.&lt;/P&gt;
&lt;P&gt;For the sake of reference, the following results were obtained from the gen9 queue:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;[E-2176G]&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;Max work item dimensions                        3
Max work item sizes                             8192x8192x8192
Max work group size                             8192
Preferred work group size multiple (kernel)     128
Max sub-groups per work group                   2048&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;[UHD P630]&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;Max work item dimensions                        3
Max work item sizes                             256x256x256
Max work group size                             256
Preferred work group size multiple (device)     32
Preferred work group size multiple (kernel)     32
Max sub-groups per work group                   32&lt;/LI-CODE&gt;
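&lt;P&gt;As a rough sketch of how these two numbers might be combined (this is not from the book or from Intel): the hypothetical helper pick_work_group_size below takes the maximum work-group size and the preferred multiple, both assumed to have been queried already, and returns the largest multiple that still fits. It is only a starting point for tuning; the largest candidate is not necessarily the fastest.&lt;/P&gt;

```cpp
// Same type as std::size_t, spelled this way so that no header is needed.
using size_type = decltype(sizeof(0));

// Hypothetical helper (not part of SYCL or OpenCL): returns the largest
// work-group size that is a whole multiple of `multiple` and does not
// exceed `max_size`. If `multiple` is zero or larger than `max_size`,
// it falls back to `max_size`.
constexpr size_type pick_work_group_size(size_type max_size, size_type multiple) {
    return (multiple == 0 || multiple > max_size)
               ? max_size
               : (max_size / multiple) * multiple;  // integer division rounds down
}

// Values copied from the clinfo output above.
static_assert(pick_work_group_size(256, 32) == 256, "UHD P630");
static_assert(pick_work_group_size(8192, 128) == 8192, "E-2176G");
// Assumed value, for illustration only: a kernel-specific limit of 200.
static_assert(pick_work_group_size(200, 32) == 192, "rounded down to 6 * 32");
```

&lt;P&gt;With the clinfo values above, both devices already report a maximum that is a whole multiple of the preferred value, so the helper returns the maximum unchanged.&lt;/P&gt;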
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regarding the code snippet, that was also the only thing I could find while combing the internet. Still, it did not show the context in which 'kernel' is defined. My gripe with C++ is that nothing is ever straightforward.&lt;/P&gt;
&lt;P&gt;I look forward to the full example.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for your time.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jun 2021 00:46:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1288598#M1283</guid>
      <dc:creator>Viet-Duc</dc:creator>
      <dc:date>2021-06-10T00:46:42Z</dc:date>
    </item>
    <item>
      <title>Re: Example of performing kernel queries</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1289619#M1304</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;gt;&amp;gt; &lt;I&gt;I look forward to the full example.&lt;/I&gt;&lt;/P&gt;
&lt;P&gt;Please find an example that queries &lt;STRONG&gt;preferred_work_group_size_multiple&lt;/STRONG&gt; in the following code.&lt;/P&gt;
&lt;LI-CODE lang="cpp"&gt;#include &amp;lt;CL/sycl.hpp&amp;gt;
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;array&amp;gt;
using namespace cl::sycl;
int main()
{
        const size_t szKernelData = 1024;
        std::array&amp;lt;float, szKernelData&amp;gt; kernelData;
        range&amp;lt;1&amp;gt; r(szKernelData);
        queue q{gpu_selector()};
        program p(q.get_context());
        p.build_with_source(R"CLC( kernel void sinf_test(global float* data) {
                                data[get_global_id(0)] += 1 ;
                        } )CLC", "-cl-std=CL1.2");
        auto k = p.get_kernel("sinf_test");
        auto sz = k.get_work_group_info&amp;lt;info::kernel_work_group::preferred_work_group_size_multiple&amp;gt;(q.get_device());
        std::cout &amp;lt;&amp;lt; sz &amp;lt;&amp;lt; "\n";
        return 0;
}&lt;/LI-CODE&gt;
&lt;P&gt;Command Used:&lt;/P&gt;
&lt;P&gt;dpcpp *.cpp &amp;amp;&amp;amp; SYCL_BE=PI_OPENCL ./a.out&lt;/P&gt;
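&lt;P&gt;Once sz is known, one possible way to use it is sketched below. An nd_range launch needs a global size that divides evenly by the work-group size, so the element count is rounded up. The helper name padded_global_size is hypothetical, and the value 32 is simply what was reported for the UHD P630 earlier in this thread; with a padded range, the kernel would also need a bounds check so that the extra work-items do nothing.&lt;/P&gt;

```cpp
// Same type as std::size_t, spelled this way so that no header is needed.
using size_type = decltype(sizeof(0));

// Hypothetical helper: rounds `n` up to the next whole multiple of
// `work_group_size`. Precondition: `work_group_size` is not zero.
constexpr size_type padded_global_size(size_type n, size_type work_group_size) {
    return ((n + work_group_size - 1) / work_group_size) * work_group_size;
}

// 1024 elements (szKernelData in the example above) already divide evenly by 32.
static_assert(padded_global_size(1024, 32) == 1024, "no padding needed");
// Assumed sizes, for illustration only.
static_assert(padded_global_size(1000, 32) == 1024, "padded by 24 work-items");
static_assert(padded_global_size(1025, 32) == 1056, "padded by 31 work-items");
```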
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Vidya.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 14 Jun 2021 12:00:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1289619#M1304</guid>
      <dc:creator>VidyalathaB_Intel</dc:creator>
      <dc:date>2021-06-14T12:00:14Z</dc:date>
    </item>
    <item>
      <title>Re: Example of performing kernel queries</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1290184#M1309</link>
      <description>&lt;P&gt;Hi, Vidya&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The code compiled and produced the same result as clinfo.&lt;/P&gt;
&lt;P&gt;Now I understand that the kernel must be compiled before passing it to parallel_for.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In this sense, preferred_work_group_size_multiple is a constant with a meaning similar to a CUDA warp.&lt;/P&gt;
&lt;P&gt;Could you confirm whether my understanding is correct?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I was confused because the book had implied that preferred_work_group_size_multiple was a kernel-dependent variable.&lt;/P&gt;
&lt;P&gt;After running the kernel once, the aforementioned value will change from 64 to a different value.&lt;/P&gt;
&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jun 2021 02:47:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1290184#M1309</guid>
      <dc:creator>Viet-Duc</dc:creator>
      <dc:date>2021-06-16T02:47:25Z</dc:date>
    </item>
    <item>
      <title>Re:Example of performing kernel queries</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1291073#M1321</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt; &lt;I&gt;After running the kernel once, the aforementioned value will change from 64 to a different value.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;Could you please let us know what values you are getting across different runs?&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt; &lt;I&gt;In this sense, preferred_work_group_size_multiple is a constant with a meaning similar to a CUDA warp.&lt;/I&gt;&lt;/P&gt;&lt;P&gt;We will get back to you soon regarding this.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;&lt;P&gt;Vidya&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 18 Jun 2021 11:04:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1291073#M1321</guid>
      <dc:creator>VidyalathaB_Intel</dc:creator>
      <dc:date>2021-06-18T11:04:51Z</dc:date>
    </item>
    <item>
      <title>Re: Example of performing kernel queries</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1291491#M1323</link>
      <description>&lt;P&gt;Sorry, that sentence came out wrong. I meant to describe what the book implies.&lt;/P&gt;
&lt;P&gt;The result of running the code is always a constant. There are two separate values for preferred_work_group_size_multiple (UHD P630):&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;Preferred work group size multiple (device)     32
Preferred work group size multiple (kernel)     32&lt;/LI-CODE&gt;
&lt;P&gt;I just want to make sure that the 'kernel' one is a hardware-based constant regardless of the nature of the kernel.&lt;/P&gt;
&lt;P&gt;This way, we won't need to do a runtime check for each kernel of a different size.&lt;/P&gt;
&lt;P&gt;Sorry for causing confusion.&lt;/P&gt;</description>
      <pubDate>Mon, 21 Jun 2021 02:13:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1291491#M1323</guid>
      <dc:creator>Viet-Duc</dc:creator>
      <dc:date>2021-06-21T02:13:21Z</dc:date>
    </item>
    <item>
      <title>Re:Example of performing kernel queries</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1297427#M1401</link>
      <description>&lt;P&gt;The preferred_work_group_size_multiple is a kernel query.&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: &amp;quot;Courier New&amp;quot;; font-size: 11pt;"&gt;Returns a value, of which work-group size is preferred to be a multiple, for executing a kernel on a particular device. This is a performance hint. The value must be less than or equal to that returned by info::kernel_device_specific::work_group_size.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;The compiler uses certain heuristics, depending on the kernel, to determine this value, and the number reported by clinfo as "Preferred work group size multiple" might differ from this kernel query.&lt;/P&gt;&lt;P&gt;Hope this answers your question.&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 09 Jul 2021 20:43:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1297427#M1401</guid>
      <dc:creator>Varsha_M_Intel</dc:creator>
      <dc:date>2021-07-09T20:43:24Z</dc:date>
    </item>
    <item>
      <title>Re: Example of performing kernel queries</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1298028#M1408</link>
      <description>&lt;P&gt;Thanks for your clarification.&lt;/P&gt;</description>
      <pubDate>Tue, 13 Jul 2021 05:36:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1298028#M1408</guid>
      <dc:creator>Viet-Duc</dc:creator>
      <dc:date>2021-07-13T05:36:40Z</dc:date>
    </item>
    <item>
      <title>Re:Example of performing kernel queries</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1304157#M1461</link>
      <description>&lt;P&gt;&lt;SPAN style="font-family: verdana; font-size: 12px;"&gt;Thanks for accepting our solution. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.&lt;/SPAN&gt;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 06 Aug 2021 05:49:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Example-of-performing-kernel-queries/m-p/1304157#M1461</guid>
      <dc:creator>JyotsnaK_Intel</dc:creator>
      <dc:date>2021-08-06T05:49:59Z</dc:date>
    </item>
  </channel>
</rss>

