<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re:oneapi gpu quick sort performance issue. in Intel® oneAPI DPC++/C++ Compiler</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1319348#M1612</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Reminder:&lt;/P&gt;&lt;P&gt;Could you please provide the above-mentioned details so that we can work on it from our end?&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Vidya.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;BR /&gt;</description>
    <pubDate>Tue, 05 Oct 2021 13:23:44 GMT</pubDate>
    <dc:creator>VidyalathaB_Intel</dc:creator>
    <dc:date>2021-10-05T13:23:44Z</dc:date>
    <item>
      <title>oneapi gpu quick sort performance issue.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1315158#M1560</link>
      <description>&lt;P&gt;Hi all:&lt;/P&gt;
&lt;P&gt;I tried the SYCL GPU sort code from this URL:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://techdecoded.intel.io/resources/gpu-quicksort/#gs.bi6fkf" target="_blank"&gt;https://techdecoded.intel.io/resources/gpu-quicksort/#gs.bi6fkf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I built it with the oneAPI 2020.2 release, but the results show a huge performance gap between the oneAPI DPC++ compiler and OpenCL 1.2.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The test hardware was an i7 11700K, with a 512x512 array.&lt;/P&gt;
&lt;P&gt;OpenCL 1.2 takes 4-5 ms to sort this array, but oneAPI SYCL takes 6-7 ms. That's almost 40% overhead...&lt;/P&gt;
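&lt;P&gt;As a rough check on that overhead figure, here is a small Python sketch that computes the relative slowdown from the timings quoted above. The function name overhead_percent is only illustrative and is not part of the sample. Pairing like ends of the two ranges gives 40% to 50%, while the extreme pairings give 20% to 75%.&lt;/P&gt;

```python
def overhead_percent(baseline_ms, candidate_ms):
    """Relative slowdown of candidate over baseline, in percent."""
    return 100.0 * (candidate_ms - baseline_ms) / baseline_ms

# Timings quoted in this post: OpenCL 1.2 takes 4-5 ms, SYCL takes 6-7 ms.
print(overhead_percent(5.0, 7.0))  # upper ends paired: 40.0
print(overhead_percent(4.0, 6.0))  # lower ends paired: 50.0
print(overhead_percent(5.0, 6.0))  # best case for SYCL: 20.0
print(overhead_percent(4.0, 7.0))  # worst case for SYCL: 75.0
```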
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I strongly suspect that the oneAPI DPC++ compiler has a performance issue, b/c different software stacks for the GPU should NOT show such a big perf gap.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The source code is at the link above, and it's an official Intel sample.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can anybody help explain why, and how to make SYCL match the perf of OpenCL 1.2?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks in advance.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Sep 2021 18:31:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1315158#M1560</guid>
      <dc:creator>JiniusT</dc:creator>
      <dc:date>2021-09-16T18:31:40Z</dc:date>
    </item>
    <item>
      <title>Re:oneapi gpu quick sort performance issue.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1315369#M1561</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for reaching out to us.&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt;&lt;I&gt;build under oneapi 2020.2 release&lt;/I&gt;&lt;/P&gt;&lt;P&gt;Could you please try the latest version of the oneAPI DPC++ compiler (2021.3.0) and check whether the issue still persists?&lt;/P&gt;&lt;P&gt;Below is the link to download the latest version of the oneAPI Base Toolkit (the DPC++ compiler is included in the Base Toolkit):&lt;/P&gt;&lt;P&gt;&lt;A href="https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit/download.html" target="_blank"&gt;https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit/download.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Vidya.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Sep 2021 12:18:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1315369#M1561</guid>
      <dc:creator>VidyalathaB_Intel</dc:creator>
      <dc:date>2021-09-17T12:18:54Z</dc:date>
    </item>
    <item>
      <title>Re: oneapi gpu quick sort performance issue.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1315380#M1562</link>
      <description>&lt;P&gt;Please state whether this is the time for the first (or only) sort, or for the second and later sorts. Note that the first run includes JIT compilation and resource allocation (including GPU memory allocation).&lt;/P&gt;
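&lt;P&gt;A minimal sketch of how the first-run effect described above can be separated from steady-state timing. It is written in Python only for brevity and is not SYCL code: time_runs and fake_sort are hypothetical names, and the 50 ms sleep merely simulates a one-time JIT and allocation cost.&lt;/P&gt;

```python
import time

def time_runs(run_once, repeats=5, warmups=1):
    """Time run_once() several times, reporting warm-up runs separately."""
    def timed_ms():
        start = time.perf_counter()
        run_once()
        return (time.perf_counter() - start) * 1000.0

    # Warm-up runs absorb one-time costs (JIT, allocation) and are kept apart.
    warmup_ms = [timed_ms() for _ in range(warmups)]
    steady_ms = [timed_ms() for _ in range(repeats)]
    return warmup_ms, steady_ms

# Hypothetical stand-in for a GPU sort: the first call pays a simulated setup cost.
state = {"initialized": False}

def fake_sort():
    if not state["initialized"]:
        time.sleep(0.05)  # simulated JIT and allocation cost, first call only
        state["initialized"] = True
    sorted(range(10000, 0, -1))

warmup_ms, steady_ms = time_runs(fake_sort)
print("first run (ms):", round(warmup_ms[0], 2))
print("median later run (ms):", round(sorted(steady_ms)[len(steady_ms) // 2], 2))
```

&lt;P&gt;If the two printed numbers differ widely, a single-shot measurement is dominated by setup cost rather than by the sort itself.&lt;/P&gt;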
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Fri, 17 Sep 2021 13:04:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1315380#M1562</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2021-09-17T13:04:43Z</dc:date>
    </item>
    <item>
      <title>Re: oneapi gpu quick sort performance issue.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1315434#M1563</link>
      <description>&lt;P&gt;The original Intel demo for GPU sort already accounts for the oneAPI JIT and the OpenCL kernel precompile. Neither compile time was included in the perf benchmark.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Sep 2021 16:30:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1315434#M1563</guid>
      <dc:creator>JiniusT</dc:creator>
      <dc:date>2021-09-17T16:30:41Z</dc:date>
    </item>
    <item>
      <title>Re:oneapi gpu quick sort performance issue.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1316646#M1575</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;We are looking into this issue. We will get back to you soon.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Vidya.&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 23 Sep 2021 06:34:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1316646#M1575</guid>
      <dc:creator>VidyalathaB_Intel</dc:creator>
      <dc:date>2021-09-23T06:34:04Z</dc:date>
    </item>
    <item>
      <title>Re: Re:oneapi gpu quick sort performance issue.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1317293#M1583</link>
      <description>&lt;P&gt;I finally had some time to test the issue with the latest oneAPI 2021.3 toolkit.&lt;/P&gt;
&lt;P&gt;The performance result is the same: dpcpp is still very slow compared to OpenCL 1.2.&lt;/P&gt;
&lt;P&gt;After digging deeper into the issue, I think it's a memory copy issue:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1.) With the OpenCL 1.2 Intel i915 driver, memory is shared zero-copy between the CPU and the iGPU, so there is no perf penalty.&lt;/P&gt;
&lt;P&gt;2.) In the dpcpp stack, SYCL buffers somehow don't trigger the zero-copy path. Even though the OpenCL and SYCL code is almost equivalent in functionality, the dpcpp/SYCL stack suffers from memory movement between the CPU and GPU. I don't know the real reason yet, as I lack deep knowledge of the stack.&lt;/P&gt;
&lt;P&gt;3.) After switching from SYCL buffers to dpcpp's USM, performance was much better, but it still can't match the OpenCL 1.2 i915 stack.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please help with this issue, b/c it's critical for the oneAPI stack: if there is a huge perf gap between OpenCL 1.2 and oneAPI, developers lose motivation to migrate to this new API stack, and SORTING is important for almost everything.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;BTW, I'm seeking a oneAPI-based GPU sorting solution here simply b/c one isn't available in oneAPI. There is no CUDA Thrust-like framework for oneAPI yet, CUB migration is still a dream, and TBB doesn't support sorting on the GPU.&lt;/P&gt;
&lt;P&gt;I'd really appreciate it if anybody could shed some light on how to sort with oneAPI on the GPU (maybe there is already a decent solution somewhere).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 26 Sep 2021 16:44:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1317293#M1583</guid>
      <dc:creator>JiniusT</dc:creator>
      <dc:date>2021-09-26T16:44:52Z</dc:date>
    </item>
    <item>
      <title>Re: oneapi gpu quick sort performance issue.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1317645#M1585</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could you please provide us with a sample reproducer for both the OpenCL and SYCL (USM and buffer models) versions, along with the steps you followed to obtain these results, so that we can work on it from our end?&lt;/P&gt;
&lt;P&gt;Also, please provide the following details:&lt;/P&gt;
&lt;P&gt;1. Output of:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;sycl-ls&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;clinfo&lt;/P&gt;
&lt;P&gt;2. Hardware details&lt;/P&gt;
&lt;P&gt;3. Are you using the OpenCL runtime or Level Zero as the backend?&lt;/P&gt;
&lt;P&gt;You can also use the sorting algorithms from oneDPL. Please refer to the link below for more details.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://software.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-library-guide/top/extension-api.html" target="_blank"&gt;https://software.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-library-guide/top/extension-api.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Vidya.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Sep 2021 14:44:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1317645#M1585</guid>
      <dc:creator>VidyalathaB_Intel</dc:creator>
      <dc:date>2021-09-30T14:44:15Z</dc:date>
    </item>
    <item>
      <title>Re:oneapi gpu quick sort performance issue.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1319348#M1612</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Reminder:&lt;/P&gt;&lt;P&gt;Could you please provide the above-mentioned details so that we can work on it from our end?&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Vidya.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 05 Oct 2021 13:23:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1319348#M1612</guid>
      <dc:creator>VidyalathaB_Intel</dc:creator>
      <dc:date>2021-10-05T13:23:44Z</dc:date>
    </item>
    <item>
      <title>Re:oneapi gpu quick sort performance issue.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1321266#M1625</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;As we have not heard back from you, we are closing this case for now. Please post a new question if you need any additional information from Intel.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Vidya.&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 12 Oct 2021 06:45:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/oneapi-gpu-quick-sort-performance-issue/m-p/1321266#M1625</guid>
      <dc:creator>VidyalathaB_Intel</dc:creator>
      <dc:date>2021-10-12T06:45:50Z</dc:date>
    </item>
  </channel>
</rss>

