<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Concurrent Kernel Execution in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Concurrent-Kernel-Execution/m-p/1027944#M3527</link>
    <description>&lt;P&gt;Can host threads execute kernels concurrently with the Intel SDK for OpenCL? I heard that kernels (commands) from different command queues will be executed concurrently on the device. Is that true? Also, is "Device Fission" now supported on the GPU with the Intel OpenCL driver? That may be another way to implement it.&lt;/P&gt;

&lt;P&gt;I use: Intel Core i7, Intel HD Graphics 4600, Intel SDK for OpenCL.&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Lingzhi&lt;/P&gt;</description>
    <pubDate>Sat, 18 Oct 2014 09:12:42 GMT</pubDate>
    <dc:creator>Lingzhi_S_</dc:creator>
    <dc:date>2014-10-18T09:12:42Z</dc:date>
    <item>
      <title>Concurrent Kernel Execution</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Concurrent-Kernel-Execution/m-p/1027944#M3527</link>
      <description>&lt;P&gt;Can host threads execute kernels concurrently with the Intel SDK for OpenCL? I heard that kernels (commands) from different command queues will be executed concurrently on the device. Is that true? Also, is "Device Fission" now supported on the GPU with the Intel OpenCL driver? That may be another way to implement it.&lt;/P&gt;

&lt;P&gt;I use: Intel Core i7, Intel HD Graphics 4600, Intel SDK for OpenCL.&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Lingzhi&lt;/P&gt;</description>
      <pubDate>Sat, 18 Oct 2014 09:12:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Concurrent-Kernel-Execution/m-p/1027944#M3527</guid>
      <dc:creator>Lingzhi_S_</dc:creator>
      <dc:date>2014-10-18T09:12:42Z</dc:date>
    </item>
    <item>
      <title>Lingzhi,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Concurrent-Kernel-Execution/m-p/1027945#M3528</link>
      <description>&lt;P&gt;Lingzhi,&lt;/P&gt;

&lt;P&gt;Kernels cannot be executed concurrently on the GPU device with current production drivers. The Device Fission feature is available on the OpenCL CPU device only; see &lt;A href="https://software.intel.com/en-us/articles/opencl-device-fission-for-cpu-performance"&gt;https://software.intel.com/en-us/articles/opencl-device-fission-for-cpu-performance&lt;/A&gt;.&lt;/P&gt;

&lt;P&gt;Robert&lt;/P&gt;</description>
      <pubDate>Tue, 21 Oct 2014 00:16:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Concurrent-Kernel-Execution/m-p/1027945#M3528</guid>
      <dc:creator>Robert_I_Intel</dc:creator>
      <dc:date>2014-10-21T00:16:13Z</dc:date>
    </item>
    <item>
      <title>Will support for concurrent</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Concurrent-Kernel-Execution/m-p/1027946#M3529</link>
      <description>&lt;P&gt;Will support for concurrent kernels be added to the IGP drivers at some point?&lt;/P&gt;</description>
      <pubDate>Mon, 27 Oct 2014 22:26:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Concurrent-Kernel-Execution/m-p/1027946#M3529</guid>
      <dc:creator>allanmac1</dc:creator>
      <dc:date>2014-10-27T22:26:42Z</dc:date>
    </item>
    <item>
      <title>allanmac,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Concurrent-Kernel-Execution/m-p/1027947#M3530</link>
      <description>&lt;P&gt;allanmac,&lt;/P&gt;

&lt;P&gt;We are collecting requirements and use cases for concurrent kernel execution. Please let me know what yours are and I will forward them to our product team. The team is hesitant to add this functionality at the moment because of a lack of demand and realistic use cases.&lt;/P&gt;</description>
      <pubDate>Mon, 27 Oct 2014 22:32:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Concurrent-Kernel-Execution/m-p/1027947#M3530</guid>
      <dc:creator>Robert_I_Intel</dc:creator>
      <dc:date>2014-10-27T22:32:17Z</dc:date>
    </item>
    <item>
      <title>OK.  Here's my use case:</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Concurrent-Kernel-Execution/m-p/1027948#M3531</link>
      <description>&lt;P&gt;OK. Here's my use case:&lt;/P&gt;

&lt;P&gt;I have an advanced pipeline of kernels that are designed to run concurrently. Inter-kernel dependencies are currently managed by the kernel-launching logic and kernel-completion callbacks, but at some point I may move this work onto the OpenCL event system if that further reduces system latency.&lt;/P&gt;

&lt;P&gt;Some of the kernels are computationally intense. Others are not. All run for short durations (from microseconds to at most a few milliseconds).&lt;/P&gt;

&lt;P&gt;I don't care about presenting enough work to the IGP for it to reach its peak clock speed, since I can always make that happen by queuing up more work for the IGP.&lt;/P&gt;

&lt;P&gt;But I do care about latency... which is why I really want concurrent kernels.&lt;/P&gt;

&lt;P&gt;---&lt;/P&gt;

&lt;P&gt;That being said, I understand why the smaller IGPs probably aren't going to benefit much from concurrent kernel execution.&lt;/P&gt;

&lt;P&gt;But a double- or triple-slice IGP seems like it would be a good environment for concurrent kernel execution. :)&lt;/P&gt;</description>
      <pubDate>Mon, 27 Oct 2014 23:27:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Concurrent-Kernel-Execution/m-p/1027948#M3531</guid>
      <dc:creator>allanmac1</dc:creator>
      <dc:date>2014-10-27T23:27:41Z</dc:date>
    </item>
    <item>
      <title>allanmac,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Concurrent-Kernel-Execution/m-p/1027949#M3532</link>
      <description>&lt;P&gt;allanmac,&lt;/P&gt;

&lt;P&gt;In the short term, we have nested parallelism in OpenCL 2.0 (kernels launching other kernels), which should improve the latency situation. For more on nested parallelism, see my article: https://software.intel.com/en-us/articles/gpu-quicksort-in-opencl-20-using-nested-parallelism-and-work-group-scan-functions&lt;/P&gt;

&lt;P&gt;You can also watch short videos on nested parallelism here:&lt;/P&gt;

&lt;P style="margin-top:0in;margin-right:0in;margin-bottom:6.0pt;margin-left:0in;
line-height:13.5pt;background:#F0F0F0"&gt;&amp;nbsp;&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;&lt;A href="https://software.intel.com/en-us/videos/implementing-sierpi-ski-carpet-in-opencl-20" target="_blank"&gt;https://software.intel.com/en-us/videos/implementing-sierpi-ski-carpet-in-opencl-20&lt;/A&gt;&lt;/LI&gt;
	&lt;LI&gt;&lt;A href="https://software.intel.com/en-us/videos/gpu-quicksort-in-opencl-20" target="_blank"&gt;https://software.intel.com/en-us/videos/gpu-quicksort-in-opencl-20&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;

&lt;P style="margin-top:0in;margin-right:0in;margin-bottom:6.0pt;margin-left:0in;
line-height:13.5pt;background:#F0F0F0"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-top:0in;margin-right:0in;margin-bottom:6.0pt;margin-left:0in;
line-height:13.5pt;background:#F0F0F0"&gt;&lt;SPAN style="color: rgb(57, 57, 57); font-family: 'Lucida Console'; font-size: 10pt; line-height: 13.5pt; background-color: rgb(221, 221, 221);"&gt;I will forward your input to our product team.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 27 Oct 2014 23:34:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Concurrent-Kernel-Execution/m-p/1027949#M3532</guid>
      <dc:creator>Robert_I_Intel</dc:creator>
      <dc:date>2014-10-27T23:34:00Z</dc:date>
    </item>
  </channel>
</rss>

