<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Well, things look even more in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Synching-on-blocking-buffer-reads-and-on-clFinis-gives/m-p/963271#M2248</link>
    <description>&lt;P&gt;Well, things actually look even stranger.&lt;/P&gt;
&lt;P&gt;CPU usage decreases when I add extra synchronization points (i.e., clFinish(cq); calls), even between kernel enqueues.&lt;/P&gt;
&lt;P&gt;So this sequence:&lt;/P&gt;
&lt;P&gt;clEnqueueNDRangeKernel(cq,kernel1,...);&lt;/P&gt;
&lt;P&gt;clEnqueueNDRangeKernel(cq,kernel2,...);&lt;/P&gt;
&lt;P&gt;consumes more CPU (though with a shorter overall execution time; the kernels run on the GPU, of course) than this one:&lt;/P&gt;
&lt;P&gt;clEnqueueNDRangeKernel(cq,kernel1,...);&lt;/P&gt;
&lt;P&gt;clFinish(cq);&lt;/P&gt;
&lt;P&gt;clEnqueueNDRangeKernel(cq,kernel2,...);&lt;/P&gt;
&lt;P&gt;Any comments from the OpenCL runtime development team?&lt;/P&gt;</description>
    <pubDate>Sat, 02 Nov 2013 11:21:48 GMT</pubDate>
    <dc:creator>Raistmer</dc:creator>
    <dc:date>2013-11-02T11:21:48Z</dc:date>
    <item>
      <title>Synching on blocking buffer reads and on clFinish() gives drastically different CPU load</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Synching-on-blocking-buffer-reads-and-on-clFinis-gives/m-p/963270#M2247</link>
      <description>&lt;P&gt;I am developing an app that needs data transferred back to the host after almost every kernel call (a flag is returned).&lt;/P&gt;
&lt;P&gt;Usually I do the processing like this:&lt;/P&gt;
&lt;P&gt;enqueueKernel(cq, ...);&lt;BR /&gt;readBuffer(cq, ..., true, ...);&lt;/P&gt;
&lt;P&gt;So the queue is synchronized on the blocking read. This works fine on AMD GPUs/APUs, with a CPU load of a few percent, but on the Intel GPU it leads to a constant 100% CPU usage (the app keeps one CPU core fully busy the whole time).&lt;/P&gt;
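<![CDATA[<P>To put a number on "one CPU core fully busy" (for example, when building a repro), one option is to compare process CPU time with wall-clock time across a wait. The code below is a minimal sketch that assumes a POSIX system with clock_gettime(). The functions spin_wait() and sleep_wait() are hypothetical stand-ins for a wait that polls and a wait that blocks; they are not part of any OpenCL runtime. Nobody in this thread has confirmed that the Intel runtime actually polls inside a blocking read.</P>
<PRE>
```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <time.h>

/* Seconds between two timestamps taken on the same clock. */
static double elapsed(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Hypothetical wait that polls the clock in a tight loop until the deadline. */
static void spin_wait(double seconds) {
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (elapsed(start, now) < seconds);
}

/* Hypothetical wait that sleeps in the kernel until the deadline. */
static void sleep_wait(double seconds) {
    struct timespec req;
    req.tv_sec = (time_t)seconds;
    req.tv_nsec = (long)((seconds - (double)req.tv_sec) * 1e9);
    nanosleep(&req, NULL);
}

/* Process CPU time divided by wall-clock time while wait(seconds) runs:
   about 1.0 means one core stayed busy, about 0.0 means the thread slept. */
static double cpu_load(void (*wait)(double), double seconds) {
    struct timespec wall0, wall1, cpu0, cpu1;
    clock_gettime(CLOCK_MONOTONIC, &wall0);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu0);
    wait(seconds);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu1);
    clock_gettime(CLOCK_MONOTONIC, &wall1);
    double wall = elapsed(wall0, wall1);
    assert(wall > 0.0); /* guard the division below */
    return elapsed(cpu0, cpu1) / wall;
}
```
</PRE>
<P>On an otherwise idle machine, cpu_load(spin_wait, 0.2) should come out close to 1.0 and cpu_load(sleep_wait, 0.2) close to 0.0. In a real repro, the same two clock readings would be placed around the clFinish(cq) call, or around the clEnqueueReadBuffer() call with blocking_read set to CL_TRUE.</P>]]>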
&lt;P&gt;When I tried this sequence instead:&lt;/P&gt;
&lt;P&gt;enqueueKernel(cq, ...);&lt;BR /&gt;clFinish(cq);&lt;BR /&gt;readBuffer(cq, ..., true, ...);&lt;/P&gt;
&lt;P&gt;The CPU load dropped considerably. So it looks like synchronizing on clFinish() and synchronizing on a blocking buffer read work quite differently in the Intel OpenCL runtime. Why is that? Is this consistent with the OpenCL standard?&lt;/P&gt;</description>
      <pubDate>Wed, 23 Oct 2013 17:41:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Synching-on-blocking-buffer-reads-and-on-clFinis-gives/m-p/963270#M2247</guid>
      <dc:creator>Raistmer</dc:creator>
      <dc:date>2013-10-23T17:41:01Z</dc:date>
    </item>
    <item>
      <title>Well, things look even more</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Synching-on-blocking-buffer-reads-and-on-clFinis-gives/m-p/963271#M2248</link>
      <description>&lt;P&gt;Well, things actually look even stranger.&lt;/P&gt;
&lt;P&gt;CPU usage decreases when I add extra synchronization points (i.e., clFinish(cq); calls), even between kernel enqueues.&lt;/P&gt;
&lt;P&gt;So this sequence:&lt;/P&gt;
&lt;P&gt;clEnqueueNDRangeKernel(cq,kernel1,...);&lt;/P&gt;
&lt;P&gt;clEnqueueNDRangeKernel(cq,kernel2,...);&lt;/P&gt;
&lt;P&gt;consumes more CPU (though with a shorter overall execution time; the kernels run on the GPU, of course) than this one:&lt;/P&gt;
&lt;P&gt;clEnqueueNDRangeKernel(cq,kernel1,...);&lt;/P&gt;
&lt;P&gt;clFinish(cq);&lt;/P&gt;
&lt;P&gt;clEnqueueNDRangeKernel(cq,kernel2,...);&lt;/P&gt;
&lt;P&gt;Any comments from the OpenCL runtime development team?&lt;/P&gt;</description>
      <pubDate>Sat, 02 Nov 2013 11:21:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Synching-on-blocking-buffer-reads-and-on-clFinis-gives/m-p/963271#M2248</guid>
      <dc:creator>Raistmer</dc:creator>
      <dc:date>2013-11-02T11:21:48Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Synching-on-blocking-buffer-reads-and-on-clFinis-gives/m-p/963272#M2249</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;The blocking read and clFinish() have similar performance. Their behavior is not identical, but you shouldn't see much of a performance difference. Is it possible to provide a repro?&lt;/P&gt;
&lt;P&gt;Thanks,&lt;BR /&gt;Raghu&lt;/P&gt;</description>
      <pubDate>Tue, 05 Nov 2013 00:37:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Synching-on-blocking-buffer-reads-and-on-clFinis-gives/m-p/963272#M2249</guid>
      <dc:creator>Raghupathi_M_Intel</dc:creator>
      <dc:date>2013-11-05T00:37:46Z</dc:date>
    </item>
  </channel>
</rss>

