<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic OpenCL stall on Apollo Lake GPU in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/OpenCL-stall-on-Apollo-Lake-GPU/m-p/1072376#M4476</link>
    <description>&lt;DIV&gt;&lt;EM&gt;Summary&lt;/EM&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;When I run my app and select the GPU OpenCL device, the feeder thread&amp;nbsp;&lt;/SPAN&gt;&lt;I style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;stalls inside a&amp;nbsp;blocking call to clEnqueueMapBuffer().&amp;nbsp;&lt;/I&gt;&lt;/DIV&gt;

</description>
    <pubDate>Tue, 17 Jan 2017 11:45:30 GMT</pubDate>
    <dc:creator>tony_w_</dc:creator>
    <dc:date>2017-01-17T11:45:30Z</dc:date>
    <item>
      <title>OpenCL stall on Apollo Lake GPU</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/OpenCL-stall-on-Apollo-Lake-GPU/m-p/1072376#M4476</link>
      <description>&lt;DIV&gt;&lt;EM&gt;Summary&lt;/EM&gt;&lt;/DIV&gt;

&lt;DIV&gt;&lt;SPAN style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;When I run my app and select the GPU OpenCL device, the feeder thread&amp;nbsp;&lt;/SPAN&gt;&lt;I style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;stalls inside a&amp;nbsp;blocking call to clEnqueueMapBuffer().&amp;nbsp;&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV&gt;&lt;I style="font-size: 1em;"&gt;Preamble&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV&gt;Build: Yocto from the Apollo Lake BSP &lt;I&gt;gold&lt;/I&gt; release&lt;/DIV&gt;

&lt;DIV&gt;Hardware: Oxbow Hill Rev B CRB with Intel Atom E3950 and 4GB DDR3 RAM (one SODIMM)&lt;/DIV&gt;

&lt;DIV&gt;Build: core-image-sato-sdk&lt;/DIV&gt;

&lt;DIV&gt;Installed on the onboard eMMC.&lt;/DIV&gt;

&lt;DIV&gt;OpenCL: installed user space drivers from SRB4&amp;nbsp;&lt;A href="https://software.intel.com/file/533571/download" target="_blank"&gt;https://software.intel.com/file/533571/download&lt;/A&gt;&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV&gt;I'm currently evaluating the Apollo Lake platform as a candidate to run our embedded application. We already have this application running on less powerful ARM-based Linux systems with Mali GPUs using OpenCL 1.2, and we're now evaluating the E3950 as a faster alternative. To evaluate the application I need OpenCL 1.2 or later.&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV&gt;To verify the OpenCL installation I built and ran the Intel demo apps CapsBasic and Bitonic Sort. CapsBasic sees two devices (CPU and GPU), and Bitonic Sort runs its kernels correctly on both.&lt;/DIV&gt;

&lt;DIV&gt;
	&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;&amp;nbsp;&lt;/DIV&gt;

	&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;&lt;I&gt;The issue&lt;/I&gt;&lt;/DIV&gt;

	&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;Simply put, the application has&amp;nbsp;&lt;/DIV&gt;

	&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;
		&lt;UL&gt;
			&lt;LI style="margin-left: 15px;"&gt;thread 1 (feeder): runs a loop that feeds data into OpenCL and queues kernels&lt;/LI&gt;
			&lt;LI style="margin-left: 15px;"&gt;thread 2 (consumer): waits for results and reads the output data&lt;/LI&gt;
			&lt;LI style="margin-left: 15px;"&gt;an OpenCL host command queue with out-of-order execution enabled&lt;/LI&gt;
		&lt;/UL&gt;
		When I run my app and select the GPU OpenCL device, the feeder thread&amp;nbsp;&lt;I&gt;stalls inside a&amp;nbsp;blocking call to clEnqueueMapBuffer().&amp;nbsp;&lt;/I&gt;At this point only one thing has been queued on the command queue: a buffer unmap command for a different buffer. This unmap is waiting for an OpenCL event that will indicate data ready to be processed.&lt;/DIV&gt;

	&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;&amp;nbsp;&lt;/DIV&gt;

	&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;When I run my app and select the CPU OpenCL device, it works perfectly.&lt;/DIV&gt;

	&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;&amp;nbsp;&lt;/DIV&gt;

	&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;Does anyone have any ideas on&lt;/DIV&gt;

	&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;
		&lt;OL&gt;
			&lt;LI style="margin-left: 15px;"&gt;what might be causing this?&lt;/LI&gt;
			&lt;LI style="margin-left: 15px;"&gt;how to debug this on the Yocto platform?&lt;/LI&gt;
		&lt;/OL&gt;
	&lt;/DIV&gt;

	&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;I'm now working on a short reproducer that I can publish here.&lt;/DIV&gt;

	&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;&amp;nbsp;&lt;/DIV&gt;

	&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;Thanks,&lt;/DIV&gt;

	&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;&amp;nbsp;&lt;/DIV&gt;

	&lt;DIV style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.8px;"&gt;Tony&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Tue, 17 Jan 2017 11:45:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/OpenCL-stall-on-Apollo-Lake-GPU/m-p/1072376#M4476</guid>
      <dc:creator>tony_w_</dc:creator>
      <dc:date>2017-01-17T11:45:30Z</dc:date>
    </item>
    <item>
      <title>OK, I have attached</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/OpenCL-stall-on-Apollo-Lake-GPU/m-p/1072377#M4477</link>
      <description>&lt;P&gt;I have attached a reproducer for this issue and the text output it produces. &amp;nbsp;Attached: source code, output text from the program&lt;SPAN style="font-size: 1em;"&gt;. Compiled with &lt;/SPAN&gt;&lt;EM&gt;&lt;SPAN style="font-size: 1em;"&gt;gcc gpu_issue.c -o gpu_issue -L ./opt/intel/opencl -LOpenCL&lt;/SPAN&gt;&lt;/EM&gt;&lt;/P&gt;

&lt;P&gt;Note that there is no output after the call to map buffer 2. If I modify the code to select a CPU device then the call to map buffer 2 succeeds.&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jan 2017 11:37:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/OpenCL-stall-on-Apollo-Lake-GPU/m-p/1072377#M4477</guid>
      <dc:creator>tony_w_</dc:creator>
      <dc:date>2017-01-18T11:37:22Z</dc:date>
    </item>
    <item>
      <title>Thank you for your report, I</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/OpenCL-stall-on-Apollo-Lake-GPU/m-p/1072378#M4478</link>
      <description>&lt;P&gt;Thank you for your report, I can confirm this is a GPU driver problem.&lt;/P&gt;

&lt;P&gt;We are looking into possible solutions, so it is difficult to provide a timeline for the fix at the moment.&lt;/P&gt;

&lt;P&gt;In the meantime, if you could provide more information about what you want to accomplish, I may be able to suggest another solution for your use case.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jan 2017 11:16:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/OpenCL-stall-on-Apollo-Lake-GPU/m-p/1072378#M4478</guid>
      <dc:creator>Michal_M_Intel</dc:creator>
      <dc:date>2017-01-19T11:16:13Z</dc:date>
    </item>
    <item>
      <title>Thanks for the information</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/OpenCL-stall-on-Apollo-Lake-GPU/m-p/1072379#M4479</link>
      <description>&lt;P&gt;Thanks for the information and the offer to help us find a way to work around the issue. Below is an outline of the processing constraints we need to satisfy.&lt;/P&gt;

&lt;P&gt;Our system processes a real-time data stream on a very short time cycle (in the millisecond range), so low-latency processing is as important as raw speed. To allow for varying processing latency we use multiple input and output buffers in a cyclic fashion. Also, on the OpenCL implementation on another platform (ARM Mali) we found we could reduce latency by queueing tasks ahead and using the out-of-order queueing feature.&lt;/P&gt;
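<![CDATA[
<P>A minimal sketch of the cyclic buffer bookkeeping described above, in plain C. The pool size, the type name and the function names are all hypothetical, and the code is not thread-safe: a real feeder/consumer pair would need a mutex or similar around these calls.</P>

```c
#include <stddef.h>

#define NUM_SLOTS 4 /* assumed pool size; the post does not give one */

/* Hypothetical bookkeeping for a cyclic pool of input or output buffers. */
typedef struct {
    size_t head;      /* next slot the feeder will fill */
    size_t tail;      /* oldest slot still being processed */
    size_t in_flight; /* slots handed out but not yet released */
} slot_ring;

static void ring_init(slot_ring *r)
{
    r->head = 0;
    r->tail = 0;
    r->in_flight = 0;
}

/* Feeder side: claim the next free slot, or return -1 if all are busy. */
static int ring_acquire(slot_ring *r)
{
    if (r->in_flight == NUM_SLOTS)
        return -1;
    int slot = (int)r->head;
    r->head = (r->head + 1) % NUM_SLOTS;
    r->in_flight++;
    return slot;
}

/* Consumer side: release the oldest slot once its results have been read,
 * or return -1 if nothing is outstanding. */
static int ring_release(slot_ring *r)
{
    if (r->in_flight == 0)
        return -1;
    int slot = (int)r->tail;
    r->tail = (r->tail + 1) % NUM_SLOTS;
    r->in_flight--;
    return slot;
}
```

<P>With several slots the feeder can queue work ahead of the consumer, which is what absorbs the variation in processing latency; when ring_acquire() returns -1 the feeder has to wait for the consumer to release a slot.</P>
]]>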

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Would you be able to give us more information about the issue and what we must avoid doing?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jan 2017 12:07:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/OpenCL-stall-on-Apollo-Lake-GPU/m-p/1072379#M4479</guid>
      <dc:creator>tony_w_</dc:creator>
      <dc:date>2017-01-19T12:07:45Z</dc:date>
    </item>
    <item>
      <title>What is happening in the code</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/OpenCL-stall-on-Apollo-Lake-GPU/m-p/1072380#M4480</link>
      <description>&lt;P&gt;What is happening in the code is that MapBuffer is called with blocking_flag set to True on an Out Of Order Queue.&lt;/P&gt;

&lt;P&gt;There are no input events, so for the driver it means, map this buffer for me now, even if it may be in use by GPU, please confirm that this is expected.&lt;/P&gt;

&lt;P&gt;If you want such access to buffer storage and synchronization is not needed, then you may have another out of order queue on which this MapBuffer operation will actually happen. Currently driver improperly waits for the blocked unMap operation to complete prior to servicing MapBuffer call, this wait shouldn't be present in out of order queue.&lt;/P&gt;
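<![CDATA[
<P>A minimal sketch of the second-queue approach described above, using the OpenCL 1.2 host API. The helper names are hypothetical, CL_MAP_WRITE is an assumption about how the buffer is used, and whether this sidesteps the stall on the affected driver is not confirmed here.</P>

```c
/* clCreateCommandQueue is the OpenCL 1.2 entry point; this define only
 * silences deprecation warnings when building against newer headers. */
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS
#include <CL/cl.h>
#include <stddef.h>

/* Hypothetical helper: create a second out-of-order queue on the same
 * context and device, reserved for "map now" requests. */
static cl_command_queue create_map_queue(cl_context ctx, cl_device_id dev,
                                         cl_int *err)
{
    return clCreateCommandQueue(ctx, dev,
                                CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, err);
}

/* Hypothetical helper: blocking map issued on the dedicated queue with an
 * empty wait list, so no ordering against the main queue is requested.
 * Returns the host pointer, or NULL with *err set on failure. */
static void *map_now(cl_command_queue map_queue, cl_mem buf, size_t size,
                     cl_int *err)
{
    return clEnqueueMapBuffer(map_queue, buf, CL_TRUE /* blocking_map */,
                              CL_MAP_WRITE, 0 /* offset */, size,
                              0, NULL, NULL, err);
}
```

<P>The main queue keeps the pending unmap and the kernels; only the immediate maps go through the queue returned by create_map_queue().</P>
]]>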

&lt;P&gt;If you actually want to synchronize on the previous unmap call, the code should use events.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jan 2017 14:15:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/OpenCL-stall-on-Apollo-Lake-GPU/m-p/1072380#M4480</guid>
      <dc:creator>Michal_M_Intel</dc:creator>
      <dc:date>2017-01-19T14:15:15Z</dc:date>
    </item>
    <item>
      <title>Thanks Michal. Yes, the map</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/OpenCL-stall-on-Apollo-Lake-GPU/m-p/1072381#M4481</link>
      <description>&lt;P&gt;Thanks Michal. Yes, the &lt;EM&gt;map now&lt;/EM&gt;&amp;nbsp;is intended. Now that I understand what the issue is, I have been able reorder a couple of things to avoid this problem, so I have our application running on the GPU. Unfortunately, not fast enough though, but I'll open a new topic to ask for help on that.&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Jan 2017 17:09:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/OpenCL-stall-on-Apollo-Lake-GPU/m-p/1072381#M4481</guid>
      <dc:creator>tony_w_</dc:creator>
      <dc:date>2017-01-19T17:09:01Z</dc:date>
    </item>
  </channel>
</rss>

