<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic neural-style/torchcl with intel-opencl-r3.0 in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071878#M4462</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I've run into an issue with "github.com/jcjohnson/neural-style" &amp;amp;&amp;amp; "torchcl" (github.com/hughperkins/distro, branch distro-cl) using intel-opencl-r3.0.&lt;/P&gt;

&lt;P&gt;It runs 7 times faster than Beignet 1.1.1, but processing stops after 90-100 iterations with error code CL_OUT_OF_HOST_MEMORY (-6), whereas Beignet runs stably.&lt;/P&gt;

&lt;P&gt;At image size 500x500, the computer has 32 GB of RAM, the OS uses ~1.5 GB, and torch uses ~10 GB (~5 GB resident), yet your driver returns "out of host memory". Can you explain that? In the same situation Beignet uses ~5 GB (~0.8 GB resident).&lt;/P&gt;

&lt;P&gt;It looks like the error occurs at the same number of iterations regardless of image size (250x250 vs. 500x500 makes no difference). I don't see memory use growing significantly across iterations.&lt;/P&gt;

&lt;P&gt;Does intel-opencl-r3.0 have its own logging facility that could help figure out what triggers the "out of memory" error? And what else can I do in this situation?&lt;/P&gt;

&lt;P&gt;P.S.: This "torchcl" is written for GPUs and doesn't follow your recommendation to avoid duplicating all buffers in memory. (https://software.intel.com/en-us/articles/getting-the-most-from-opencl-12-how-to-increase-performance-by-minimizing-buffer-copies-on-intel-processor-graphics)&lt;BR /&gt;
	And maybe it has memory/object leaks, but it somehow works with Beignet without "out of memory" errors.&lt;BR /&gt;
	So I suspect that "intel-opencl-r3.0" may have issues of its own besides that.&lt;/P&gt;

&lt;P&gt;HW: i3-6300&lt;BR /&gt;
	OS: Ubuntu 16.04 with kernel 4.4 (also tried 4.8 with your i915 patch; nothing changed).&lt;/P&gt;</description>
    <pubDate>Wed, 02 Nov 2016 04:48:33 GMT</pubDate>
    <dc:creator>Chernov__Alexey</dc:creator>
    <dc:date>2016-11-02T04:48:33Z</dc:date>
    <item>
      <title>neural-style/torchcl with intel-opencl-r3.0</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071878#M4462</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I've run into an issue with "github.com/jcjohnson/neural-style" &amp;amp;&amp;amp; "torchcl" (github.com/hughperkins/distro, branch distro-cl) using intel-opencl-r3.0.&lt;/P&gt;

&lt;P&gt;It runs 7 times faster than Beignet 1.1.1, but processing stops after 90-100 iterations with error code CL_OUT_OF_HOST_MEMORY (-6), whereas Beignet runs stably.&lt;/P&gt;

&lt;P&gt;At image size 500x500, the computer has 32 GB of RAM, the OS uses ~1.5 GB, and torch uses ~10 GB (~5 GB resident), yet your driver returns "out of host memory". Can you explain that? In the same situation Beignet uses ~5 GB (~0.8 GB resident).&lt;/P&gt;

&lt;P&gt;It looks like the error occurs at the same number of iterations regardless of image size (250x250 vs. 500x500 makes no difference). I don't see memory use growing significantly across iterations.&lt;/P&gt;

&lt;P&gt;Does intel-opencl-r3.0 have its own logging facility that could help figure out what triggers the "out of memory" error? And what else can I do in this situation?&lt;/P&gt;

&lt;P&gt;P.S.: This "torchcl" is written for GPUs and doesn't follow your recommendation to avoid duplicating all buffers in memory. (https://software.intel.com/en-us/articles/getting-the-most-from-opencl-12-how-to-increase-performance-by-minimizing-buffer-copies-on-intel-processor-graphics)&lt;BR /&gt;
	And maybe it has memory/object leaks, but it somehow works with Beignet without "out of memory" errors.&lt;BR /&gt;
	So I suspect that "intel-opencl-r3.0" may have issues of its own besides that.&lt;/P&gt;

&lt;P&gt;HW: i3-6300&lt;BR /&gt;
	OS: Ubuntu 16.04 with kernel 4.4 (also tried 4.8 with your i915 patch; nothing changed).&lt;/P&gt;</description>
      <pubDate>Wed, 02 Nov 2016 04:48:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071878#M4462</guid>
      <dc:creator>Chernov__Alexey</dc:creator>
      <dc:date>2016-11-02T04:48:33Z</dc:date>
    </item>
    <item>
      <title>Didn't say it the first time</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071879#M4463</link>
      <description>&lt;P&gt;I didn't say it the first time: I only use the GPU driver from intel-opencl-r3.0, since CPU processing is slow and consumes a lot of power. CPU mode runs past 100 iterations without errors.&lt;/P&gt;

&lt;P&gt;&amp;gt; P.S.: That "torchcl" is written for GPUs&lt;BR /&gt;
	I meant standalone GPUs with their own RAM.&lt;/P&gt;</description>
      <pubDate>Thu, 03 Nov 2016 05:59:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071879#M4463</guid>
      <dc:creator>Chernov__Alexey</dc:creator>
      <dc:date>2016-11-03T05:59:00Z</dc:date>
    </item>
    <item>
      <title>Thanks for this report.   I</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071880#M4464</link>
      <description>&lt;P&gt;Thanks for this report.&amp;nbsp; I think I've set up enough to replicate.&amp;nbsp; 'th neural_style.lua' completes in CPU mode but crashes before 100 iterations with the clnn backend.&amp;nbsp; I will let you know what we find.&lt;/P&gt;</description>
      <pubDate>Fri, 04 Nov 2016 07:03:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071880#M4464</guid>
      <dc:creator>Jeffrey_M_Intel1</dc:creator>
      <dc:date>2016-11-04T07:03:52Z</dc:date>
    </item>
    <item>
      <title>Thanks for the fast reply. There</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071881#M4465</link>
      <description>&lt;P&gt;Thanks for the fast reply. Here are some details I found:&lt;/P&gt;

&lt;P&gt;$ gdb torchcl/install/bin/luajit -ex "catch throw"&lt;/P&gt;

&lt;P&gt;(gdb) run neural_style.lua -backend clnn -gpu 0 -print_iter 10 &amp;lt;...image options...&amp;gt;&lt;/P&gt;

&lt;P&gt;Iteration 10 / 1000&lt;BR /&gt;
	Iteration 20 / 1000&lt;BR /&gt;
	..&lt;BR /&gt;
	Iteration 80 / 1000&lt;BR /&gt;
	Iteration 90 / 1000&lt;/P&gt;

&lt;P&gt;(gdb) bt 3&lt;BR /&gt;
	#0&amp;nbsp; in __cxa_throw ()&lt;BR /&gt;
	#1&amp;nbsp; in EasyCL::checkError at torchcl/opencl/cltorch/src/EasyCL/EasyCL.cpp:538&lt;BR /&gt;
	#2&amp;nbsp; in CLWrapper::copyToHost at torchcl/opencl/cltorch/src/EasyCL/CLWrapper.cpp:74&lt;/P&gt;

&lt;P&gt;torchcl/opencl/cltorch/src/EasyCL/CLWrapper.cpp:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;void CLWrapper::copyToHost() {
    if(!onDevice) {
        throw std::runtime_error("copyToHost(): not on device");
    }
//    cl-&amp;gt;finish();
    cl_event event = NULL;
    error = clEnqueueReadBuffer(*(cl-&amp;gt;queue), devicearray, CL_TRUE, 0, getElementSize() * N, getHostArray(), 0, NULL, &amp;amp;event);    
    cl-&amp;gt;checkError(error);
    cl_int err = clWaitForEvents(1, &amp;amp;event);
    clReleaseEvent(event);
    if (err != CL_SUCCESS) {
        throw std::runtime_error("wait for event on copytohost failed with " + easycl::toString(err) );
    }
    deviceDirty = false;
}&lt;/PRE&gt;

&lt;P&gt;So the error comes from clEnqueueReadBuffer.&lt;BR /&gt;
	When I changed the code to this:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;void CLWrapper::copyToHost() {
    if(!onDevice) { throw std::runtime_error("copyToHost(): not on device"); }
//    cl-&amp;gt;finish();
    void *ptr = clEnqueueMapBuffer(*(cl-&amp;gt;queue), devicearray, CL_TRUE, CL_MAP_READ, 0, getElementSize() * N, 0, NULL, NULL, &amp;amp;error);
    cl-&amp;gt;checkError(error);
    ::memcpy(getHostArray(), ptr, getElementSize() * N);
    clEnqueueUnmapMemObject(*(cl-&amp;gt;queue), devicearray, ptr, 0, NULL, NULL);
    deviceDirty = false;
}&lt;/PRE&gt;

&lt;P&gt;... neural-style stops with the same error after 100 iterations, but in a different place:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;torchcl/install/share/lua/5.1/optim/lbfgs.lua:152:&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; &amp;nbsp;clblasSdot() failed with -6 at torchcl/opencl/cltorch/src/lib/THClBlas.cpp:186&lt;/P&gt;

&lt;P&gt;But I haven't figured out which CL call the -6 error comes from this time.&lt;BR /&gt;
	clblasSdot() is defined in torchcl/opencl/cltorch/src/clMathLibraries/clBLAS/src/library/blas/xdot.c and can return an error from many CL functions.&lt;/P&gt;

</description>
      <pubDate>Fri, 04 Nov 2016 13:52:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071881#M4465</guid>
      <dc:creator>Chernov__Alexey</dc:creator>
      <dc:date>2016-11-04T13:52:00Z</dc:date>
    </item>
    <item>
      <title>I'm seeing events being</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071882#M4466</link>
      <description>&lt;P&gt;I'm seeing events being created that are never released.&amp;nbsp; After running for enough iterations this eventually results in the OUT_OF_MEMORY error.&amp;nbsp; Many of the events that are never released come from enqueuing "Sdot_kernel".&amp;nbsp; I'm tracking down where this kernel is being enqueued so I can figure out where the event should be released, but I wanted to post my findings so far.&lt;/P&gt;</description>
      <pubDate>Thu, 10 Nov 2016 23:34:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071882#M4466</guid>
      <dc:creator>Ben_A_Intel</dc:creator>
      <dc:date>2016-11-10T23:34:29Z</dc:date>
    </item>
    <item>
      <title>I added my findings to the</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071883#M4467</link>
      <description>&lt;P&gt;I added my findings to the torch-cl issue you created (thanks!):&lt;/P&gt;

&lt;P&gt;&lt;A href="https://github.com/hughperkins/distro-cl/issues/14" target="_blank"&gt;https://github.com/hughperkins/distro-cl/issues/14&lt;/A&gt;&lt;/P&gt;
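
&lt;P&gt;For reference, the leak boils down to enqueue calls that request an output event which is then never released. A minimal sketch of the pattern and the fix (illustrative only, not the actual clBLAS code; queue, kernel, and global_size are assumed to exist):&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;// Leaky pattern: every iteration asks the runtime for an event and never
// hands it back, so host-side event objects pile up until enqueue calls
// start failing with CL_OUT_OF_HOST_MEMORY.
cl_event event = NULL;
cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                    &amp;amp;global_size, NULL, 0, NULL, &amp;amp;event);

// Fix: release the event once it is no longer needed,
// e.g. right after waiting on it.
err = clWaitForEvents(1, &amp;amp;event);
clReleaseEvent(event);  // drops the host reference so the runtime can reclaim it

// If the event is not needed at all, pass NULL for the last argument:
// clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &amp;amp;global_size, NULL, 0, NULL, NULL);&lt;/PRE&gt;

&lt;P&gt;Either releasing each event or passing NULL for the event argument keeps the host-side event count bounded.&lt;/P&gt;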

&lt;P&gt;The fastest solution will be to fix the event leak in torch-cl, so hopefully we can make progress on this issue.&amp;nbsp; In the meantime though, we're also looking at ways to improve our event handling so it's more resilient to memory leaks in the future.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Nov 2016 18:19:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071883#M4467</guid>
      <dc:creator>Ben_A_Intel</dc:creator>
      <dc:date>2016-11-15T18:19:12Z</dc:date>
    </item>
    <item>
      <title>Big thanks to you, I have</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071884#M4468</link>
      <description>&lt;P&gt;Big thanks to you. I had seen this source file, but didn't notice this event. I've submitted a pull request with the fix to hughperkins/clBLAS.&lt;/P&gt;

&lt;P&gt;It looks like OpenCL could use a dedicated error code for "out of events".&lt;/P&gt;</description>
      <pubDate>Wed, 16 Nov 2016 06:48:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/neural-style-torchcl-with-intel-opencl-r3-0/m-p/1071884#M4468</guid>
      <dc:creator>Chernov__Alexey</dc:creator>
      <dc:date>2016-11-16T06:48:00Z</dc:date>
    </item>
  </channel>
</rss>

