<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: clReleaseMemObject just after clEnqueueTask causes segfault in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/clReleaseMemObject-just-after-clEnqueueTask-causes-segfault/m-p/209752#M2</link>
    <description>&lt;P&gt;Try compiling your kernel against another OpenCL SDK, e.g. AMD's or NVIDIA's, and see whether your code segfaults in the same place. If it doesn't, then this is a bug in the Intel FPGA SDK. That said, I am very surprised that the specification claims the buffer will be freed after the commands depending on it finish; the command queue is NOT passed to clReleaseMemObject as an argument, so I fail to see how this function could determine when the buffer is safe to delete.&lt;/P&gt;</description>
    <pubDate>Tue, 22 Jan 2019 12:32:01 GMT</pubDate>
    <dc:creator>HRZ</dc:creator>
    <dc:date>2019-01-22T12:32:01Z</dc:date>
    <item>
      <title>clReleaseMemObject just after clEnqueueTask causes segfault</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/clReleaseMemObject-just-after-clEnqueueTask-causes-segfault/m-p/209751#M1</link>
      <description>&lt;P&gt;The OpenCL spec says that clReleaseMemObject() doesn't delete the specified memory object while there are queued commands that use the object.&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clReleaseMemObject.html"&gt;https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clReleaseMemObject.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;gt; After the&amp;nbsp;&lt;I&gt;memobj&lt;/I&gt;&amp;nbsp;reference count becomes zero and commands queued for execution on a command-queue(s) that use&amp;nbsp;&lt;I&gt;memobj&lt;/I&gt;&amp;nbsp;have finished, the memory object is deleted.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, the following example code causes a segmentation fault in my environment (emulation with Intel FPGA SDK for OpenCL 18.1).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;kernel code&lt;/LI&gt;&lt;/UL&gt;&lt;CODE&gt;__kernel void sample(__global char * restrict s)
{
        s[0] = 'H';
        s[1] = 'e';
        s[2] = 'l';
        s[3] = 'l';
        s[4] = 'o';
        s[5] = '\0';
}&lt;/CODE&gt;&lt;UL&gt;&lt;LI&gt;host code&lt;/LI&gt;&lt;/UL&gt;&lt;CODE&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;string.h&amp;gt;
&amp;nbsp;
#include &amp;lt;CL/cl.h&amp;gt;
&amp;nbsp;
#define KERNEL_FILE     "sample.aocx"
#define KERNEL_NAME     "sample"
&amp;nbsp;
#define ARRAY_SIZE      (1024 * 1024)
&amp;nbsp;
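static void check_status(cl_int err, const char *api)
{
        if (err == CL_SUCCESS)
                return;
        /* Report which API call failed before aborting. */
        fprintf(stderr, "%s failed with error %d.\n", api, (int)err);
        abort();
}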
&amp;nbsp;
int main()
{
        FILE *fp = fopen(KERNEL_FILE, "rb"); /* the .aocx image is binary data */
        if (fp == NULL) {
                fprintf(stderr, "Could not open file %s.\n", KERNEL_FILE);
                exit(1);
        }
        fseek(fp, 0, SEEK_END);
        long file_size = ftell(fp);
        unsigned char *binary = malloc(file_size);
        if (binary == NULL) {
                fprintf(stderr, "Could not allocate memory.\n");
                exit(1);
        }
        fseek(fp, 0, SEEK_SET);
        if (fread(binary, file_size, 1, fp) != 1) {
                fprintf(stderr, "Could not read file %s.\n", KERNEL_FILE);
                exit(1);
        }
        fclose(fp);
&amp;nbsp;
        cl_int status;
        cl_platform_id platform_id;
        cl_uint num_platforms;
        status = clGetPlatformIDs(1, &amp;amp;platform_id, &amp;amp;num_platforms);
        check_status(status, "clGetPlatformIDs");
&amp;nbsp;
        cl_device_id device_id;
        cl_uint num_devices;
        status = clGetDeviceIDs(platform_id,
                                CL_DEVICE_TYPE_ACCELERATOR,
                                1,
                                &amp;amp;device_id,
                                &amp;amp;num_devices);
        check_status(status, "clGetDeviceIDs");
&amp;nbsp;
        cl_context context = clCreateContext(NULL,
                                             1,
                                             &amp;amp;device_id,
                                             NULL,
                                             NULL,
                                             &amp;amp;status);
        check_status(status, "clCreateContext");
&amp;nbsp;
        cl_command_queue command_queue = clCreateCommandQueue(context,
                                                              device_id,
                                                              0,
                                                              &amp;amp;status);
        check_status(status, "clCreateCommandQueue");
&amp;nbsp;
        cl_int binary_status;
        size_t binary_size = file_size;
        cl_program program = clCreateProgramWithBinary(context,
                                                       1,
                                                       &amp;amp;device_id,
                                                       &amp;amp;binary_size,
                                                       (const unsigned char **)&amp;amp;binary,
                                                       &amp;amp;binary_status,
                                                       &amp;amp;status);
        check_status(status, "clCreateProgramWithBinary");
&amp;nbsp;
        cl_kernel kernel = clCreateKernel(program, KERNEL_NAME, &amp;amp;status);
        check_status(status, "clCreateKernel");
&amp;nbsp;
        cl_mem mem_a = clCreateBuffer(context,
                                      CL_MEM_READ_WRITE,
                                      ARRAY_SIZE,
                                      NULL,
                                      &amp;amp;status);
        check_status(status, "clCreateBuffer");
&amp;nbsp;
        status = clSetKernelArg(kernel, 0, sizeof(cl_mem), &amp;amp;mem_a);
        check_status(status, "clSetKernelArg");
&amp;nbsp;
        status = clEnqueueTask(command_queue, kernel, 0, NULL, NULL);
        check_status(status, "clEnqueueTask");
&amp;nbsp;
        status = clReleaseMemObject(mem_a);
        check_status(status, "clReleaseMemObject");
&amp;nbsp;
        status = clFlush(command_queue);
        check_status(status, "clFlush");
        status = clFinish(command_queue);
        check_status(status, "clFinish");
&amp;nbsp;
        status = clReleaseKernel(kernel);
        check_status(status, "clReleaseKernel");
        status = clReleaseProgram(program);
        check_status(status, "clReleaseProgram");
        status = clReleaseCommandQueue(command_queue);
        check_status(status, "clReleaseCommandQueue");
        status = clReleaseContext(context);
        check_status(status, "clReleaseContext");
        free(binary);
&amp;nbsp;
        return 0;
}&lt;/CODE&gt;&lt;P&gt;I ran the program under valgrind, and it looks like clReleaseMemObject() frees the memory object even though a kernel that uses it is still running.&lt;/P&gt;&lt;CODE&gt;==55813== Invalid write of size 1
==55813==    at 0xBC304CA: sample (sample.cl:3)
==55813==  Address 0xb8a5000 is 912 bytes inside a block of size 1,049,600 free'd
==55813==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==55813==    by 0x5C42D19: acl_mem_aligned_free (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libalteracl.so)
==55813==    by 0x5C47307: clReleaseMemObjectIntelFPGA (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libalteracl.so)
==55813==    by 0x4E3CFFE: clReleaseMemObject (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libOpenCL.so.1)
==55813==    by 0x4011BE: main (main.c:96)
==55813==  Block was alloc'd at
==55813==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==55813==    by 0x5C41A87: acl_mem_aligned_malloc (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libalteracl.so)
==55813==    by 0x5C49A12: clCreateBufferIntelFPGA (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libalteracl.so)
==55813==    by 0x4E3CF17: clCreateBuffer (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libOpenCL.so.1)
==55813==    by 0x40113E: main (main.c:83)
==55813==     
==55813== Invalid write of size 1
==55813==    at 0xBC304D2: sample (sample.cl:4)
==55813==  Address 0xb8a5001 is 913 bytes inside a block of size 1,049,600 free'd
==55813==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==55813==    by 0x5C42D19: acl_mem_aligned_free (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libalteracl.so)
==55813==    by 0x5C47307: clReleaseMemObjectIntelFPGA (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libalteracl.so)
==55813==    by 0x4E3CFFE: clReleaseMemObject (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libOpenCL.so.1)
==55813==    by 0x4011BE: main (main.c:96)
==55813==  Block was alloc'd at
==55813==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==55813==    by 0x5C41A87: acl_mem_aligned_malloc (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libalteracl.so)
==55813==    by 0x5C49A12: clCreateBufferIntelFPGA (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libalteracl.so)
==55813==    by 0x4E3CF17: clCreateBuffer (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libOpenCL.so.1)
==55813==    by 0x40113E: main (main.c:83)
==55813==     
==55813== Invalid write of size 1
==55813==    at 0xBC304DB: sample (sample.cl:5)
==55813==  Address 0xb8a5002 is 914 bytes inside a block of size 1,049,600 free'd
==55813==    at 0x4C2EDEB: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==55813==    by 0x5C42D19: acl_mem_aligned_free (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libalteracl.so)
==55813==    by 0x5C47307: clReleaseMemObjectIntelFPGA (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libalteracl.so)
==55813==    by 0x4E3CFFE: clReleaseMemObject (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libOpenCL.so.1)
==55813==    by 0x4011BE: main (main.c:96)
==55813==  Block was alloc'd at
==55813==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==55813==    by 0x5C41A87: acl_mem_aligned_malloc (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libalteracl.so)
==55813==    by 0x5C49A12: clCreateBufferIntelFPGA (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libalteracl.so)
==55813==    by 0x4E3CF17: clCreateBuffer (in /opt/intelFPGA/18.1/hld/host/linux64/lib/libOpenCL.so.1)
==55813==    by 0x40113E: main (main.c:83)&lt;/CODE&gt;&lt;P&gt;Is this a bug, or am I missing something?&lt;/P&gt;</description>
      <pubDate>Mon, 21 Jan 2019 15:52:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/clReleaseMemObject-just-after-clEnqueueTask-causes-segfault/m-p/209751#M1</guid>
      <dc:creator>kazum</dc:creator>
      <dc:date>2019-01-21T15:52:39Z</dc:date>
    </item>
    <item>
      <title>Re: clReleaseMemObject just after clEnqueueTask causes segfault</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/clReleaseMemObject-just-after-clEnqueueTask-causes-segfault/m-p/209752#M2</link>
      <description>&lt;P&gt;Try compiling your kernel against another OpenCL SDK, e.g. AMD's or NVIDIA's, and see whether your code segfaults in the same place. If it doesn't, then this is a bug in the Intel FPGA SDK. That said, I am very surprised that the specification claims the buffer will be freed after the commands depending on it finish; the command queue is NOT passed to clReleaseMemObject as an argument, so I fail to see how this function could determine when the buffer is safe to delete.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jan 2019 12:32:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/clReleaseMemObject-just-after-clEnqueueTask-causes-segfault/m-p/209752#M2</guid>
      <dc:creator>HRZ</dc:creator>
      <dc:date>2019-01-22T12:32:01Z</dc:date>
    </item>
    <item>
      <title>Re: clReleaseMemObject just after clEnqueueTask causes segfault</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/clReleaseMemObject-just-after-clEnqueueTask-causes-segfault/m-p/209753#M3</link>
      <description>&lt;P&gt;Thanks for your reply. I tried NVIDIA's SDK and Xilinx SDAccel, and both worked correctly, so this looks like a bug in the Intel FPGA SDK.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Jan 2019 13:31:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/clReleaseMemObject-just-after-clEnqueueTask-causes-segfault/m-p/209753#M3</guid>
      <dc:creator>kazum</dc:creator>
      <dc:date>2019-01-22T13:31:33Z</dc:date>
    </item>
  </channel>
</rss>

