I have discovered what I consider to be a bug in the reference tracking for cl_command_queue instances that leads to a segmentation fault under a reasonable usage scenario.
The cl_command_queue reference count is not incremented when a cl_event instance is created. It should be.
clReleaseCommandQueue can be called while there are still cl_event instances that have not been released. In that case it is still possible to get a reference to the command queue associated with a cl_event by calling clGetEventInfo(event, CL_EVENT_COMMAND_QUEUE, ..., &queue, ...).
I experience a segmentation fault if I use that queue reference to get information about the command queue, for example: clGetCommandQueueInfo( command_queue, CL_QUEUE_DEVICE, ...)
My output shows the queue reference count is unaffected by the event lifecycle. The cl_context reference count also remains fixed. This may or may not be ok. I did not explore it.
$ g++ -o bugDemo command_queue_ref_cnt_demo.cpp bugDemoSupport.cpp -I/opt/intel/opencl/include
$ ./bugDemo -p 1
Selected CL_PLATFORM_NAME: Intel(R) OpenCL
CL_DEVICE_NAME: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
CL_DRIVER_VERSION: 126.96.36.199248
reference count after instantiation: context 1, queue 1
reference count after creating an event: context 1, queue 1
reference count after releasing the event: context 1, queue 1
The Apple implementation does not suffer this problem. The queue reference count tracks the number of event instances associated with that queue:
$ ./bugDemo
Selected CL_PLATFORM_NAME: Apple
CL_DEVICE_NAME: Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz
CL_DRIVER_VERSION: 1.1
reference count after instantiation: context 2, queue 1
reference count after creating an event: context 4, queue 2
reference count after releasing the event: context 3, queue 1
I am able to reproduce the behavior you are seeing. But looking at the spec, it seems the only time the reference count changes is when calling the clRetain*() and clRelease*() APIs. So I am not sure what the expected behavior is. I will still open a bug and get back to you once I get more clarification.
Thanks for the reproducer.
I'm glad you got the reproducer to work. You will have noted that it exits normally. It does not enact the usage scenario leading to SEGFAULT that I described.
I believe that the API should internally manage reference counts for what it is responsible for and clRetain*() and clRelease*() are only intended for the user when passing the handle around or destroying the object.
I have never seen an OpenCL example where clRetainCommandQueue() is called upon creating a new cl_event instance. It seems reasonable to expect that the API will do so internally. This is important, as I mentioned, because it is possible to obtain a handle to a command queue via an event handle using clGetEventInfo(CL_EVENT_COMMAND_QUEUE).
I just wanted to double-check whether the spec said anything about it. But I agree with you that the runtime will have to manage the reference counting. I have filed a bug and will let you know of any progress.
You are absolutely right - the OpenCL runtime manages internal reference counts. But the spec doesn't state explicitly that created events should also increment *externally visible counts*. According to the spec, externally visible reference counts are influenced only by the clRetain* and clRelease* APIs. In your specific case you try to access the command queue after deletion of its external interface (the external interface is deleted when the user-visible reference count drops to zero). The spec says nothing specific about this case.
I think Raghu is right and we should first understand exactly how the spec should be read.
I totally agree that I should expect nothing of the externally visible reference count other than what I do with clRetain* and clRelease*.
That said, if I can obtain a handle for an OpenCL object via an OpenCL API function, then that object should still exist.
Another acceptable command queue implementation might just internally track if any OpenCL event instances exist. If any exist, then don't destroy the queue.
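To make that alternative concrete, here is a minimal sketch in plain C++. It is a toy model, not OpenCL code and not how any vendor's runtime is actually written: ModelQueue, ModelEvent, and the model_* functions are all hypothetical names made up for illustration. The externally visible count changes only on an explicit release, while a separate internal count of live events keeps the queue from being destroyed.

```cpp
#include <cassert>

// Hypothetical model of a command queue with two separate counts.
// Nothing here is part of the OpenCL API.
struct ModelQueue {
    int external_refs = 1;   // stands in for the user-visible reference count
    int live_events = 0;     // internal only: events still pointing at this queue
    bool destroyed = false;  // set once the model would free the queue

    // Destroy only when the user has let go AND no event refers to the queue.
    void maybe_destroy() {
        if (external_refs == 0 && live_events == 0) {
            destroyed = true;
        }
    }
};

// Hypothetical model of an event that remembers its queue.
struct ModelEvent {
    ModelQueue* queue;
};

// Creating an event bumps only the internal count.
ModelEvent model_enqueue(ModelQueue& q) {
    ++q.live_events;
    return ModelEvent{&q};
}

// Models the user releasing the queue: only the external count drops.
void model_release_queue(ModelQueue& q) {
    assert(q.external_refs > 0);
    --q.external_refs;
    q.maybe_destroy();
}

// Models the user releasing an event: the internal count drops.
void model_release_event(ModelEvent& e) {
    assert(e.queue != nullptr && e.queue->live_events > 0);
    ModelQueue* q = e.queue;
    e.queue = nullptr;
    --q->live_events;
    q->maybe_destroy();
}
```

In this model, releasing the queue while an event is still alive leaves external_refs at 0 but destroyed stays false, so a handle obtained through the event would still be valid. The queue is only marked destroyed once the last event is released as well.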
This matter came up for me, by the way, as part of an implementation of OpenCL event profiling. During runtime, I log minimal information about the OpenCL events my program creates -- just the cl_event and a const char * descriptor. During final program cleanup, I write more extensive event information to a file for analysis. My program uses multiple platforms, multiple devices, and multiple queues per device. Sorting this out during program shutdown involves calling clGetEventInfo(event, CL_EVENT_COMMAND_QUEUE, ..., &queue, ...) and then clGetCommandQueueInfo(command_queue, CL_QUEUE_DEVICE, ...), which triggers the segmentation fault.