clEnqueueWriteBuffer does not finish before Kernel

Kaj_W_ · 11-20-2015

Hi,

In my program I issue several calls like

clEnqueueWriteBuffer(queue, pDeviceMem, CL_FALSE, 0, mySize, pMyObject, 0, nullptr, nullptr);

before a kernel launch, and I expect these writes to finish before the kernel starts.

I am using an in-order queue. However, about 50% of the kernel launches don't see the values that clEnqueueWriteBuffer should have written.
If I set the blocking_write flag to CL_TRUE, the behaviour is as expected. On NVIDIA hardware the behaviour is also correct with non-blocking buffer writes.

My system runs Windows 7 on an Intel HD 4600 with the latest driver.

Have you got any hints? Do I need to use a certain type of memory (pinned, mapped/unmapped, etc.), or should non-blocking operations work on CL memory created without CL_MEM_USE_HOST_PTR?

Robert_I_Intel · 11-25-2015

Hi Kaj,

You can get an event from each of the non-blocking clEnqueueWriteBuffer calls and then wait for those events to complete before launching the kernel. Another option would be to call clFinish. See this discussion: https://community.amd.com/thread/159601

Otherwise, you are asking for trouble, since non-blocking calls are not guaranteed to have finished by the time you launch your kernel: you just lucked out with the NVIDIA runtime.

Kaj_W_ · 11-25-2015

Hi Robert,

I thought the whole point of not setting "CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE" was to make sure enqueued commands run in order.
I will try waiting for events and see if it solves my problem. Calling clFinish() is not an option, as I'm trying to make CPU and GPU execution overlap as much as possible. Instead, I poll events before sending my next batch of work to the GPU.