In my program I issue several
clEnqueueWriteBuffer(queue, pDeviceMem, CL_FALSE, 0, mySize, pMyObject, 0, nullptr, nullptr);
calls before a kernel launch, and expect these writes to finish before the kernel starts.
I am using an in-order queue, yet about 50% of the kernel launches do not see the values that clEnqueueWriteBuffer should have written.
If I set the blocking flag to CL_TRUE, the behaviour is as expected. Non-blocking buffer writes also behave correctly on NVIDIA hardware.
My system is Windows 7 with an Intel HD 4600 and the latest driver.
Do you have any hints? Do I need a particular type of memory (pinned, mapped/unmapped, etc.), or should non-blocking writes work on buffers created without CL_MEM_USE_HOST_PTR?
You can get an event from each non-blocking clEnqueueWriteBuffer call and then wait for those events to complete before launching the kernel. Another option is to call clFinish. See this discussion: https://community.amd.com/thread/159601
Otherwise you are asking for trouble, since non-blocking calls are not guaranteed to have finished when you launch your kernel; you just got lucky with the NVIDIA runtime.
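A minimal sketch of the event-based approach, reusing the names from your question (queue, pDeviceMem, mySize, pMyObject); the second buffer, the kernel handle, and the global size are placeholders for whatever your program actually uses:

```c
/* Sketch: collect an event from each non-blocking write and pass them
 * as the kernel's wait list, so the kernel cannot start early. */
cl_event writeEvents[2];

clEnqueueWriteBuffer(queue, pDeviceMem, CL_FALSE, 0, mySize,
                     pMyObject, 0, NULL, &writeEvents[0]);
clEnqueueWriteBuffer(queue, pOtherDeviceMem, CL_FALSE, 0, otherSize,
                     pOtherObject, 0, NULL, &writeEvents[1]);

/* The kernel waits on both write events before executing. */
size_t globalSize = 1024;  /* placeholder */
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &globalSize, NULL,
                       2, writeEvents, NULL);

/* Or, to block the host instead: clWaitForEvents(2, writeEvents); */

clReleaseEvent(writeEvents[0]);
clReleaseEvent(writeEvents[1]);
```

Note that events must be released with clReleaseEvent once you are done with them, or the runtime will leak them.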
I thought the whole point of not setting CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE was to guarantee that enqueued commands execute in order.
I will try waiting for events and see whether that solves my problem. Calling clFinish() is not an option, as I am trying to overlap CPU and GPU execution as much as possible; instead I poll on events before sending the next batch of work to the GPU.
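For reference, the polling I mean looks roughly like this (a sketch; `evt` stands for an event obtained from an earlier enqueue, and the clFlush is needed because without it a non-blocking command may not even be submitted to the device):

```c
/* Sketch: poll an event's execution status so the host thread can keep
 * doing CPU work instead of blocking in clWaitForEvents. */
clFlush(queue);  /* make sure the command is actually submitted */

cl_int status = CL_QUEUED;
while (status != CL_COMPLETE && status >= 0) {  /* negative = error */
    clGetEventInfo(evt, CL_EVENT_COMMAND_EXECUTION_STATUS,
                   sizeof(status), &status, NULL);
    /* ... do some CPU work here ... */
}
```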