OpenCL* for CPU

clEnqueueWriteBuffer does not finish before Kernel

Kaj_W_
Beginner

Hi,

In my program I issue several calls like

clEnqueueWriteBuffer(queue, pDeviceMem, CL_FALSE, 0, mySize, pMyObject, 0, nullptr, nullptr);

before a kernel launch, and I expect these writes to finish before the kernel starts.

I am using an in-order queue. However, about 50% of the kernel launches don't see the values that clEnqueueWriteBuffer should have written.
If I set the blocking flag to CL_TRUE, the behaviour is as expected. Also, on NVIDIA hardware the behaviour is fine with non-blocking buffer writes.

My system runs Windows 7 with Intel HD Graphics 4600 and the latest driver.

 

Have you got any hints? Do I need to use a certain type of memory (pinned, mapped/unmapped, etc.), or should non-blocking operations work on CL memory created without CL_MEM_USE_HOST_PTR?

 

Robert_I_Intel
Employee

Hi Kaj,

You can get an event from each non-blocking clEnqueueWriteBuffer call and then wait for those events to complete before launching the kernel. Another option is to call clFinish. See this discussion: https://community.amd.com/thread/159601

Otherwise you are asking for trouble, since non-blocking calls are not guaranteed to have finished when you launch your kernel; you just got lucky with the NVIDIA runtime.

Kaj_W_
Beginner

Hi Robert,

I thought the whole point of not setting CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE was to make sure enqueued commands execute in order.
I will try waiting for events and see if it solves my problem. Calling clFinish() is not an option as I'm trying to make CPU and GPU execution overlap as much as possible. Instead I am polling on events before sending my next batch of work to the GPU.

 
