OpenCL* for CPU

Xeon Phi and enqueueCopyBuffer()



I'm having synchronization problems with enqueueCopyBuffer() on Xeon Phi. Consider the following pseudocode:

/* Begin host code */
   cl::Buffer b0, b1;        // each created with size N
   cl::CommandQueue c0, c1;  // two independent command queues
   cl::Kernel k;             // the add1 kernel shown below
   char hB[N];               // every element initialized to 5

   c0.enqueueWriteBuffer(b0, CL_TRUE, 0, N, hB);

   c0.enqueueNDRangeKernel(k, cl::NullRange, cl::NDRange(1), cl::NDRange(1)); // a single work-item

   c1.enqueueCopyBuffer(b0, b1, 0, 0, N);

   c1.enqueueReadBuffer(b1, CL_TRUE, 0, N, hB);

/* End host code */

/* The kernel code would be: */
__kernel void add1(__global char *p, int n) {
   for (int i = get_global_id(0); i < n; i += 1)
      p[i] += 1;
}

The question is: after the enqueueReadBuffer call, what values are stored in hB? The answer is 'it depends'. With two Nvidia GPUs the values are 6, and with two Xeon Phi cards they are 5. The problem seems to be that the Xeon Phi implementation of enqueueCopyBuffer is non-blocking, so it needs a synchronization barrier between the enqueueNDRangeKernel call and the enqueueCopyBuffer call. Is this normal behavior or an implementation error?

Thanks a lot in advance and good luck :)



1 Reply

The bug is not in CopyBuffer; it is the absence of synchronization between the two command queues.

In your case, either WriteBuffer or NDRange can execute on the device concurrently with CopyBuffer, because command queues c0 and c1 are completely independent.