Xeon Phi and enqueueCopyBuffer()

moises_v_ · 05-30-2014

Hi!

I'm having synchronization problems with enqueueCopyBuffer() on Xeon Phi. Consider the following pseudocode:

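/* Begin host code */
   cl::Buffer b0, b1;        // each created with size N bytes
   cl::CommandQueue c0, c1;  // two independent command queues
   cl::Kernel k;             // the add1 kernel shown further down
   char *hB;                 // host array of N bytes, each initialized to 5

   // Blocking write: hB is copied into b0 before the call returns
   c0.enqueueWriteBuffer(b0, CL_TRUE, 0, N, static_cast<void *>(hB));

   // A single work-item
   c0.enqueueNDRangeKernel(k, cl::NullRange, cl::NDRange(1), cl::NDRange(1));

   c1.enqueueCopyBuffer(b0, b1, 0, 0, N);

   // Blocking read: b1 is copied back into hB
   c1.enqueueReadBuffer(b1, CL_TRUE, 0, N, static_cast<void *>(hB));

/* End host code */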

/* The kernel code would be: */
__kernel void add1(__global char *p, int n) {
   // The single work-item walks the buffer and adds 1 to every element
   for (int i = get_global_id(0); i < n; i += 1)
       p[i]++;
}

The question is: after the enqueueReadBuffer call, what values are stored in hB? The answer is 'it depends'. With two Nvidia GPUs the values are 6, and with two Xeon Phis they are 5. The problem is that the Xeon Phi implementation of enqueueCopyBuffer is non-blocking, so it needs a synchronization barrier between the enqueueNDRangeKernel call and the enqueueCopyBuffer call. Is this normal behavior, or is it an implementation error?

Thanks a lot in advance and good luck :)

Moisés

Dmitry_K_Intel · 05-31-2014

The bug is not in CopyBuffer; it is the absence of synchronization between the two command queues.

In your case, either WriteBuffer or NDRange can start executing on the device concurrently with CopyBuffer, because command queues c0 and c1 are completely independent.
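
One way to express the missing dependency in OpenCL is to request a cl::Event from enqueueNDRangeKernel on c0, call c0.flush(), and pass that event in the wait list of enqueueCopyBuffer on c1. Calling c0.finish() before the copy is a simpler but coarser alternative. The sketch below is only a host-side model of that ordering, not OpenCL code. It assumes each queue can be pictured as a thread and uses a std::promise as a stand-in for the event; the function name run_synchronized is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Host-only model (assumption: one thread per command queue).
// A std::promise/std::future pair plays the role of the cl::Event
// that links the kernel on c0 to the copy on c1.
std::vector<char> run_synchronized(std::size_t n) {
    std::vector<char> b0(n, 0), b1(n, 0);     // stand-ins for cl::Buffer b0, b1
    std::promise<void> kernelDone;            // stand-in for cl::Event
    std::future<void> kernelDoneSeen = kernelDone.get_future();

    // "Queue" c0: write 5 into b0, run the add1 "kernel", then signal.
    std::thread c0([&] {
        for (char &v : b0) v = 5;             // models enqueueWriteBuffer
        for (char &v : b0) ++v;               // models the add1 kernel
        kernelDone.set_value();               // the event completes
    });

    // "Queue" c1: wait for the event first, then copy b0 into b1.
    std::thread c1([&] {
        kernelDoneSeen.wait();                // models the copy's wait list
        b1 = b0;                              // models enqueueCopyBuffer
    });

    c0.join();
    c1.join();
    assert(b1.size() == n);                   // sanity check on the copy
    return b1;                                // models enqueueReadBuffer
}
```

Calling run_synchronized(4) returns four elements equal to 6. Without the wait in the second thread, the copy could run before the increment, which is the same race as in the original host code.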