Hi!
I'm having synchronization problems with enqueueCopyBuffer() on Xeon Phi. Consider the following pseudocode:
/* Begin host code */
cl::Buffer b0, b1;        // initialized with size N
cl::CommandQueue c0, c1;
cl::Kernel k;
char hB[N];               // every element initialized to 5
c0.enqueueWriteBuffer(b0, CL_TRUE, 0, N, static_cast<void *>(hB));
c0.enqueueNDRangeKernel(k, cl::NullRange, cl::NDRange(1), cl::NDRange(1)); // a single work-item
c1.enqueueCopyBuffer(b0, b1, 0, 0, N);
c1.enqueueReadBuffer(b1, CL_TRUE, 0, N, static_cast<void *>(hB));
/* End host code */

/* The kernel code would be: */
__kernel void add1(__global char *p, int n) {
    for (int i = get_global_id(0); i < n; i += 1)
        p[i]++;
}
The question is: after the enqueueReadBuffer call, what values are stored in hB? The answer is 'it depends'. With two Nvidia GPUs the values are 6, and with two Xeon Phi devices they are 5. The problem seems to be that the Xeon Phi implementation of enqueueCopyBuffer is non-blocking, so it needs a synchronization barrier between the enqueueNDRangeKernel call and the enqueueCopyBuffer call. Is this normal behavior, or is it an implementation error?
Thanks a lot in advance and good luck :)
Moisés
The bug is not in CopyBuffer; it is the absence of synchronization between the two command queues.
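In your case, either WriteBuffer or NDRange may execute on the device concurrently with CopyBuffer, because command queues c0 and c1 are completely independent. OpenCL guarantees no ordering between commands in different queues unless you synchronize them explicitly, for example with an event wait list or by calling finish() on c0 before enqueueing the copy.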