Hi!
I'm having synchronization problems with enqueueCopyBuffer() on Xeon Phi. Consider the following pseudocode:
/* Begin host code */
cl::Buffer b0, b1; // each initialized with size N
cl::CommandQueue c0, c1;
cl::Kernel k; // add1, with arguments (b0, N)
char hB[N]; // every element initialized to 5
c0.enqueueWriteBuffer(b0, CL_TRUE, 0, N, static_cast<void *>(hB));
c0.enqueueNDRangeKernel(k, cl::NullRange, cl::NDRange(1), cl::NDRange(1)); // a single work-item
c1.enqueueCopyBuffer(b0, b1, 0, 0, N);
c1.enqueueReadBuffer(b1, CL_TRUE, 0, N, static_cast<void *>(hB));
/* End host code */
/* The kernel code would be: */
__kernel void add1(__global char *p, int n) {
    for (int i = get_global_id(0); i < n; i += 1)
        p[i] += 1; // increment the element, not the pointer
}
The question is: after the enqueueReadBuffer call, what values are stored in hB? The answer is 'it depends'. With two Nvidia GPUs the values are 6, and with two Xeon Phis they are 5. The problem is that the Xeon Phi implementation of enqueueCopyBuffer is non-blocking, so it needs a synchronization barrier between the enqueueNDRangeKernel call and the enqueueCopyBuffer call. Is this normal behavior, or is it an implementation error?
Thanks a lot in advance and good luck :)
Moisés
The bug is not in CopyBuffer; it is the absence of synchronization between the two command queues.
In your case, either the WriteBuffer or the NDRange command may execute on the device concurrently with CopyBuffer, because command queues c0 and c1 are completely independent.