Has anyone tried calling enqueueReadBufferRect() on the Xeon Phi? There seems to be a bug in its implementation of this function.
This call copies a rectangular region of data from a device buffer to host memory. You can set the row pitch and the slice pitch separately for the device side and the host side. When I copy data from the device to the host this way, some positions in the host pointer are never updated. The picture on the right shows an example of this incorrect behaviour, produced by the attached source code. The zeros in red are wrong: those elements should be twos.
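For comparison, here is a minimal host-only sketch of what the rectangular read should produce, following the offset rules the OpenCL specification gives for clEnqueueReadBufferRect: origins and region[0] are in bytes, region[1] counts rows, region[2] counts slices, and a pitch of 0 means "tightly packed". The names referenceReadRect and mismatchedIndices are hypothetical helpers, not part of OpenCL. Comparing the reference result with the host array the device actually returns is one way to list exactly which elements were left stale.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

using Triple = std::array<std::size_t, 3>;

// Hypothetical host-side reference for the clEnqueueReadBufferRect copy.
// Origins and region[0] are in bytes; region[1] counts rows, region[2] slices.
// A pitch of 0 means "tightly packed", as the OpenCL spec defines it.
void referenceReadRect(const unsigned char* buffer, unsigned char* host,
                       Triple bufferOrigin, Triple hostOrigin, Triple region,
                       std::size_t bufferRowPitch, std::size_t bufferSlicePitch,
                       std::size_t hostRowPitch, std::size_t hostSlicePitch) {
    if (bufferRowPitch == 0) bufferRowPitch = region[0];
    if (bufferSlicePitch == 0) bufferSlicePitch = region[1] * bufferRowPitch;
    if (hostRowPitch == 0) hostRowPitch = region[0];
    if (hostSlicePitch == 0) hostSlicePitch = region[1] * hostRowPitch;

    // The spec rejects pitches smaller than the data they must hold.
    assert(bufferRowPitch >= region[0] && hostRowPitch >= region[0]);
    assert(bufferSlicePitch >= region[1] * bufferRowPitch);
    assert(hostSlicePitch >= region[1] * hostRowPitch);

    for (std::size_t z = 0; z < region[2]; ++z) {
        for (std::size_t y = 0; y < region[1]; ++y) {
            // Byte offset of the start of row y in slice z on each side.
            const std::size_t src = (bufferOrigin[2] + z) * bufferSlicePitch +
                                    (bufferOrigin[1] + y) * bufferRowPitch +
                                    bufferOrigin[0];
            const std::size_t dst = (hostOrigin[2] + z) * hostSlicePitch +
                                    (hostOrigin[1] + y) * hostRowPitch +
                                    hostOrigin[0];
            std::memcpy(host + dst, buffer + src, region[0]);
        }
    }
}

// Indices where the device result differs from the reference result,
// e.g. elements that stayed 0 instead of becoming 2.
std::vector<std::size_t> mismatchedIndices(const std::vector<int>& expected,
                                           const std::vector<int>& actual) {
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < expected.size() && i < actual.size(); ++i) {
        if (expected[i] != actual[i]) bad.push_back(i);
    }
    return bad;
}
```

To use it, run referenceReadRect with the same origins, region, and pitches passed to enqueueReadBufferRect, then pass the device's host array as `actual`. On a conforming implementation mismatchedIndices should return an empty vector.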
The problem doesn't occur on NVIDIA GPUs, ATI GPUs, or Intel CPUs; this incorrect behaviour happens only on the Xeon Phi. Has anybody else run into the same problem?
Thank you so much,