What's the best practice for memcpy with Intel OpenCL on the CPU?
Hi, I am developing OpenCL code for an Intel CPU and have a question about memory copies in OpenCL. Does OpenCL on the CPU have an efficient way to copy a subsection of a large array into a new buffer? For example, given an array that holds 1000x1000 image data, I want to copy a 19x19 section of the image into a new array and do some computation on that section. I could not find an efficient way to do this. Copying the data element by element is extremely inefficient, and because of alignment problems I cannot use vector types for the copy. Does anyone know a good practice for memcpy in OpenCL?
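To make the access pattern concrete, here is a minimal plain-C sketch of the copy I mean, not OpenCL code and not a tuned solution. The function name copy_patch, the float element type, and the row-major layout are assumptions for illustration only. It copies one contiguous row of the window at a time instead of one element at a time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a patch_h x patch_w window whose top-left
 * corner is at (row0, col0) out of a row-major image that has `stride`
 * elements per row, into a densely packed destination buffer.
 * Assumption: elements are float; the real data type may differ. */
void copy_patch(const float *image, size_t stride,
                size_t row0, size_t col0,
                size_t patch_h, size_t patch_w,
                float *patch)
{
    for (size_t r = 0; r < patch_h; ++r) {
        /* Each row of the window is contiguous in the source image,
         * so a single memcpy per row is enough. */
        const float *src = image + (row0 + r) * stride + col0;
        memcpy(patch + r * patch_w, src, patch_w * sizeof(float));
    }
}
```

For the example above this would be called as copy_patch(image, 1000, row0, col0, 19, 19, patch), which issues 19 row copies of 19 elements each rather than 361 single-element copies.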
Thanks, but I am afraid my algorithm cannot do that. Let me explain how my algorithm works. There are multiple kernels, and each later kernel depends on the previous kernel's output.
data --> kernel1 --> output1 (input for kernel2) --> kernel2 --> output2 (input for kernel3) --> kernel3 --> finished
To minimize latency (the whole algorithm is part of a real-time application), I do not call clWaitForEvents until the last kernel has been enqueued. The memcpy happens in kernel3, and the copy positions come from the output of kernel2. I need to copy thousands of small blocks of data into a new array so that I can make better use of the cache. But now the memcpy itself has become the problem: its performance is really bad. Can anyone suggest a good way to do the memcpy?
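The gather step described above can be sketched in plain C as follows. This is only an illustration of the data movement, under assumptions: the names patch_pos and gather_patches are hypothetical, the element type is taken to be float, and the position list stands in for whatever kernel2 actually outputs. In a real kernel the outer loop over positions would presumably map to work-items or work-groups.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one copy position, standing in for the
 * output of kernel2. */
typedef struct {
    size_t row;  /* top row of the window in the source image */
    size_t col;  /* left column of the window in the source image */
} patch_pos;

/* Gather n windows of patch_h x patch_w elements from a row-major image
 * (stride elements per row) into one packed output array, so that
 * window i occupies out[i * patch_h * patch_w ...] contiguously. */
void gather_patches(const float *image, size_t stride,
                    const patch_pos *pos, size_t n,
                    size_t patch_h, size_t patch_w,
                    float *out)
{
    for (size_t i = 0; i < n; ++i) {
        float *dst = out + i * patch_h * patch_w;
        for (size_t r = 0; r < patch_h; ++r) {
            /* One contiguous row of the window per memcpy. */
            const float *src =
                image + (pos[i].row + r) * stride + pos[i].col;
            memcpy(dst + r * patch_w, src, patch_w * sizeof(float));
        }
    }
}
```

Packing the windows back to back is what lets the later computation walk each window linearly, which is the cache behavior I am after.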