Which one to use: clEnqueueWriteBuffer or clEnqueueMapBuffer?

nikey1 · ‎07-06-2012

Hello all,
Can you please suggest how I should proceed after this?
[cpp]
cl_image_format image;

// set the image data type being used and the order
image.image_channel_data_type = CL_FLOAT;
image.image_channel_order = CL_RGBA;

cl_mem srcimg, dstimg;

// Create the 2D image and the destination buffer.
srcimg = clCreateImage2D(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, &image, 4, 4, sizeof(cl_float4)*4, input_data, &error);
dstimg = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR, sizeof(cl_float4)*4*4, output_data, &error);
[/cpp]
For convenience I am using a float array as a 4x4 input image. Assume that "input_data" is not NULL. output_data is of type float*, and I haven't allocated any memory for it yet. I guess I should allocate it using malloc. Correct me if I am wrong.
Should I use clEnqueueWriteBuffer or clEnqueueMapBuffer after the statements above? Please explain.

Raghupathi_M_Intel · ‎07-06-2012

Yes, you need to allocate memory for output_data.
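
A minimal sketch of that allocation, assuming the same 4x4 RGBA float layout as in the question. The helper names and macros are hypothetical, and plain `float` stands in for `cl_float` so the snippet compiles without the OpenCL headers. Since the buffer is created with CL_MEM_USE_HOST_PTR, the memory should stay allocated until the buffer object has been released.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define IMG_WIDTH  4   /* pixels per row, as in the question */
#define IMG_HEIGHT 4   /* number of rows */
#define CHANNELS   4   /* RGBA */

/* Bytes in one row of RGBA float pixels; same value as sizeof(cl_float4) * 4. */
static size_t row_pitch_bytes(void) {
    assert(sizeof(float) == 4);  /* assumption: float matches the 32-bit cl_float */
    return (size_t)IMG_WIDTH * CHANNELS * sizeof(float);
}

/* Bytes in the whole image; same value as sizeof(cl_float4) * 4 * 4. */
static size_t image_bytes(void) {
    return row_pitch_bytes() * IMG_HEIGHT;
}

/* Hypothetical helper: zero-initialised storage to pass as output_data.
   Returns NULL on failure; free it only after the cl_mem has been released. */
static float *alloc_output(void) {
    return calloc((size_t)IMG_WIDTH * IMG_HEIGHT * CHANNELS, sizeof(float));
}
```

With this, `output_data = alloc_output();` would give clCreateBuffer a valid host pointer of `image_bytes()` bytes.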

With clEnqueueWriteBuffer() you are enqueuing a command to write to a cl_mem buffer object from host memory. You would use clEnqueueMapBuffer() to map a region of a cl_mem object into the host address space. Once you are done with the mapped region on the host, call clEnqueueUnmapMemObject() to unmap it.
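
To make the difference concrete, here is a rough sketch of both ways to get host data into a buffer. The two helper functions are hypothetical names; the queue and buffer are assumed to have been created already, and an OpenCL SDK is needed to compile this.

```c
#include <string.h>
#include <CL/cl.h>

/* Option 1: the runtime copies `bytes` bytes from src into the buffer. */
static cl_int fill_with_write(cl_command_queue queue, cl_mem buf,
                              const void *src, size_t bytes) {
    /* CL_TRUE makes the write blocking, so src may be reused on return. */
    return clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, bytes, src,
                                0, NULL, NULL);
}

/* Option 2: map the buffer, write through the returned pointer, unmap. */
static cl_int fill_with_map(cl_command_queue queue, cl_mem buf,
                            const void *src, size_t bytes) {
    cl_int err;
    void *mapped = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_WRITE,
                                      0, bytes, 0, NULL, NULL, &err);
    if (err != CL_SUCCESS) {
        return err;
    }
    memcpy(mapped, src, bytes);
    /* Unmapping hands the region back to the runtime before any kernel uses it. */
    return clEnqueueUnmapMemObject(queue, buf, mapped, 0, NULL, NULL);
}
```

Which of the two is faster depends on the device and on how the buffer was created, so treat this as an illustration of the call pattern rather than a recommendation.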

So at this point you need to set your kernel arguments, enqueue the kernel, map the output buffer to read the results once the kernel has executed, and unmap the output buffer.
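
Those steps could look roughly like the sketch below. `run_and_read` is a hypothetical helper, the kernel is assumed to take the source image as argument 0 and the destination buffer as argument 1, and the queue is assumed to be a default in-order queue. An OpenCL SDK is needed to compile it.

```c
#include <string.h>
#include <CL/cl.h>

/* Hypothetical helper: runs `kernel` over a width x height range and copies
   the first `bytes` bytes of dstimg into `result`. */
static cl_int run_and_read(cl_command_queue queue, cl_kernel kernel,
                           cl_mem srcimg, cl_mem dstimg,
                           size_t width, size_t height,
                           void *result, size_t bytes) {
    cl_int err;
    size_t global[2] = { width, height };

    /* 1. Set the kernel arguments (argument indices are an assumption). */
    err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &srcimg);
    if (err != CL_SUCCESS) return err;
    err = clSetKernelArg(kernel, 1, sizeof(cl_mem), &dstimg);
    if (err != CL_SUCCESS) return err;

    /* 2. Enqueue the kernel; the runtime picks the work-group size. */
    err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global, NULL,
                                 0, NULL, NULL);
    if (err != CL_SUCCESS) return err;

    /* 3. Blocking map for reading. On an in-order queue this returns only
          after the kernel enqueued above has finished. */
    void *mapped = clEnqueueMapBuffer(queue, dstimg, CL_TRUE, CL_MAP_READ,
                                      0, bytes, 0, NULL, NULL, &err);
    if (err != CL_SUCCESS) return err;

    /* memmove rather than memcpy: with CL_MEM_USE_HOST_PTR the mapped
       pointer may alias the caller's own host array. */
    memmove(result, mapped, bytes);

    /* 4. Unmap the region and wait for the unmap to complete. */
    err = clEnqueueUnmapMemObject(queue, dstimg, mapped, 0, NULL, NULL);
    if (err != CL_SUCCESS) return err;
    return clFinish(queue);
}
```

For the 4x4 example in the question, the call would be something like `run_and_read(queue, kernel, srcimg, dstimg, 4, 4, result, sizeof(cl_float4)*4*4)`.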

Thanks,
Raghu

nikey1 · ‎07-06-2012

Thanks for such a succinct reply. I will try it out.

I have one more query. Say I now use CL_MEM_USE_HOST_PTR when creating the 2D image, so nothing is copied to the device; instead the GPU will take the mapped memory from clEnqueueMapBuffer, do the processing, and we can write the results to some other location.

On the other hand, if I use CL_MEM_COPY_HOST_PTR, it will create a copy of the data pointed to by the host pointer on the device (I guess it creates a separate copy, not just a cache). The processing is then done on the data that was copied to the device, and the results are copied back to the host. I hope I am correct so far.

How about this? It's just out of curiosity that I want to do it this way. I will use CL_MEM_USE_HOST_PTR, and even though the device can access the host memory, I want the GPU kernel to create a separate copy on the device (not using COPY_HOST_PTR, because that copy is again done on the host itself). How can this be done?