Hi all, I have a question about reading and writing image2d_t pixels and hope someone can post a solution.
I am using MediaSDK to decompress images. After decompression the picture (NV12) resides in an IDirect3DSurface9.
cl_mem memY = clCreateFromDX9MediaSurfaceKHR(context, CL_MEM_READ_ONLY, CL_ADAPTER_D3D9EX_KHR, &surfaceIn, 0, &err);
clEnqueueAcquireDX9MediaSurfacesKHR(queue, 1, &memY, 0, NULL, NULL);
I get a cl_mem handle (of type image2d_t) that can be passed to my kernel:
clSetKernelArg(m_kernel, 1, sizeof(cl_mem), (void*)&memY); // srcImg
Now it's possible to use it in my kernel
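__kernel void Dummy(__read_only image2d_t srcY)
{
    const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;
    for (int i = 0; i < 16; i++) {
        float4 val = read_imagef(srcY, smp, sCoord); // sCoord: int2 pixel coordinate (computation omitted)
        uint pix = convert_uint(val.x * 255); // val.x is the Y value
    }
}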
This works fine, but the performance of read_imagef (single-pixel access) is very low.
As explained in the Sobel tutorial (https://software.intel.com/en-us/videos/optimizing-simple-opencl-kernels-sobel-kernel-optimization), I would like to access the pixels in the form of uchar* like:
__global uchar* pSrcImage;
uint16 pix = convert_uint16(vload16(0, pSrcImage));
to read 16 pixels (128 bits) in a single memory access from the Y plane of the NV12 surface. This is possible when I create a cl_mem with clCreateBuffer(), but I did not find a way to get at the image2d_t data directly. The only way I found to read pixels from an image2d_t is read_imagef(), which is very slow.
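To make the flat-buffer access pattern concrete, here is a minimal CPU-side sketch in plain C of what convert_uint16(vload16(0, pSrcImage)) does for one group of 16 luma pixels. It assumes the NV12 frame has already been copied into a linear buffer (the clCreateBuffer() case) with the Y plane first and the interleaved UV plane after it. The Nv12Flat struct, load16_luma and uv_plane_offset are hypothetical names for illustration only, not part of MediaSDK or OpenCL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of an NV12 frame held in a flat buffer:
   a Y plane of `height` rows followed by an interleaved UV plane of
   height/2 rows. Every row occupies `pitch` bytes (pitch >= width). */
typedef struct {
    const uint8_t *data;
    size_t width, height, pitch;
} Nv12Flat;

/* CPU stand-in for convert_uint16(vload16(0, p)): copy 16 consecutive
   Y bytes starting at pixel (x, y) and widen each one to 32 bits. */
static void load16_luma(const Nv12Flat *f, size_t x, size_t y, uint32_t out[16])
{
    uint8_t bytes[16];
    assert(x + 16 <= f->width && y < f->height);
    memcpy(bytes, f->data + y * f->pitch + x, sizeof bytes);
    for (int i = 0; i < 16; i++)
        out[i] = bytes[i];
}

/* Byte offset of the interleaved UV plane, which starts right after
   the Y plane in this assumed layout. */
static size_t uv_plane_offset(const Nv12Flat *f)
{
    return f->pitch * f->height;
}
```

In a kernel the same addressing would be pSrcImage + y * pitch + x passed to vload16; the pitch has to come from the host, since a flat buffer carries no row information of its own.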
My questions are:
How can I read the pixels of an image2d_t with vload()?
Can I convert the DirectX surface to a cl_mem that is a "flat" buffer rather than an image2d_t?
Thanks for any help
Please take a look at cl_intel_subgroups extension https://www.khronos.org/registry/cl/extensions/intel/cl_intel_subgroups.txt - specifically, intel_sub_group_block_read functions.
uint  intel_sub_group_block_read( image2d_t image, int2 byte_coord )
uint2 intel_sub_group_block_read2( image2d_t image, int2 byte_coord )
uint4 intel_sub_group_block_read4( image2d_t image, int2 byte_coord )
uint8 intel_sub_group_block_read8( image2d_t image, int2 byte_coord )

Reads 1, 2, 4, or 8 uints of data for each work item in the subgroup from the specified image at the specified coordinate as a block operation. Note that the coordinate is a byte coordinate, not an image element coordinate. Also note that the image data is read without format conversion, so each work item may read multiple image elements (for images with element size smaller than 32-bits). The data is read row-by-row, so the first value read is from the row specified in the y-component of the provided byte_coord, the second value is read from the y-component of the provided byte_coord plus one, etc.
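As a rough illustration of the byte-coordinate and row-by-row wording above, the following plain C sketch emulates on the CPU what a single work item might receive from intel_sub_group_block_read2 on an 8-bit image. One thing here is my assumption and is not stated in the quoted text: that work item number `lane` of the subgroup reads the uint located 4 * lane bytes to the right of byte_coord.x. The function names block_read2_emulated and unpack4 are hypothetical, and nothing in this sketch calls the real extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates the two raw uints one work item would get from a 2-row block
   read: out[0] comes from row y, out[1] from row y + 1.
   ASSUMPTION: lane i reads the 4 bytes starting at byte_x + 4 * i.
   Bytes are assembled in little-endian order, no format conversion. */
static void block_read2_emulated(const uint8_t *img, size_t pitch,
                                 size_t byte_x, size_t y,
                                 unsigned lane, uint32_t out[2])
{
    assert(img != NULL);
    for (size_t r = 0; r < 2; r++) {
        const uint8_t *p = img + (y + r) * pitch + byte_x + 4u * lane;
        out[r] = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }
}

/* Splits one raw 32-bit value into the four 8-bit pixels it contains,
   lowest byte first (the left-most pixel under the assumption above). */
static void unpack4(uint32_t v, uint8_t px[4])
{
    for (int i = 0; i < 4; i++)
        px[i] = (uint8_t)(v >> (8 * i));
}
```

The point of the sketch is that a single uint from the Y plane holds four neighbouring luma pixels, so one block read per work item replaces four read_imagef calls, and the values arrive as raw bytes rather than normalized floats.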
So, if you don't mind Intel-specific extensions and your target hardware supports it, you may try that one. You may also try changing the axis of your read_imagef accesses to see if that improves performance.