Intel® oneAPI DPC++/C++ Compiler

Reading subimage from sycl::image




I'm wondering if there is a way to access a sub-matrix / sub-image while preserving its 2D layout, without having to load the entire original matrix data? For example:


Original input is 128x128, and I want to read a 16x16 tile in 2D format, and then write it back out to an output image also in 2D format.



#define IMG_WIDTH 128
#define IMG_HEIGHT 128

#define WINDOW_LB (IMG_WIDTH/2 - 16/2)
#define WINDOW_UB (IMG_WIDTH/2 + 16/2) 

  float4 src[IMG_HEIGHT][IMG_WIDTH];
  float4 dest[IMG_HEIGHT][IMG_WIDTH];

  for (int i = 0; i < IMG_HEIGHT; i++) {
    for (int j = 0; j < IMG_WIDTH; j++) {
      // Fill the 16x16 window with a known value and everything else with zeros.
      if (i >= WINDOW_LB && i < WINDOW_UB && j >= WINDOW_LB && j < WINDOW_UB) {
        src[i][j] = float4(1.0f, 2.0f, 3.0f, 4.0f);
      } else {
        src[i][j] = float4(0.0f, 0.0f, 0.0f, 0.0f);
      }

      dest[i][j] = float4(0.0f, 0.0f, 0.0f, 0.0f);
    }
  }

  try {

    image<2> srcImage(&(src[WINDOW_LB][WINDOW_LB]), 
      image_channel_order::rgba, image_channel_type::fp32, range<2>(128, 128));

    image<2> destImage(&(dest[WINDOW_LB][WINDOW_LB]), 
      image_channel_order::rgba, image_channel_type::fp32, range<2>(128, 128));

    queue myQueue;

    myQueue.submit([&](handler& cgh) {
      accessor<float4, 2, access::mode::read, access::target::image> inPtr(
          srcImage, cgh);
      accessor<float4, 2, access::mode::write, access::target::image> outPtr(
          destImage, cgh);

      sampler smpl(coordinate_normalization_mode::unnormalized,
                   addressing_mode::clamp, filtering_mode::nearest);

      cgh.parallel_for<mod_image>(range<2>(16, 16), [=](item<2> item) {
        auto coords = int2(item[1], item[0]);
        float4 pixel = inPtr.read(coords, smpl);
        pixel *= 10.0f;
        outPtr.write(coords, pixel);
      });
    });
  } catch (const exception& e) {
    std::cout << "SYCL exception caught: " << e.what();
    return 2;
  }

  for (int r = 0; r < IMG_HEIGHT; ++r) {
    for (int c = 0; c < IMG_WIDTH; ++c) {
      cl::sycl::float4 expected;
      cl::sycl::float4 computed = dest[r][c];

      // Only pixels inside the window should have been scaled by 10.
      if (r >= WINDOW_LB && r < WINDOW_UB && c >= WINDOW_LB && c < WINDOW_UB) {
        expected = {10.f, 20.f, 30.f, 40.f};
      } else {
        expected = {0.0f, 0.0f, 0.0f, 0.0f};
      }

      if (!cl::sycl::all(cl::sycl::isequal(computed, expected))) {
        std::cout << "The output image is incorrect." << std::endl;
        return 1;
      }
    }
  }


My objective is to read a 16x16 tile from the image starting at coordinates (56,56), do some processing on it via the parallel_for, and then write it back out to some coordinate. But with the code above, I'll be loading the entire image (4 bytes per float x 4 floats per element x 128 x 128 elements) even though I'm only processing 16x16 elements. Is there a way to have the image read in only the necessary data? This isn't a problem when run on the host, but it becomes expensive when offloading to a device.
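To put numbers on the transfer sizes mentioned above, here is a small sketch of the arithmetic. The names `kBytesPerFloat`, `kBytesPerPixel`, and `block_bytes` are hypothetical helpers for illustration only, and the 4-byte float size is the one assumed in the post.

```cpp
#include <cstddef>

// Sizes from the post: fp32 rgba pixels, i.e. 4 floats of 4 bytes each.
constexpr std::size_t kBytesPerFloat = 4;  // assumption stated in the post
constexpr std::size_t kBytesPerPixel = 4 * kBytesPerFloat;

// Bytes needed to hold a width x height block of float4 pixels.
constexpr std::size_t block_bytes(std::size_t width, std::size_t height) {
  return width * height * kBytesPerPixel;
}

constexpr std::size_t kFullImageBytes = block_bytes(128, 128);  // 262144 bytes
constexpr std::size_t kTileBytes = block_bytes(16, 16);         // 4096 bytes
```

So the full image is 64 times larger than the tile that actually gets processed, and that applies to both the source and the destination image.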


I've tried looking into the image pitch option, but I can't seem to get the behavior I want from it. If there are other SYCL data structures / constructs, I am also open to suggestions.


It seems like OpenCL's readImage API has this nice functionality and I was wondering if there's anything similar here?


Thanks all,


2 Replies



Thanks for raising this concern. I will discuss internally whether reading a sub-image is possible and let you know.






I do not understand the question. From your test code, you are trying to process a 128x128 image of float4 pixels, but your kernel range is 16x16. So the pixel coordinates are limited to (0, 0) through (15, 15) and cannot start at (56, 56).
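As a plain C++ illustration of the coordinate point (no SYCL involved): if the image covered the whole 128x128 array, the 16x16 kernel indices would need the tile origin added to them to land on the tile at (56, 56). `Coord`, `kTileOrigin`, and `tile_to_image` are hypothetical names for this sketch only.

```cpp
// A 2D pixel coordinate; stand-in for the int2 used in the kernel.
struct Coord {
  int x;
  int y;
};

// Same value as WINDOW_LB in the posted code: 128/2 - 16/2 = 56.
constexpr int kTileOrigin = 128 / 2 - 16 / 2;

// Map an index inside the 16x16 launch range (0..15 in each dimension)
// to the corresponding pixel coordinate in the full 128x128 image.
constexpr Coord tile_to_image(int local_x, int local_y) {
  return Coord{local_x + kTileOrigin, local_y + kTileOrigin};
}
```

Inside the kernel this would amount to something like `int2(item[1] + WINDOW_LB, item[0] + WINDOW_LB)` for both the read and the write coordinates.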

Also, in your case the tile starting at (56, 56) with range (16, 16) is not a contiguous range in your (128, 128) source array. So if you want to minimize the amount of data copied (rather than copying the whole 128x128 image), you have to process the tile row by row: each row is a contiguous range, so you can construct a 1D image per row and an image accessor on it.
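To make the contiguity point concrete, here is a host-side sketch in plain C++ (no SYCL calls). Consecutive tile rows start 128 pixels apart in the source array, so only each 16-pixel row is contiguous. As a variation on the one-image-per-row idea, the sketch copies the rows into a tightly packed 16x16 staging buffer. `Pixel` is a stand-in for float4, and `pack_tile` and the constants are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

using Pixel = std::array<float, 4>;  // stand-in for float4 (rgba, fp32)

constexpr std::size_t kImgWidth = 128;
constexpr std::size_t kImgHeight = 128;
constexpr std::size_t kTile = 16;    // tile is kTile x kTile pixels
constexpr std::size_t kOrigin = 56;  // tile starts at (56, 56)

// Copy the tile out of a row-major kImgWidth x kImgHeight image, one
// contiguous 16-pixel row at a time, into a tightly packed 16x16 buffer.
std::vector<Pixel> pack_tile(const std::vector<Pixel>& image) {
  std::vector<Pixel> tile(kTile * kTile);
  for (std::size_t row = 0; row < kTile; ++row) {
    // Start of this tile row inside the full image; the next tile row
    // begins kImgWidth pixels later, not kTile pixels later.
    const Pixel* src_row =
        image.data() + (kOrigin + row) * kImgWidth + kOrigin;
    std::copy(src_row, src_row + kTile, tile.begin() + row * kTile);
  }
  return tile;
}
```

The packed buffer is 4096 bytes instead of 262144, and it could presumably back a 16x16 image, at the cost of an extra host-side copy and a matching unpack step for the output.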

SYCL 2020 does not have the sub-image concept you are looking for.
