Error: Unsupported accessor target. -59 (CL_INVALID_OPERATION)

Vitors · 10-22-2020

Hello all,

I'm playing with some SYCL code and I get this exception every time I try to access buffers. I've tried changing selectors to see if that makes a difference, but it doesn't. Every time I access a buffer by indexing, I get this error. With the buffer accesses commented out, the kernel runs fine (though of course it doesn't do the work I intend).

This is my kernel:

  vector<uint8_t> arr(m*n);
  
  // Create buffer from data
  cl::sycl::buffer<uint8_t, 2> buf(data, cl::sycl::range<2>{m, n});
  cl::sycl::buffer<uint8_t, 2> res(arr.data(), cl::sycl::range<2>{m, n});
    
  cl::sycl::gpu_selector selector;
  auto dev = selector.select_device();

  // Device information
  cout<<"Name: "<<dev.get_info<cl::sycl::info::device::name>()<<endl;
  cout<<"Version: "<<dev.get_info<cl::sycl::info::device::version>()<<endl;
  cout<<"Vendor: "<<dev.get_info<cl::sycl::info::device::vendor>()<<endl;
  cout<<"Driver version: "<<dev.get_info<cl::sycl::info::device::driver_version>()<<endl;

  auto edge = cl::sycl::range<2>{m-2, n-2};

  try
    {
      cl::sycl::queue queue(selector);

      auto bu = buf.get_access<cl::sycl::access::mode::read>();
      auto re = res.get_access<cl::sycl::access::mode::write>();
         
      queue.submit([&](cl::sycl::handler &h)
        {
          h.parallel_for<class edge>(cl::sycl::range<2>(m-2, n-2),
                                     cl::sycl::id<2>(1, 1),
                                     [=](cl::sycl::id<2> idx)
            {
              //cout<<"("<<idx[0]<<")("<<idx[1]<<") ";
              //auto sumf = 254.0;

              auto sumf = (bu[{idx[0]-1, idx[1]-1}] +
                           bu[{idx[0], idx[1]-1}] +
                           bu[{idx[0]+1, idx[1]-1}] +
                           bu[{idx[0]-1, idx[1]}] +
                           bu[{idx[0], idx[1]}] +
                           bu[{idx[0]+1, idx[1]}] +
                           bu[{idx[0]-1, idx[1]+1}] +
                           bu[{idx[0], idx[1]+1}] +
                           bu[{idx[0]+1, idx[1]+1}]
                           )/9;

              auto sum = static_cast<uint8_t>(sumf);
              re[idx] = static_cast<uint8_t>(std::min(std::max(0, sum - (9 * bu[idx])), 255));

            });
        });
      queue.wait();

    } // end try

  catch (std::exception& ex)
    {
     std::cerr<<"exception caught: "<< ex.what() << std::endl;
      return 1;
    }

Am I clearly missing anything?

If I can't access buffers the way I'm trying to, I would appreciate suggestions.

Thanks.

AthiraM_Intel · 10-23-2020

Hi,

Thanks for reaching out to us.

We are moving your query to Intel® oneAPI Data Parallel C++ forum for a faster response.

Thanks

GouthamK_Intel · 10-26-2020

Hi Vitor,

We have skimmed through the code snippet shared by you.

Could you please try changing the syntax of the way you are trying to access the accessor?

From

bu[{idx[0]-1, idx[1]-1}]

to

bu[idx[0]-1][idx[1]-1]

and similarly at other places too wherever you are using the accessor.
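As a rough illustration of what either subscript form addresses: SYCL linearizes a 2D index in row-major order, so for a buffer created with range<2>{m, n} the element at (i, j) corresponds to position i * n + j in the host array. The helper name flat_index below is hypothetical and only sketches that mapping in plain C++.

```cpp
#include <cstddef>

// Hypothetical helper (not part of SYCL): flat position of element (i, j)
// in a row-major 2D array whose second dimension holds `dim1` elements.
constexpr std::size_t flat_index(std::size_t i, std::size_t j, std::size_t dim1)
{
  return i * dim1 + j;
}

// Compile-time check: in a 3 x 4 array, element (1, 2) is the 7th value.
static_assert(flat_index(1, 2, 4) == 6, "row-major linearization");
```

So for a neighbour such as (idx[0]-1, idx[1]-1), both notations are meant to reach the same host element; only the spelling differs.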

If you still face the same error, please share the complete source code, error logs and steps to reproduce if possible.

Have a Good day!

Thanks & Regards

Goutham

Vitors · 10-26-2020

Dear Goutham,

Thanks for your reply.

Well, I made progress on the error (-59 CL_INVALID_OPERATION): the problem was that my buffers AND my accessors were out of scope. I simply moved them into the queue.submit call and added the handler to the accessor definitions.

I also had to capture by value on the lambda inside my parallel for.

These steps made my former code work.

However, I'm glad you replied, because after that I got stuck in a different way. I slightly changed my code so that my kernel is in a library and defined as a class. There are no compilation issues, but when I try to run my code I get the error below. The tar.gz file has the new code, the logs and the image I use to test it. I also changed the code in the way you suggested, just in case; it compiles fine, but the error is still there with the new syntax.

pure virtual method called
terminate called without an active exception
/var/spool/torque/mom_priv/jobs/721101.v-qsvr-1.aidevcloud.SC: line 5:  1631 Aborted                 ./main form.png

I quickly debugged it and realized that the error arises when I call queue.submit.

My naive guess is that there is something wrong with how I define my class at the parallel_for. But it is just a blind guess, since my class definition is absolutely basic and I can't see what's wrong.

Thank you very much for your help.

Regards,

Vitor

GouthamK_Intel · 10-27-2020

Hi Vitor,

Thanks for providing the source code and the logs, we looked into it.

Could you please bring buffers out of the queue.submit scope and try running the code?

Also, just in case, I'm providing the edited version of the code, which we tested and which executes without any errors.

// bufferlibclass.cpp

#define STB_IMAGE_IMPLEMENTATION
#define STB_IMAGE_WRITE_IMPLEMENTATION

#include"bufferlibclass.h"
//#include"../stb/stb_image.h"
//#include"../stb/stb_image_write.h"

using namespace std;

size_t cc::height()
{
  return this->h;
}

size_t cc::width()
{
  return this->w;
}

int cc::image()
//  int cc::image(size_t m, size_t n, uint8_t* data, uint8_t* target)
{
  float pi = 0.0;
  //  vector<uint8_t> arr(m*n);

  cl::sycl::default_selector selector;
  //cl::sycl::device dev = selector.select_device();
  auto dev = selector.select_device();

  // Device information
  cout<<"Name: "<<dev.get_info<cl::sycl::info::device::name>()<<endl;
  cout<<"Version: "<<dev.get_info<cl::sycl::info::device::version>()<<endl;
  cout<<"Vendor: "<<dev.get_info<cl::sycl::info::device::vendor>()<<endl;
  cout<<"Driver version: "<<dev.get_info<cl::sycl::info::device::driver_version>()<<endl;
  //  cout<<"Image support: "<<dev.get_info<cl::sycl::info::device::image_support>()<<endl;
  //  cout<<"Max mem alloc size: "<<dev.get_info<cl::sycl::info::device::max_mem_alloc_size>()<<endl;
  //  cout<<"Vendor ID: "<<dev.get_info<cl::sycl::info::device::vendor_id>()<<endl;

  try
    {

      cl::sycl::queue queue(selector);

      cl::sycl::buffer<uint8_t, 2> buf(data, cl::sycl::range<2>{w, h}); //EDITED
      cl::sycl::buffer<uint8_t, 2> res(result, cl::sycl::range<2>{w, h}); //EDITED

      queue.submit([&](cl::sycl::handler &hand)
      {
        // Try to create buffer from data
//      cl::sycl::buffer<uint8_t, 2> buf(data, cl::sycl::range<2>{w, h});
//      cl::sycl::buffer<uint8_t, 2> res(result, cl::sycl::range<2>{w, h});

        auto edge = cl::sycl::range<2>{w-2, h-2};

        // The accessors must be inside the submit (and reference the handler)
        auto bu = buf.get_access<cl::sycl::access::mode::read>(hand);
        auto re = res.get_access<cl::sycl::access::mode::write>(hand);

        // Use this parallel_for overload. Its second parameter
        // allows an offset, which is exactly what is needed to skip the borders
        //
        //        template <typename KernelName,
        //                  typename KernelType, int dimensions>
        //          void parallel_for(range<dimensions> numWorkItems,
        //                            id<dimensions> workItemOffset, KernelType kernelFunc);
        hand.parallel_for<class edge>
          (cl::sycl::range<2>(w-2, h-2),
           cl::sycl::id<2>(1, 1),
           [=](cl::sycl::id<2> idx)
           {
             // idx[0] ranges over m
             // idx[1] ranges over n
             //       if(bu[idx]>0) re[idx] = 1; //bu[idx];
             // re[idx] = bu[idx];
             //cout<<"("<<idx[0]<<")("<<idx[1]<<") ";

             // Following Celerity, compute a Laplacian
             float sumf = (bu[idx[0]-1][idx[1]-1] +
                           bu[idx[0]][idx[1]-1] +
                           bu[idx[0]+1][idx[1]-1] +
                           bu[idx[0]-1][idx[1]] +
                           bu[idx[0]][idx[1]] +
                           bu[idx[0]+1][idx[1]] +
                           bu[idx[0]-1][idx[1]+1] +
                           bu[idx[0]][idx[1]+1] +
                           bu[idx[0]+1][idx[1]+1]
                           )/9;

             int sum = static_cast<uint8_t>(sumf);
             re[idx] = static_cast<uint8_t>(std::min(std::max(0, sum - (9 * bu[idx])), 255));

           });
      });
      queue.wait();

    } // end try
  catch (std::exception& ex)
    {
      std::cerr<<"exception caught: "<< ex.what() << std::endl;
      return 1;
    }


  return 0;
}
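For checking the device output, a host-only reference can be handy. The sketch below is plain C++ with no SYCL and mirrors the arithmetic of the kernel above. It assumes the image is stored row-major as rows x cols, and the name edge_reference is hypothetical rather than taken from the attached code. The two loops cover the same index space as range(rows-2, cols-2) with offset id(1, 1), so the border pixels are skipped and every neighbour stays in bounds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-only reference for the kernel arithmetic (hypothetical helper).
// Assumption: `data` is row-major, rows x cols, so (i, j) is data[i * cols + j].
std::vector<uint8_t> edge_reference(const std::vector<uint8_t>& data,
                                    std::size_t rows, std::size_t cols)
{
  assert(data.size() == rows * cols);
  std::vector<uint8_t> out(rows * cols, 0);
  if (rows < 3 || cols < 3) return out;  // no interior pixels to process

  // Interior only: i in [1, rows-2], j in [1, cols-2].
  for (std::size_t i = 1; i + 1 < rows; ++i) {
    for (std::size_t j = 1; j + 1 < cols; ++j) {
      // Sum of the 3x3 neighbourhood centred on (i, j).
      int total = 0;
      for (std::size_t di = 0; di < 3; ++di)
        for (std::size_t dj = 0; dj < 3; ++dj)
          total += data[(i - 1 + di) * cols + (j - 1 + dj)];

      // As in the kernel: the uint8_t operands promote to int, so the
      // division by 9 is an integer division and the mean fits in 0..255.
      int mean = total / 9;
      int centre = data[i * cols + j];

      // Mean minus 9 * centre, clamped to [0, 255] before narrowing.
      out[i * cols + j] =
          static_cast<uint8_t>(std::min(std::max(0, mean - 9 * centre), 255));
    }
  }
  return out;
}
```

Comparing this output with the contents of result once the buffers have been destroyed should show whether the device kernel computes what was intended. Border pixels are left at 0 here, whereas the kernel never writes them.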

Please let us know if this resolves your issue.

Have a Good day!

Thanks & Regards

Goutham

Vitors · 10-27-2020

Hi Goutham,

It works like a charm, thanks.

I still need to learn more about accessors, buffers and their scopes, but today I moved one step further.

All the best,

Vitor

GouthamK_Intel · 10-28-2020

Hi Vitor,

Glad to know that your issue is resolved!

To get started, you can look at the oneAPI playlist below.

https://techdecoded.intel.io/essentials/dpc-part-1-an-introduction-to-the-new-programming-model/#gs.juy6ix

As this issue has been resolved, we will no longer respond to this thread.

If you require any additional assistance from Intel, please start a new thread.

Any further interaction in this thread will be considered community only.

Thanks & Regards

Goutham