How do I translate this OpenACC nested parallelism to SYCL?

gamersensual14 · 05-02-2022

Hi Intel team, here I am with another question. I'm trying to parallelize this code, originally in OpenACC:

  //#pragma acc parallel loop
  for(i=0; i<bands; i++)
  {
    //#pragma acc loop seq 
    for(j=0; j<lines_samples; j++)
      r_m[i] += image_vector[i*lines_samples+j];

    r_m[i] /= lines_samples;

    //#pragma acc loop
    for(j=0; j<lines_samples; j++)
      R_o[i*lines_samples+j] = image_vector[i*lines_samples+j] - r_m[i];
  }

As you can see, there is an outer parallel loop that contains first a sequential loop (sequential because of the reduction) and then another parallelizable loop.

I have tried to translate it like this:

my_queue.submit([&](auto &h) {
  h.parallel_for(sycl::range(bands), [=](auto i) {
    h.single_task<class computo_example>([=]() {
      int64_t j;

      for(j=0; j<lines_samples; j++)
      r_m[i] += image_vector[i*lines_samples+j];
    });

    r_m[i] /= lines_samples;

    h.parallel_for(sycl::range(lines_samples), [=](auto j) {
      R_o[i*lines_samples+j] = image_vector[i*lines_samples+j] - r_m[i];
    });
  });
}).wait();

But I'm not sure that is correct, as it isn't clear to me whether nested structures like this are possible in SYCL. Can I execute a parallel_for inside another parallel_for? Can I execute a single_task inside a parallel_for? And the other way around?

With that SYCL code I get a huge amount of error output. Here are the most important lines:

VCA.cpp:866:9: error: use 'template' keyword to treat 'single_task' as a dependent template name
h.single_task<class computo_example>([=]() {
^
template

VCA.cpp:865:41: error: call to deleted constructor of 'sycl::handler'
h.parallel_for(sycl::range(bands), [=](auto i) {
                                    ^

VCA.cpp:865:40: error: attempt to use a deleted function
h.parallel_for(sycl::range(bands), [=](auto i) {
^

VCA.cpp:866:7: note: destructor of '' is implicitly deleted because field '' has an inaccessible destructor
h.single_task<class computo_example>([=]() {
^

I'm looking for the most generic way to translate this.

Looking forward to your answer.

Thank you!
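
A note on the first two errors, since both are plain C++ rather than anything specific to SYCL. Because the command group lambda takes `auto &h`, the type of `h` is dependent, so a member template called with explicit template arguments has to be written `h.template single_task<...>`. And the inner `[=]` lambda captures `h` by value, which needs a copy of `sycl::handler`, whose copy constructor is deleted. The sketch below reproduces both situations with a hypothetical `MockHandler` (not the real class, just a non-copyable type with a member template), so it builds with any C++ compiler; `run_demo` and `demo_kernel` are made-up names as well.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for sycl::handler (an assumption for illustration):
// non-copyable, with a member template taking an explicit kernel-name type.
struct MockHandler {
  MockHandler() = default;
  MockHandler(const MockHandler &) = delete;
  MockHandler &operator=(const MockHandler &) = delete;

  template <typename KernelName, typename F>
  void single_task(F &&f) {
    std::forward<F>(f)();  // the mock simply runs the callable on the host
  }
};

class demo_kernel;  // kernel-name tag, never defined

int run_demo() {
  MockHandler handler;
  int result = 0;

  // Generic lambda, as in the post: `h` has a deduced (dependent) type.
  auto command_group = [&](auto &h) {
    // `template` is required here; `h.single_task<demo_kernel>(...)` is rejected.
    h.template single_task<demo_kernel>([&result]() { result = 42; });

    // Capturing `h` by value would try to copy the handler and fail to compile:
    //   auto bad = [h]() {};
  };

  command_group(handler);
  assert(result == 42 && "the callable should have run");
  return result;
}
```

Fixing these two points only gets the code past the front end. It does not make the nested launches valid SYCL.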

ShwethaS_Intel · 05-06-2022

Hi @gamersensual14 ,

Thanks for reaching out to us. We are working on your issue.

Please refer to the DPC++/SYCL documentation below for migration.

https://www.intel.com/content/www/us/en/developer/tools/oneapi/data-parallel-c-plus-plus.html

Have a good day!

Thanks & Regards,

Shwetha
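
On the nesting question itself: a SYCL command group launches exactly one kernel, and `handler` methods such as `parallel_for` and `single_task` can only be called on the host, so neither can be nested inside a kernel. The usual way around this is to flatten the loop nest. The sketch below is one way to do that: a first kernel with one work-item per band that runs the reduction loop sequentially, then a second kernel over a 2-D range for the subtraction. Several things here are assumptions and not taken from the post: the data is `float`, `r_m` starts at zero (so it is overwritten rather than accumulated into), the function name `center_bands` and its `std::vector` interface are made up, and USM shared allocations are assumed to be available on the device. The `__has_include` guard is only there so the file also builds without a SYCL toolchain, in which case the same two steps run serially on the host.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#define CENTER_BANDS_USE_SYCL 1
#endif

// Hypothetical helper: computes each band's mean and subtracts it from the band.
void center_bands(const std::vector<float> &image_vector,
                  std::vector<float> &r_m, std::vector<float> &R_o,
                  std::size_t bands, std::size_t lines_samples) {
  const std::size_t n = bands * lines_samples;
  assert(image_vector.size() == n && r_m.size() == bands && R_o.size() == n);

#ifdef CENTER_BANDS_USE_SYCL
  sycl::queue q;
  // USM shared memory so the kernels can use raw pointers, as in the post.
  float *img = sycl::malloc_shared<float>(n, q);
  float *mean = sycl::malloc_shared<float>(bands, q);
  float *out = sycl::malloc_shared<float>(n, q);
  std::copy(image_vector.begin(), image_vector.end(), img);

  // Kernel 1: one work-item per band; the reduction loop stays sequential.
  q.parallel_for(sycl::range<1>(bands), [=](sycl::id<1> idx) {
     const std::size_t i = idx[0];
     float sum = 0.0f;
     for (std::size_t j = 0; j < lines_samples; ++j)
       sum += img[i * lines_samples + j];
     mean[i] = sum / static_cast<float>(lines_samples);
   }).wait();  // kernel 2 reads mean[], so finish kernel 1 first

  // Kernel 2: one work-item per (band, sample) pair.
  q.parallel_for(sycl::range<2>(bands, lines_samples), [=](sycl::id<2> idx) {
     const std::size_t i = idx[0];
     const std::size_t j = idx[1];
     out[i * lines_samples + j] = img[i * lines_samples + j] - mean[i];
   }).wait();

  std::copy(mean, mean + bands, r_m.begin());
  std::copy(out, out + n, R_o.begin());
  sycl::free(img, q);
  sycl::free(mean, q);
  sycl::free(out, q);
#else
  // Host fallback with the same two-step structure.
  for (std::size_t i = 0; i < bands; ++i) {
    float sum = 0.0f;
    for (std::size_t j = 0; j < lines_samples; ++j)
      sum += image_vector[i * lines_samples + j];
    r_m[i] = sum / static_cast<float>(lines_samples);
  }
  for (std::size_t i = 0; i < bands; ++i)
    for (std::size_t j = 0; j < lines_samples; ++j)
      R_o[i * lines_samples + j] = image_vector[i * lines_samples + j] - r_m[i];
#endif
}
```

With only a handful of bands the first kernel exposes little parallelism. If that matters, a per-band work-group reduction is the usual next step, but that goes beyond this sketch.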

Jie_L_Intel · 05-23-2022

I suggest using the oneAPI OpenMP offloading feature to migrate the OpenACC code to run on Intel GPUs, rather than porting it to SYCL manually. Here are some links on OpenMP offloading in oneAPI:

https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-cpp-fortran-compiler-openmp/top.html

https://www.youtube.com/watch?v=sn_pHVR0NMk
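
To give an idea of what that route looks like for the loop in the original post, here is a rough sketch using OpenMP `target` offloading. The mapping chosen here (outer loop to `teams distribute`, the `seq` loop left serial, the last loop to `parallel for`) is one reasonable reading of the OpenACC directives, not the only one. Assumptions not taken from the post: the data is `float`, `r_m` starts at zero (so it is overwritten rather than accumulated into), and the function name `center_bands_omp` with its `std::vector` interface is made up. If OpenMP is not enabled at compile time the pragmas are ignored and the code runs serially on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: computes each band's mean and subtracts it from the band.
void center_bands_omp(const std::vector<float> &image_vector,
                      std::vector<float> &r_m, std::vector<float> &R_o,
                      std::size_t bands, std::size_t lines_samples) {
  const std::size_t n = bands * lines_samples;
  assert(image_vector.size() == n && r_m.size() == bands && R_o.size() == n);

  // Raw pointers, because map clauses operate on array sections.
  const float *img = image_vector.data();
  float *mean = r_m.data();
  float *out = R_o.data();

  // Was: #pragma acc parallel loop
  #pragma omp target teams distribute \
      map(to: img[0:n]) map(from: mean[0:bands], out[0:n])
  for (std::size_t i = 0; i < bands; ++i) {
    // Was: #pragma acc loop seq (kept serial, with a local accumulator)
    float sum = 0.0f;
    for (std::size_t j = 0; j < lines_samples; ++j)
      sum += img[i * lines_samples + j];
    mean[i] = sum / static_cast<float>(lines_samples);

    // Was: #pragma acc loop
    #pragma omp parallel for
    for (std::size_t j = 0; j < lines_samples; ++j)
      out[i * lines_samples + j] = img[i * lines_samples + j] - mean[i];
  }
}
```

With the oneAPI compiler, offload builds are typically done with something like `icpx -fiopenmp -fopenmp-targets=spir64`; check the linked getting-started guide for the exact flags for your version.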