Intel® oneAPI DPC++/C++ Compiler
Talk to fellow users of Intel® oneAPI DPC++/C++ Compiler and companion tools like Intel® oneAPI DPC++ Library, Intel® DPC++ Compatibility Tool, and Intel® Distribution for GDB*

How do I translate this OpenACC nested parallelism to SYCL?

gamersensual14
New Contributor I

Hi Intel team, I have another question. I'm trying to port this code, originally parallelized with OpenACC, to SYCL:

//#pragma acc parallel loop
for(i=0; i<bands; i++)
{
    //#pragma acc loop seq
    for(j=0; j<lines_samples; j++)
        r_m[i] += image_vector[i*lines_samples+j];

    r_m[i] /= lines_samples;

    //#pragma acc loop
    for(j=0; j<lines_samples; j++)
        R_o[i*lines_samples+j] = image_vector[i*lines_samples+j] - r_m[i];
}

As you can see, the outer loop is parallelized. Inside it there is a sequential loop (sequential because it performs a reduction into r_m[i]), followed by a second inner loop that can be parallelized.

I have tried to translate it like this:

my_queue.submit([&](auto &h) {
    h.parallel_for(sycl::range(bands), [=](auto i) {
        h.single_task<class computo_example>([=]() {
            int64_t j;

            for(j=0; j<lines_samples; j++)
                r_m[i] += image_vector[i*lines_samples+j];
        });

        r_m[i] /= lines_samples;

        h.parallel_for(sycl::range(lines_samples), [=](auto j) {
            R_o[i*lines_samples+j] = image_vector[i*lines_samples+j] - r_m[i];
        });
    });
}).wait();

But I'm not sure this is correct, because it isn't clear to me whether nested structures like this are possible in SYCL. Can I execute a parallel_for inside another parallel_for? Can I execute a single_task inside a parallel_for? And the other way around?

With that SYCL code I get a very long error output. These are the most important lines:

VCA.cpp:866:9: error: use 'template' keyword to treat 'single_task' as a dependent template name
h.single_task<class computo_example>([=]() {
^
template

VCA.cpp:865:41: error: call to deleted constructor of 'sycl::handler'
h.parallel_for(sycl::range(bands), [=](auto i) {
^

VCA.cpp:865:40: error: attempt to use a deleted function
h.parallel_for(sycl::range(bands), [=](auto i) {
^

VCA.cpp:866:7: note: destructor of '' is implicitly deleted because field '' has an inaccessible destructor
h.single_task<class computo_example>([=]() {
^

I'm looking for the most generic way to translate this.
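
For context on the nesting question: SYCL 2020 provides no way to launch a kernel from inside another kernel, and a `sycl::handler` cannot be copied into a kernel lambda, which is what the deleted-constructor errors above point to. One common restructuring is to flatten the nest into consecutive kernels on one queue. The following is a minimal sketch of that idea, not a definitive port. The function name `center_bands_sycl`, the `float` element type, the USM device allocations, and the in-order queue are all assumptions that do not come from the post. It also assumes `r_m` starts at zero, so the mean is written rather than accumulated. If no SYCL header is found, the sketch falls back to plain host loops so the logic can still be checked with an ordinary C++17 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Use SYCL only when the header is available; otherwise use the host fallback.
#if defined(__has_include)
#  if __has_include(<sycl/sycl.hpp>)
#    include <sycl/sycl.hpp>
#    define HAVE_SYCL 1
#  endif
#endif

// Hypothetical helper (not from the original post). Assumes float data and
// that r_m can be overwritten with the per-band mean.
void center_bands_sycl(const std::vector<float>& image_vector,
                       std::size_t bands, std::size_t lines_samples,
                       std::vector<float>& r_m, std::vector<float>& R_o)
{
    const std::size_t n = bands * lines_samples;
    assert(image_vector.size() == n && r_m.size() == bands && R_o.size() == n);

#ifdef HAVE_SYCL
    // In-order queue: each command runs after the previous one has finished.
    sycl::queue q{sycl::property::queue::in_order{}};
    float* img  = sycl::malloc_device<float>(n, q);
    float* mean = sycl::malloc_device<float>(bands, q);
    float* out  = sycl::malloc_device<float>(n, q);
    q.memcpy(img, image_vector.data(), n * sizeof(float));

    // Kernel 1: one work-item per band; the reduction stays sequential
    // inside the work-item, like "acc loop seq".
    q.parallel_for(sycl::range<1>(bands), [=](sycl::id<1> idx) {
        const std::size_t i = idx[0];
        float sum = 0.0f;
        for (std::size_t j = 0; j < lines_samples; ++j)
            sum += img[i * lines_samples + j];
        mean[i] = sum / static_cast<float>(lines_samples);
    });

    // Kernel 2: one work-item per (band, sample) pair for the subtraction.
    q.parallel_for(sycl::range<2>(bands, lines_samples), [=](sycl::id<2> idx) {
        const std::size_t i = idx[0];
        const std::size_t j = idx[1];
        out[i * lines_samples + j] = img[i * lines_samples + j] - mean[i];
    });

    q.memcpy(r_m.data(), mean, bands * sizeof(float));
    q.memcpy(R_o.data(), out, n * sizeof(float));
    q.wait();

    sycl::free(img, q);
    sycl::free(mean, q);
    sycl::free(out, q);
#else
    // Host fallback with the same structure as the two kernels above.
    for (std::size_t i = 0; i < bands; ++i) {
        float sum = 0.0f;
        for (std::size_t j = 0; j < lines_samples; ++j)
            sum += image_vector[i * lines_samples + j];
        r_m[i] = sum / static_cast<float>(lines_samples);
    }
    for (std::size_t i = 0; i < bands; ++i)
        for (std::size_t j = 0; j < lines_samples; ++j)
            R_o[i * lines_samples + j] = image_vector[i * lines_samples + j] - r_m[i];
#endif
}
```

With the oneAPI compiler this would typically be built with `icpx -fsycl`. Splitting the work into two kernels keeps the reduction simple and exposes the full bands × lines_samples parallelism in the second step. A single kernel with one work-item per band would mirror the OpenACC structure more literally, at the cost of less parallelism.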

Looking forward to your answer.

Thank you!

2 Replies
ShwethaS_Intel
Moderator

Hi @gamersensual14,

Thanks for reaching out to us. We are working on your issue.

Please refer to the DPC++/SYCL documentation below for migration guidance.

https://www.intel.com/content/www/us/en/developer/tools/oneapi/data-parallel-c-plus-plus.html

Have a good day!

Thanks & Regards,

Shwetha

Jie_L_Intel
Employee

I suggest using the oneAPI OpenMP offloading feature to migrate the OpenACC code to run on Intel GPUs, rather than porting it to SYCL by hand. Here are some links on OpenMP offloading in oneAPI:

https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-cpp-fortran-compiler-openmp/top.html

https://www.youtube.com/watch?v=sn_pHVR0NMk
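
To illustrate the OpenMP offloading route suggested above, here is a minimal sketch of how the original loop nest might look with an OpenMP `target` directive. It is an illustration under assumptions rather than a tested port. The function name `center_bands_omp`, the `float` element type, the `int` loop bounds, and the `map` clauses are not from the post, and the mean is written to `r_m` rather than accumulated, which assumes `r_m` starts at zero. If the compiler is run without OpenMP support, the pragma is ignored and the loop runs sequentially on the host.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original post). Assumes float data.
void center_bands_omp(const float* image_vector, float* r_m, float* R_o,
                      int bands, int lines_samples)
{
    assert(bands >= 0 && lines_samples > 0);
    const std::size_t n =
        static_cast<std::size_t>(bands) * static_cast<std::size_t>(lines_samples);

    // Offload the outer loop: bands are distributed across teams and threads.
    // Each iteration handles one band, so the inner loops stay sequential.
    #pragma omp target teams distribute parallel for \
        map(to: image_vector[0:n]) map(from: r_m[0:bands], R_o[0:n])
    for (int i = 0; i < bands; i++) {
        const std::size_t base =
            static_cast<std::size_t>(i) * static_cast<std::size_t>(lines_samples);

        // Sequential reduction over one band (the "acc loop seq" part).
        float sum = 0.0f;
        for (int j = 0; j < lines_samples; j++)
            sum += image_vector[base + j];

        const float mean = sum / static_cast<float>(lines_samples);
        r_m[i] = mean;

        // Subtract the band mean from every sample of that band.
        for (int j = 0; j < lines_samples; j++)
            R_o[base + j] = image_vector[base + j] - mean;
    }
}
```

With the oneAPI compiler, GPU offload for code like this is typically enabled with `icpx -fiopenmp -fopenmp-targets=spir64`. The directive structure stays close to the OpenACC original, which is the main appeal of this route over a manual SYCL rewrite.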
