Intel® oneAPI Data Parallel C++
Support for Intel® oneAPI DPC++ Compiler, Intel® oneAPI DPC++ Library, Intel® DPC++ Compatibility Tool, and GDB*

How do I translate this OpenACC nested parallelism to SYCL?

gamersensual14
New Contributor I

Hi Intel team, here I am with another question. I'm trying to parallelize this code, originally in OpenACC:

//#pragma acc parallel loop
for(i=0; i<bands; i++)
{
  //#pragma acc loop seq
  for(j=0; j<lines_samples; j++)
    r_m[i] += image_vector[i*lines_samples+j];

  r_m[i] /= lines_samples;

  //#pragma acc loop
  for(j=0; j<lines_samples; j++)
    R_o[i*lines_samples+j] = image_vector[i*lines_samples+j] - r_m[i];
}

As you can see, there is an outer parallel loop whose body contains a sequential loop (because of the reduction) followed by another parallelizable loop.

I have tried to translate it like this:

my_queue.submit([&](auto &h) {
  h.parallel_for(sycl::range(bands), [=](auto i) {
    h.single_task<class computo_example>([=]() {
      int64_t j;

      for(j=0; j<lines_samples; j++)
        r_m[i] += image_vector[i*lines_samples+j];
    });

    r_m[i] /= lines_samples;

    h.parallel_for(sycl::range(lines_samples), [=](auto j) {
      R_o[i*lines_samples+j] = image_vector[i*lines_samples+j] - r_m[i];
    });
  });
}).wait();

But I'm not sure this is correct, because it isn't clear to me whether nested structures like this are possible in SYCL. Can I execute a parallel_for inside another parallel_for? Can I execute a single_task inside a parallel_for? And the other way around?

With that SYCL code I get a huge error output; here are the most important lines:

VCA.cpp:866:9: error: use 'template' keyword to treat 'single_task' as a dependent template name
      h.single_task<class computo_example>([=]() {
        ^
        template

VCA.cpp:865:41: error: call to deleted constructor of 'sycl::handler'
    h.parallel_for(sycl::range(bands), [=](auto i) {
                                        ^

VCA.cpp:865:40: error: attempt to use a deleted function
    h.parallel_for(sycl::range(bands), [=](auto i) {
                                       ^

VCA.cpp:866:7: note: destructor of '' is implicitly deleted because field '' has an inaccessible destructor
      h.single_task<class computo_example>([=]() {
      ^


I'm looking for the most generic way to translate this.


Looking forward to your answer.


Thank you!

ShwethaS_Intel
Moderator

Hi @gamersensual14,

Thanks for reaching out to us; we are working on your issue.

Please refer to the DPC++/SYCL documentation below for migration.

https://www.intel.com/content/www/us/en/developer/tools/oneapi/data-parallel-c-plus-plus.html


Have a Good day!


Thanks & Regards,

Shwetha


Jie_L_Intel
Employee

I suggest using the oneAPI OpenMP offloading feature to migrate the OpenACC code to run on an Intel GPU, rather than porting it to SYCL manually. Here are links on OpenMP offloading in oneAPI:

https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-cpp-fortran-compiler-...

https://www.youtube.com/watch?v=sn_pHVR0NMk
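To make the OpenMP suggestion concrete, here is a sketch of how the loop nest might map to OpenMP target offload: teams distribute plays the role of "acc parallel loop", the reduction loop stays sequential, and the last loop becomes a parallel for inside each team. It assumes double data and the band-major layout from the original index expression; center_bands_omp is a hypothetical name. With the Intel compiler this would typically be built with something like icpx -fiopenmp -fopenmp-targets=spir64 (check the linked guide for the exact flags for your setup). Other clause combinations are possible; this is one reasonable mapping, not the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the thread): OpenMP offload version of the
// OpenACC loop nest. Without OpenMP enabled the pragmas are ignored and the
// loops run serially on the host; with OpenMP but no offload device, the
// target region falls back to the host by default.
void center_bands_omp(const std::vector<double> &image_vector,
                      std::int64_t bands, std::int64_t lines_samples,
                      std::vector<double> &r_m, std::vector<double> &R_o) {
  assert(bands > 0 && lines_samples > 0);
  const std::int64_t total = bands * lines_samples;
  assert(image_vector.size() == static_cast<std::size_t>(total));
  r_m.assign(static_cast<std::size_t>(bands), 0.0);
  R_o.assign(static_cast<std::size_t>(total), 0.0);

  // Raw pointers so the arrays can be named as array sections in map clauses.
  const double *img = image_vector.data();
  double *mean = r_m.data();
  double *out = R_o.data();

#pragma omp target teams distribute map(to: img[0:total]) \
    map(from: mean[0:bands], out[0:total])
  for (std::int64_t i = 0; i < bands; i++) {
    // Sequential reduction, like "acc loop seq".
    double sum = 0.0;
    for (std::int64_t j = 0; j < lines_samples; j++)
      sum += img[i * lines_samples + j];
    mean[i] = sum / static_cast<double>(lines_samples);

    // Independent element updates, spread across the threads of this team.
#pragma omp parallel for
    for (std::int64_t j = 0; j < lines_samples; j++)
      out[i * lines_samples + j] = img[i * lines_samples + j] - mean[i];
  }
}
```

As in the SYCL variants, the sum is kept in a local variable, so the caller does not need to zero r_m beforehand.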

