<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How do I translate this OpenACC nested parallelism to SYCL? in Intel® oneAPI DPC++/C++ Compiler</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/How-do-I-translate-this-OpenACC-nested-parallelism-to-SYCL/m-p/1381157#M2294</link>
    <description>&lt;P&gt;Hi Intel team, here I am with another question. I'm trying to parallelize this code, originally in OpenACC:&lt;/P&gt;
&lt;PRE&gt;  //#pragma acc parallel loop&lt;BR /&gt;  for(i=0; i&amp;lt;bands; i++)&lt;BR /&gt;  {&lt;BR /&gt;    //#pragma acc loop seq &lt;BR /&gt;    for(j=0; j&amp;lt;lines_samples; j++)&lt;BR /&gt;      r_m[i] += image_vector[i*lines_samples+j];&lt;BR /&gt;&lt;BR /&gt;    r_m[i] /= lines_samples;&lt;BR /&gt;&lt;BR /&gt;    //#pragma acc loop&lt;BR /&gt;      for(j=0; j&amp;lt;lines_samples; j++)&lt;BR /&gt;        R_o[i*lines_samples+j] = image_vector[i*lines_samples+j] - r_m[i];&lt;BR /&gt;  }&lt;/PRE&gt;
&lt;P&gt;As you can see, there is an outer parallel loop over the bands; inside it, a sequential loop (because of the reduction into r_m[i]), followed by another parallelizable loop.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have tried to translate it like this:&lt;/P&gt;
&lt;PRE&gt;my_queue.submit([&amp;amp;](auto &amp;amp;h) {&lt;BR /&gt;  h.parallel_for(sycl::range(bands), [=](auto i) {&lt;BR /&gt;    h.single_task&amp;lt;class computo_example&amp;gt;([=]() {&lt;BR /&gt;      int64_t j;&lt;BR /&gt;&lt;BR /&gt;      for(j=0; j&amp;lt;lines_samples; j++)&lt;BR /&gt;      r_m[i] += image_vector[i*lines_samples+j];&lt;BR /&gt;    });&lt;BR /&gt;&lt;BR /&gt;    r_m[i] /= lines_samples;&lt;BR /&gt;&lt;BR /&gt;    h.parallel_for(sycl::range(lines_samples), [=](auto j) {&lt;BR /&gt;      R_o[i*lines_samples+j] = image_vector[i*lines_samples+j] - r_m[i];&lt;BR /&gt;    });&lt;BR /&gt;  });&lt;BR /&gt;}).wait();&lt;/PRE&gt;
&lt;P&gt;But I'm not sure if that is correct, as it isn't clear to me whether nested constructs like this are possible in SYCL. Can I execute a parallel_for inside another parallel_for? Can I execute a single_task inside a parallel_for? And the other way around?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;With that SYCL code, I get a huge error output. Here are the most important lines:&lt;/P&gt;
&lt;PRE&gt;VCA.cpp:866:9: error: use 'template' keyword to treat 'single_task' as a dependent template name&lt;BR /&gt;h.single_task&amp;lt;class computo_example&amp;gt;([=]() {&lt;BR /&gt;^&lt;BR /&gt;template&lt;BR /&gt;&lt;BR /&gt;VCA.cpp:865:41: error: call to deleted constructor of 'sycl::handler'&lt;BR /&gt;h.parallel_for(sycl::range(bands), [=](auto i) {&lt;BR /&gt;                                    ^&lt;BR /&gt;&lt;BR /&gt;VCA.cpp:865:40: error: attempt to use a deleted function&lt;BR /&gt;h.parallel_for(sycl::range(bands), [=](auto i) {&lt;BR /&gt;^&lt;BR /&gt;&lt;BR /&gt;VCA.cpp:866:7: note: destructor of '' is implicitly deleted because field '' has an inaccessible destructor&lt;BR /&gt;h.single_task&amp;lt;class computo_example&amp;gt;([=]() {&lt;BR /&gt;^&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm looking for the most generic way to translate this.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Looking forward to your answer.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you!&lt;/P&gt;</description>
    <pubDate>Mon, 02 May 2022 09:35:44 GMT</pubDate>
    <dc:creator>gamersensual14</dc:creator>
    <dc:date>2022-05-02T09:35:44Z</dc:date>
    <item>
      <title>How do I translate this OpenACC nested parallelism to SYCL?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/How-do-I-translate-this-OpenACC-nested-parallelism-to-SYCL/m-p/1381157#M2294</link>
      <description>&lt;P&gt;Hi Intel team, here I am with another question. I'm trying to parallelize this code, originally in OpenACC:&lt;/P&gt;
&lt;PRE&gt;  //#pragma acc parallel loop&lt;BR /&gt;  for(i=0; i&amp;lt;bands; i++)&lt;BR /&gt;  {&lt;BR /&gt;    //#pragma acc loop seq &lt;BR /&gt;    for(j=0; j&amp;lt;lines_samples; j++)&lt;BR /&gt;      r_m[i] += image_vector[i*lines_samples+j];&lt;BR /&gt;&lt;BR /&gt;    r_m[i] /= lines_samples;&lt;BR /&gt;&lt;BR /&gt;    //#pragma acc loop&lt;BR /&gt;      for(j=0; j&amp;lt;lines_samples; j++)&lt;BR /&gt;        R_o[i*lines_samples+j] = image_vector[i*lines_samples+j] - r_m[i];&lt;BR /&gt;  }&lt;/PRE&gt;
&lt;P&gt;As you can see, there is an outer parallel loop over the bands; inside it, a sequential loop (because of the reduction into r_m[i]), followed by another parallelizable loop.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have tried to translate it like this:&lt;/P&gt;
&lt;PRE&gt;my_queue.submit([&amp;amp;](auto &amp;amp;h) {&lt;BR /&gt;  h.parallel_for(sycl::range(bands), [=](auto i) {&lt;BR /&gt;    h.single_task&amp;lt;class computo_example&amp;gt;([=]() {&lt;BR /&gt;      int64_t j;&lt;BR /&gt;&lt;BR /&gt;      for(j=0; j&amp;lt;lines_samples; j++)&lt;BR /&gt;      r_m[i] += image_vector[i*lines_samples+j];&lt;BR /&gt;    });&lt;BR /&gt;&lt;BR /&gt;    r_m[i] /= lines_samples;&lt;BR /&gt;&lt;BR /&gt;    h.parallel_for(sycl::range(lines_samples), [=](auto j) {&lt;BR /&gt;      R_o[i*lines_samples+j] = image_vector[i*lines_samples+j] - r_m[i];&lt;BR /&gt;    });&lt;BR /&gt;  });&lt;BR /&gt;}).wait();&lt;/PRE&gt;
&lt;P&gt;But I'm not sure if that is correct, as it isn't clear to me whether nested constructs like this are possible in SYCL. Can I execute a parallel_for inside another parallel_for? Can I execute a single_task inside a parallel_for? And the other way around?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;With that SYCL code, I get a huge error output. Here are the most important lines:&lt;/P&gt;
&lt;PRE&gt;VCA.cpp:866:9: error: use 'template' keyword to treat 'single_task' as a dependent template name&lt;BR /&gt;h.single_task&amp;lt;class computo_example&amp;gt;([=]() {&lt;BR /&gt;^&lt;BR /&gt;template&lt;BR /&gt;&lt;BR /&gt;VCA.cpp:865:41: error: call to deleted constructor of 'sycl::handler'&lt;BR /&gt;h.parallel_for(sycl::range(bands), [=](auto i) {&lt;BR /&gt;                                    ^&lt;BR /&gt;&lt;BR /&gt;VCA.cpp:865:40: error: attempt to use a deleted function&lt;BR /&gt;h.parallel_for(sycl::range(bands), [=](auto i) {&lt;BR /&gt;^&lt;BR /&gt;&lt;BR /&gt;VCA.cpp:866:7: note: destructor of '' is implicitly deleted because field '' has an inaccessible destructor&lt;BR /&gt;h.single_task&amp;lt;class computo_example&amp;gt;([=]() {&lt;BR /&gt;^&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm looking for the most generic way to translate this.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Looking forward to your answer.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Mon, 02 May 2022 09:35:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/How-do-I-translate-this-OpenACC-nested-parallelism-to-SYCL/m-p/1381157#M2294</guid>
      <dc:creator>gamersensual14</dc:creator>
      <dc:date>2022-05-02T09:35:44Z</dc:date>
    </item>
    <item>
      <title>Re: How do I translate this OpenACC nested parallelism to SYCL?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/How-do-I-translate-this-OpenACC-nested-parallelism-to-SYCL/m-p/1382295#M2295</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.intel.com/t5/user/viewprofilepage/user-id/232478"&gt;@gamersensual14&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for reaching out to us. We are working on your issue.&lt;/P&gt;
&lt;P&gt;Please refer to the DPC++/SYCL documentation below for migration.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.intel.com/content/www/us/en/developer/tools/oneapi/data-parallel-c-plus-plus.html" target="_blank" rel="noopener noreferrer"&gt;https://www.intel.com/content/www/us/en/developer/tools/oneapi/data-parallel-c-plus-plus.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Have a good day!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks &amp;amp; Regards,&lt;/P&gt;
&lt;P&gt;Shwetha&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2022 10:04:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/How-do-I-translate-this-OpenACC-nested-parallelism-to-SYCL/m-p/1382295#M2295</guid>
      <dc:creator>ShwethaS_Intel</dc:creator>
      <dc:date>2022-05-06T10:04:02Z</dc:date>
    </item>
    <item>
      <title>Re: How do I translate this OpenACC nested parallelism to SYCL?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/How-do-I-translate-this-OpenACC-nested-parallelism-to-SYCL/m-p/1386524#M2296</link>
      <description>&lt;P&gt;I suggest using the oneAPI OpenMP offloading feature to migrate the OpenACC code to run on Intel GPUs, rather than translating it to SYCL manually. Here are links on OpenMP offloading in oneAPI:&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-cpp-fortran-compiler-openmp/top.html" target="_blank"&gt;https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-cpp-fortran-compiler-openmp/top.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt; &lt;A href="https://www.youtube.com/watch?v=sn_pHVR0NMk" target="_blank"&gt;https://www.youtube.com/watch?v=sn_pHVR0NMk&lt;/A&gt;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 23 May 2022 10:19:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/How-do-I-translate-this-OpenACC-nested-parallelism-to-SYCL/m-p/1386524#M2296</guid>
      <dc:creator>Jie_L_Intel</dc:creator>
      <dc:date>2022-05-23T10:19:08Z</dc:date>
    </item>
  </channel>
</rss>

