<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Reductions acting weird in Intel® oneAPI DPC++/C++ Compiler</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Reductions-acting-weird/m-p/1223587#M782</link>
    <description></description>
    <pubDate>Wed, 28 Oct 2020 13:22:46 GMT</pubDate>
    <dc:creator>dobbsy</dc:creator>
    <dc:date>2020-10-28T13:22:46Z</dc:date>
    <item>
      <title>Reductions acting weird</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Reductions-acting-weird/m-p/1223587#M782</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I have some issues with reductions that I don't really understand. Let's first take a look at the short sample program I used to try to understand these issues:&lt;/P&gt;
&lt;LI-CODE lang="cpp"&gt;#include "CL/sycl.hpp"
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;array&amp;gt;

using std::array;
using namespace cl::sycl;

constexpr auto dp_read = access::mode::read;
constexpr auto dp_write = access::mode::write;

int main() {
  cpu_selector device_selector;
  // cl::sycl::gpu_selector device_selector;
  queue q(device_selector);

  constexpr unsigned size = 125000;
  constexpr unsigned workGroupSize = 250;
  range workGroupRange{workGroupSize};

  std::cout &amp;lt;&amp;lt; "workGroupSize: " &amp;lt;&amp;lt; workGroupSize &amp;lt;&amp;lt; '\n';

  array&amp;lt;double, size&amp;gt; a;

  for (unsigned i = 0; i &amp;lt; size; i++) {
    a[i] = 2.5 * i;
  } // this for-loop ensures the maximum value of the array is at a[size-1]

  buffer a_buf{a};
  double max;
  {
    buffer max_buf{&amp;amp;max, cl::sycl::range{1}};
    q.submit([&amp;amp;](cl::sycl::handler &amp;amp;h) {
      auto a_acc = a_buf.get_access&amp;lt;dp_read&amp;gt;(h);
      auto max_acc = accessor&amp;lt;double, 0, access::mode::discard_write, access::target::global_buffer&amp;gt;(max_buf, h);

      h.parallel_for(cl::sycl::nd_range&amp;lt;1&amp;gt;{cl::sycl::range(size), workGroupRange}, ONEAPI::reduction(max_acc, ONEAPI::maximum&amp;lt;double&amp;gt;()),
        [=](nd_item&amp;lt;1&amp;gt; it, auto &amp;amp;part_max) {
          part_max.combine(a_acc[it.get_global_id()]);
      });
    });
  }

  
  std::cout &amp;lt;&amp;lt; std::boolalpha &amp;lt;&amp;lt; (a[size - 1] == max) &amp;lt;&amp;lt; std::endl;

  return (0);
}
&lt;/LI-CODE&gt;
&lt;P&gt;The problem is that, depending on the chosen size and workGroupSize and on whether I use the cpu_selector or the gpu_selector, this code does one of three things: it prints true (the reduction found the maximum value), it prints false (the reduction returned a wrong value; this happens on the GPU), or it throws an OpenCL error (-5, CL_OUT_OF_RESOURCES; this happens on the CPU). I don't understand why only some combinations of size and workGroupSize work. Note that I made sure size is always divisible by workGroupSize.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So my questions would be:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Why does the number of items per work group influence whether or not I get a CL_OUT_OF_RESOURCES error on the CPU?&lt;/LI&gt;
&lt;LI&gt;Similarly, why does the number of items per work group influence whether or not the reduction returns the correct value on the GPU?&lt;/LI&gt;
&lt;LI&gt;Why does it return a wrong number on the GPU at all?&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;A table with the values I put in for size and workGroupSize and the result (true/false/error) is attached.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My environment is as follows:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;KDE Neon (essentially Ubuntu 20.04 with KDE applications)&lt;/LI&gt;
&lt;LI&gt;Intel oneAPI Base Toolkit (installed as intel-basekit via the Intel repo for Ubuntu), version 2021.1-2261.beta10&lt;/LI&gt;
&lt;LI&gt;Compiled using dpcpp (I used the CMake sample project for Linux as a basis)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I run this on my laptop (the CPU is an Intel Core i5-9300H, the GPU is the integrated Intel UHD 630).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Oct 2020 13:22:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Reductions-acting-weird/m-p/1223587#M782</guid>
      <dc:creator>dobbsy</dc:creator>
      <dc:date>2020-10-28T13:22:46Z</dc:date>
    </item>
    <item>
      <title>Re:Reductions acting weird</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Reductions-acting-weird/m-p/1223819#M786</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The issue is reproducible in my environment with larger input sizes. I'm investigating this and will get back to you with updates.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for reporting this issue.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Rahul&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 29 Oct 2020 13:12:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Reductions-acting-weird/m-p/1223819#M786</guid>
      <dc:creator>RahulV_intel</dc:creator>
      <dc:date>2020-10-29T13:12:27Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Reductions acting weird</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Reductions-acting-weird/m-p/1226481#M802</link>
      <description>&lt;P&gt;Is there any update on the workgroup size issue on a CPU target?&lt;/P&gt;
&lt;P&gt;I'm seeing the same issue on Windows, tested with beta09 and beta10.&lt;/P&gt;
&lt;P&gt;Also, enabling optimization allows a larger workgroup size to pass without&amp;nbsp;&lt;SPAN&gt;CL_OUT_OF_RESOURCES.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Note: Reproducible with just&amp;nbsp;&lt;STRONG&gt;parallel_for(nd_range)&lt;/STRONG&gt;, without any other intrinsics or patterns.&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Nov 2020 03:13:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Reductions-acting-weird/m-p/1226481#M802</guid>
      <dc:creator>ChoonHo</dc:creator>
      <dc:date>2020-11-09T03:13:02Z</dc:date>
    </item>
    <item>
      <title>Re:Reductions acting weird</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Reductions-acting-weird/m-p/1228102#M811</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Apologies for the delay.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The issue is reproducible on my end even with beta10. I have escalated it to the relevant team for a fix.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for reporting this.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Rahul&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 13 Nov 2020 12:12:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Reductions-acting-weird/m-p/1228102#M811</guid>
      <dc:creator>RahulV_intel</dc:creator>
      <dc:date>2020-11-13T12:12:53Z</dc:date>
    </item>
    <item>
      <title>Re: Reductions acting weird</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Reductions-acting-weird/m-p/1346434#M1778</link>
      <description>&lt;P&gt;I also encountered this bug.&lt;/P&gt;
&lt;P&gt;If the type of the variable 'total' in the MonteCarloPi example is changed from int to float or double, CL_OUT_OF_RESOURCES occurs on the CPU.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I checked this on 2021.4 and 2022.1; both have this bug.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Thanks&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Yusuke&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 24 Dec 2021 06:39:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Reductions-acting-weird/m-p/1346434#M1778</guid>
      <dc:creator>yusuke-konno</dc:creator>
      <dc:date>2021-12-24T06:39:43Z</dc:date>
    </item>
  </channel>
</rss>

