DPC++ on Windows: Debug mode cannot use local memory with item.barrier(sycl::access::fence_space::local_space)

Chen__Peng · 05-03-2020

Version: oneAPI 2021.
The result in Debug mode is inconsistent with Release mode: the Release result is correct, and the Debug result is incorrect.

The simplified source code is as follows:
------------------------------------------------------------------------------

#include <CL/sycl.hpp>

#include <array>
#include <cstdint>
#include <exception>
#include <iostream>
#include <random>
#include <stdexcept>

namespace sycl = cl::sycl;

int main(int, char**) {
    typedef float DataType;
    std::array<DataType, 16> arr;

    std::mt19937 mt_engine(std::random_device{}());
    std::uniform_int_distribution<int> idist(0, 10);

    std::cout << "Data: ";
    for (auto& el : arr) {
        //el = idist(mt_engine);
        el = 1;
        std::cout << el << " ";
    }
    std::cout << std::endl;

    //sycl::buffer<int32_t, 1> buf(arr.data(), sycl::range<1>(arr.size()), { sycl::property::buffer::use_host_ptr() });
    sycl::buffer<DataType, 1> buf(arr.data(), sycl::range<1>(arr.size()));
    //sycl::device device = sycl::cpu_selector{}.select_device();
    sycl::device device = sycl::default_selector{}.select_device();

    // Rethrow asynchronous errors so wait_and_throw() reports them.
    sycl::queue queue(device, [](sycl::exception_list el) {
        for (auto ex : el) { std::rethrow_exception(ex); }
    });

    // <<Set up queue and check device information>>
    /* Here we manually set the Work Group size to 32,
       but there may be a more optimal size for your device */
    size_t wgroup_size = 32;
    // Each work-item loads two elements, so one work-group covers 64 elements.
    auto part_size = wgroup_size * 2;

    auto has_local_mem = device.is_host() ||
        (device.get_info<sycl::info::device::local_mem_type>()
            != sycl::info::local_mem_type::none);
    auto local_mem_size = device.get_info<sycl::info::device::local_mem_size>();
    if (!has_local_mem
        || local_mem_size < (wgroup_size * sizeof(DataType)))
    {
        throw std::runtime_error("Device doesn't have enough local memory!");
    }

    // <<Reduction loop>>
    auto len = arr.size();
    while (len != 1) {
        // division rounding up
        auto n_wgroups = (len + part_size - 1) / part_size;
        queue.submit([&](sycl::handler& cgh) {
            // Per-work-group scratch space, same element type as the data.
            sycl::accessor<DataType, 1, sycl::access::mode::read_write, sycl::access::target::local>
                local_mem(sycl::range<1>(wgroup_size), cgh);

            auto global_mem = buf.get_access<sycl::access::mode::read_write>(cgh);
            cgh.parallel_for<class reduction_kernel_a>(
                sycl::nd_range<1>(n_wgroups * wgroup_size, wgroup_size),
                [=](sycl::nd_item<1> item) {
                    size_t local_id = item.get_local_linear_id();
                    size_t global_id = item.get_global_linear_id();
                    local_mem[local_id] = 0;

                    // Each work-item adds a pair of elements; the second load is
                    // guarded so an odd len does not read past the end.
                    if ((2 * global_id) < len) {
                        DataType second = (2 * global_id + 1) < len ? global_mem[2 * global_id + 1] : DataType(0);
                        local_mem[local_id] = global_mem[2 * global_id] + second;
                    }
                    item.barrier(sycl::access::fence_space::local_space);

                    // Tree reduction in local memory (wgroup_size must be a power of two).
                    for (size_t stride = 1; stride < wgroup_size; stride *= 2) {
                        auto idx = 2 * stride * local_id;
                        if (idx < wgroup_size) {
                            local_mem[idx] = local_mem[idx] + local_mem[idx + stride];
                        }
                        item.barrier(sycl::access::fence_space::local_space);
                    }

                    // Work-item 0 writes the group's partial sum back to global memory.
                    if (local_id == 0) {
                        global_mem[item.get_group_linear_id()] = local_mem[0];
                    }
                });
        });
        queue.wait_and_throw();

        len = n_wgroups;
    }

    auto acc = buf.get_access<sycl::access::mode::read>();
    std::cout << "Sum: " << acc[0] << std::endl;

    return 0;
}
-------------------------------------------------------------------------------------------------------

Using Visual Studio 2017:
In Release mode, the result is: 9
In Debug mode, the result is: 7.31924e+08

AbhishekD_Intel · 05-03-2020

Hi

We tried the sample you provided, building it with both the Release and Debug configurations of Visual Studio, and found that it gives the same result in both. The output is shown below:

Data: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Sum: 16

We tested with Visual Studio 2019 and the Intel(R) oneAPI DPC++ Compiler 2021.1-beta05 (2020.2.0.0304). It is always a good idea to use updated versions, as they are less likely to contain bugs.

Please update both oneAPI and Visual Studio. If you still see the same issue, you can always reach out to us and we will try to resolve it.

Warm Regards,

Abhishek

Chen__Peng · 05-04-2020

Hi Deshmukh

Thanks a lot for your reply.
I am using the latest oneAPI Toolkit. I think this is a problem in oneAPI itself.

You are correct: Debug and Release output the same result.
However, I forgot to mention the settings I use for Debug mode.
If I set the following environment variables for DPC++ debugging, the result in Debug mode is incorrect.
(I found that an error occurs when the kernel is submitted.)

SYCL_DEVICE_TYPE=CPU
CL_CONFIG_USE_NATIVE_DEBUGGER=1
SYCL_PROGRAM_COMPILE_OPTIONS=-g -cl-opt-disable

Chen__Peng · 05-04-2020

I have attached the source code.

However, when using VS2017, the kernel can be debugged, but the output is incorrect.

AbhishekD_Intel · 05-08-2020

Hi,

Thank you for reporting this. We are also getting the same incorrect output on our end.

We are escalating it to the relevant team, who will follow up on this.

Warm Regards,

Abhishek.

PrasanthD_intel · 05-25-2021

Hi,

Thanks for your patience. The issue you raised has been fixed in the latest version, 2021.2. Please download it and let us know how it works for you.

PrasanthD_intel · 05-31-2021

Hi,

We haven't heard back from you. Please let us know your feedback after testing the latest version.

Regards

Prasanth

PrasanthD_intel · 06-03-2021

Hi,

We are closing this thread as the issue has been resolved in the latest version.

We will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

Regards

Prasanth
