Intel® oneAPI DPC++/C++ Compiler

DPC++ on Windows: Debug mode cannot use local memory with item.barrier(sycl::access::fence_space::local_space)

Chen__Peng
Beginner

Version: oneAPI 2011.
The result in Debug mode is inconsistent with Release mode.
The Release mode result is correct; the Debug mode result is incorrect.

The simple source code is as follows.
------------------------------------------------------------------------------

#include <CL/sycl.hpp>

#include <array>
#include <iostream>

using namespace cl::sycl;

constexpr access::mode sycl_read = access::mode::read;
constexpr access::mode sycl_write = access::mode::write;

#define BOOST_PP_STRINGIZE(_x) #_x
#define PRINT_DEVICE_PROPERTY(dev, prop) \
  std::cout << BOOST_PP_STRINGIZE(prop) << ": " \
            << dev.get_info<cl::sycl::info::device::prop>() << std::endl;

#include <cstdint>
#include <random>
#include <cassert>

class reduction_kernel;
namespace sycl = cl::sycl;

int main(int, char**) {
    typedef float DataType;
    std::array<DataType, 16> arr;

    std::mt19937 mt_engine(std::random_device{}());
    std::uniform_int_distribution<int> idist(0, 10);

    std::cout << "Data: ";
    for (auto& el : arr) {
        //el = idist(mt_engine);
        el = 1;
        std::cout << el << " ";
    }
    std::cout << std::endl;

    //sycl::buffer<int32_t, 1> buf(arr.data(), sycl::range<1>(arr.size()), { sycl::property::buffer::use_host_ptr() });
    sycl::buffer<DataType, 1> buf(arr.data(), sycl::range<1>(arr.size()));
    //sycl::device device = sycl::cpu_selector{}.select_device();
    sycl::device device = sycl::default_selector{}.select_device();


    sycl::queue queue(device, [](sycl::exception_list el) {
        for (auto ex : el) { std::rethrow_exception(ex); }
    });

    // <<Set up queue and check device information>>
    /* Here we manually set the Work Group size to 32,
    but there may be a more optimal size for your device */
    size_t wgroup_size = 32;

    auto part_size = wgroup_size * 2;

    auto has_local_mem = device.is_host() ||
        (device.get_info<sycl::info::device::local_mem_type>()
            != sycl::info::local_mem_type::none);
    auto local_mem_size = device.get_info<sycl::info::device::local_mem_size>();
    if (!has_local_mem
        || local_mem_size < (wgroup_size * sizeof(int32_t)))
    {
        throw "Device doesn't have enough local memory!";
    }

    // <<Reduction loop>>
    auto len = arr.size();
    while (len != 1) {
        // division rounding up
        auto n_wgroups = (len + part_size - 1) / part_size;
        queue.submit([&](sycl::handler& cgh) {
            sycl::accessor <int32_t, 1, sycl::access::mode::read_write, sycl::access::target::local>
                local_mem(sycl::range<1>(wgroup_size), cgh);

            auto global_mem = buf.get_access<sycl::access::mode::read_write>(cgh);
            auto global_mem1 = buf.get_access<sycl::access::mode::read_write>(cgh);
            cgh.parallel_for<class reduction_kernel_a>(
                sycl::nd_range<1>(n_wgroups * wgroup_size, wgroup_size),
                [=](sycl::nd_item<1> item) {

                size_t local_id = item.get_local_linear_id();
                size_t global_id = item.get_global_linear_id();
                local_mem[local_id] = 0;

                if ((2 * global_id) < len) {
                    local_mem[local_id] = global_mem[2 * global_id] + global_mem[2 * global_id + 1];
                }
                item.barrier(sycl::access::fence_space::local_space);

                for (size_t stride = 1; stride < wgroup_size; stride *= 2) {
                    auto idx = 2 * stride * local_id;
                    if (idx < wgroup_size) {
                        local_mem[idx] = local_mem[idx] + local_mem[idx + stride];
                    }

                    item.barrier(sycl::access::fence_space::local_space);
                }

                if (local_id == 0) {
                    global_mem[item.get_group_linear_id()] = local_mem[0];
                }
            });
        });
        queue.wait_and_throw();

        len = n_wgroups;
    }

    auto acc = buf.get_access<sycl::access::mode::read>();
    std::cout << "Sum: " << acc[0] << std::endl;

    return 0;
}
-------------------------------------------------------------------------------------------------------

Using Visual Studio 2017:
in Release mode, the result is: 9
in Debug mode, the result is: 7.31924e+08

AbhishekD_Intel
Moderator

Hi

We tried the sample you provided and tested it with both the Release and Debug configurations of Visual Studio, and we found that it gives the same result in both. The output is shown below:

Data: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Sum: 16
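
For anyone who wants to cross-check the expected value without a device or a particular build mode, the same reduction can be emulated on the host in plain C++. This is only a rough sketch, not part of the original reproducer: `emulate_reduction` is a made-up helper name, it assumes the work-group size is a power of two, it keeps the local scratch values as float (the kernel uses int32_t local memory), and it guards the second load for odd lengths, which the kernel does not.

```cpp
#include <cstddef>
#include <vector>

// Host-only emulation of the reproducer's reduction loop (no SYCL required).
// Hypothetical helper for illustration; assumes wgroup_size is a power of two.
float emulate_reduction(std::vector<float> data, std::size_t wgroup_size) {
    if (data.empty()) return 0.0f;                   // nothing to sum
    const std::size_t part_size = wgroup_size * 2;   // elements consumed per work-group
    std::size_t len = data.size();
    while (len != 1) {
        // Division rounding up, as in the original host loop.
        const std::size_t n_wgroups = (len + part_size - 1) / part_size;
        std::vector<float> partial(n_wgroups, 0.0f);
        for (std::size_t group = 0; group < n_wgroups; ++group) {
            std::vector<float> local_mem(wgroup_size, 0.0f);
            // Load phase: each work-item adds two neighbouring global elements.
            for (std::size_t local_id = 0; local_id < wgroup_size; ++local_id) {
                const std::size_t global_id = group * wgroup_size + local_id;
                if (2 * global_id < len) {
                    const float second =
                        (2 * global_id + 1 < len) ? data[2 * global_id + 1] : 0.0f;
                    local_mem[local_id] = data[2 * global_id] + second;
                }
            }
            // Tree phase: one outer iteration corresponds to one barrier-separated step.
            for (std::size_t stride = 1; stride < wgroup_size; stride *= 2) {
                for (std::size_t local_id = 0; local_id < wgroup_size; ++local_id) {
                    const std::size_t idx = 2 * stride * local_id;
                    if (idx < wgroup_size) {
                        local_mem[idx] += local_mem[idx + stride];
                    }
                }
            }
            partial[group] = local_mem[0];           // work-item 0 writes the group sum
        }
        for (std::size_t group = 0; group < n_wgroups; ++group) {
            data[group] = partial[group];
        }
        len = n_wgroups;
    }
    return data[0];
}
```

Calling `emulate_reduction(std::vector<float>(16, 1.0f), 32)` mirrors the sample's input of sixteen ones with a work-group size of 32 and returns 16, which matches the output above.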

We tested with Visual Studio 2019 and Intel(R) oneAPI DPC++ Compiler 2021.1-beta05 (2020.2.0.0304). It is always good to use updated versions, as they are less likely to contain bugs.

So please update your oneAPI and Visual Studio versions. If you still see the same issue, you can always reach out to us and we will try to resolve it.

 

Warm Regards,

Abhishek

Chen__Peng
Beginner

Hi Deshmukh

Thanks a lot for your reply.
I am using the latest oneAPI Toolkit, so I think this is a problem in oneAPI itself.

You are correct: Debug and Release modes output the same result.
However, I forgot to mention the settings I use for Debug mode.
With the following environment options for DPC++ debugging, the result in Debug mode is incorrect.
(I found that the problem is an error that occurs when the kernel is submitted.)

SYCL_DEVICE_TYPE=CPU
CL_CONFIG_USE_NATIVE_DEBUGGER=1
SYCL_PROGRAM_COMPILE_OPTIONS=-g -cl-opt-disable
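
One way to confirm that the process really sees these settings is to print them at the start of main. A minimal sketch using only the standard `std::getenv`; `describe_env` is a made-up helper name, not a oneAPI API, and the variable names passed to it are simply the three listed above.

```cpp
#include <cstdlib>
#include <string>

// Returns "NAME=value", or "NAME=<unset>" when the variable is not defined.
// Hypothetical helper for illustration only.
std::string describe_env(const char* name) {
    const char* value = std::getenv(name);
    return std::string(name) + "=" + (value ? value : "<unset>");
}
```

For example, `std::cout << describe_env("SYCL_DEVICE_TYPE") << std::endl;` before the queue is created shows whether the debug run picked up the setting.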

Chen__Peng
Beginner

I have attached the source code.

However, when using VS2017, the kernel can be debugged, but the output is incorrect.

AbhishekD_Intel
Moderator

Hi,

Thank you for your findings. We are getting the same incorrect output on our end.

We are escalating this to the concerned team; they will provide guidance on this issue.

 

Warm Regards,

Abhishek.

PrasanthD_intel
Moderator

Hi,


Thanks for your patience. The issue you raised has been fixed in the latest version, 2021.2. Please download it and let us know your experience with it.


PrasanthD_intel
Moderator

Hi,


We haven't heard back from you. Please let us know your feedback after testing the latest version.


Regards

Prasanth


PrasanthD_intel
Moderator

Hi,


We are closing this thread as the issue has been resolved in the latest version.

We will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.


Regards

Prasanth

