Intel® oneAPI Data Parallel C++
Support for Intel® oneAPI DPC++ Compiler, Intel® oneAPI DPC++ Library, Intel ICX Compiler, Intel® DPC++ Compatibility Tool, and GDB*

Access violation reading location 0x000001C29702D780

nnain1
New Contributor I

I get the following error

Access violation reading location 0x000001C29702D780

inside the part of the code shown below, when using parallel_for_work_item.

cgh.parallel_for_work_group<class convl>(range<1>(N - (conv_kernel.get_count() - 1)), [=](group<1> g) {
            g.parallel_for_work_item([&](h_item<1> i) {
                // Cache the convolution kernel in local memory
                if (i.get_local_id(0) < conv_kernel_a.get_count())
                    a_conv_cache[i.get_local_id(0)] = conv_kernel_a[i.get_global_id()];
            });
            // An implicit barrier happens here
            g.parallel_for_work_item([&](h_item<1> i) {
                result_a[i.get_global_id()] = 0;
                // The convolution
                for (unsigned int j = 0; j < convL; ++j)
                    result_a[i.get_global_id()] += a_a[i.get_local_id(0) + j] * a_conv_cache[j];
            });

        });

The whole code is below. It was working with beta-03; after updating to beta-05, this error appears.
 

//==============================================================
// Copyright © 2019 Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================

#include <CL/sycl.hpp>

#include <array>
#include <iostream>
#include <CL/sycl/intel/fpga_extensions.hpp>
#include <numeric>
#include <chrono>

using namespace cl::sycl;
using namespace std;

constexpr size_t N = 1000;

using Type = int;

class custom_selector : public device_selector {
public:
    custom_selector() : device_selector() {}
    int operator()(const device& device) const override {
        string device_name = device.get_info<info::device::name>();
        auto type = device.get_info<info::device::device_type>();
        if (device_name.find("Intel") != string::npos) // Selecting Intel Device
            if (type == info::device_type::gpu)
                return 100;
        return -1;
    }
};

int main() {

    cpu_selector selector;
    //default_selector selector;
    //custom_selector selector;
    queue myQueue(selector);
    std::cout << "Device Name: " << myQueue.get_device().get_info<info::device::name>() << std::endl;

    const std::vector<Type> conv_kernel_init = { 1, 2, 4 };
    buffer<Type, 1> conv_kernel{ conv_kernel_init.begin(), conv_kernel_init.end() };
    // Use the fact it is initialized to 0 and just allocate what is needed
    const size_t M = N - (conv_kernel_init.size() - 1);
    const size_t convL = conv_kernel_init.size();
    buffer<Type, 1> gold_result(range<1>{M});
    auto t_start = std::chrono::high_resolution_clock::now();
    std::vector<int> data(N);
    std::iota(std::begin(data), std::end(data), 0);
    buffer<Type, 1> a(data.data(), range<1>{N});
    myQueue.submit([&](handler& cgh) {
        auto a_a = a.get_access<cl::sycl::access::mode::read>(cgh);
        auto conv_kernel_a = conv_kernel.get_access<cl::sycl::access::mode::read>(cgh);
        auto gold_result_a = gold_result.get_access<cl::sycl::access::mode::write>(cgh);
        cgh.parallel_for<class gold>(cl::sycl::range<1>{M}, [=](cl::sycl::id<1> idx) {
            for (unsigned int j = 0; j < convL; ++j)
                gold_result_a[idx] += a_a[idx + j] * conv_kernel_a[j];
            });
        });
    auto t_end = std::chrono::high_resolution_clock::now();
    double elapsed_time_ms = std::chrono::duration<double, std::milli>(t_end - t_start).count();
    std::cout << "CPU time " << elapsed_time_ms << std::endl;
    t_start = std::chrono::high_resolution_clock::now();
    queue myQueue_gpu(selector);
    buffer<Type> result{ M };
    myQueue_gpu.submit([&](handler& cgh) {
        auto a_a = a.get_access<cl::sycl::access::mode::read>(cgh);
        auto conv_kernel_a = conv_kernel.get_access<cl::sycl::access::mode::read>(cgh);
        // Cache conv_kernel in local memory
        accessor<decltype(conv_kernel_a)::value_type, 1, access::mode::read_write, access::target::local> a_conv_cache{ conv_kernel.get_count(), cgh };
        auto result_a = result.get_access<cl::sycl::access::mode::write>(cgh);
        //std::cerr << N - (conv_kernel.get_count() - 1) << std::endl;
        //work group parallel
        cgh.parallel_for_work_group<class convl>(range<1>(N - (conv_kernel.get_count() - 1)), [=](group<1> g) {
            g.parallel_for_work_item([&](h_item<1> i) {
                // Cache the convolution kernel in local memory
                if (i.get_local_id(0) < conv_kernel_a.get_count())
                    a_conv_cache[i.get_local_id(0)] = conv_kernel_a[i.get_global_id()];
            });
            // An implicit barrier happens here
            g.parallel_for_work_item([&](h_item<1> i) {
                result_a[i.get_global_id()] = 0;
                // The convolution
                for (unsigned int j = 0; j < convL; ++j)
                    result_a[i.get_global_id()] += a_a[i.get_local_id(0) + j] * a_conv_cache[j];
            });

        });
    });
    //t_end = std::chrono::high_resolution_clock::now();
    //elapsed_time_ms = std::chrono::duration<double, std::milli>(t_end - t_start).count();
    //std::cout << "GPU time " << elapsed_time_ms << std::endl;
    /*auto a_result = result.get_access<access::mode::read>();
    auto gold_result_chk = gold_result.get_access<access::mode::read>();
    //Verify the result
    for (unsigned int i = 0; i < M; ++i)
       if (gold_result_chk[i] != a_result[i])
           std::cout << "The result not equal at: " << i << std::endl;*/
    return 0;
}

 

8 Replies
GouthamK_Intel
Moderator

Hi Nyan,

Thanks for reaching out to us!

Could you please provide more details about your environment, such as the OS version?

Please attach a screenshot of the error you are facing so that we can investigate the issue further.

 

Regards

Goutham

nnain1
New Contributor I

My OS is Windows 10 and I run the program in VS2019. Thank you.

GouthamK_Intel
Moderator

Hi Nyan,

We are able to reproduce the error you are facing.

We will escalate this to the concerned team. 

 

Thanks

Goutham

GouthamK_Intel
Moderator

Hi Nyan,

We have escalated this to the concerned team. 

 

Regards

Goutham

GouthamK_Intel
Moderator

Hi Nyan,

Apologies for the delay in the response.

After debugging your code, we found that the ranges passed to parallel_for_work_group are used incorrectly, which results in the "Access violation reading location" error.

Like parallel_for, parallel_for_work_group is a member function of the handler class and can only be called inside a command-group scope. It is important to note that the ranges passed to this function specify the number of work-groups and an optional work-group size, not the total number of work-items and the group size as is the case for the ND-range parallel_for.
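For example, to launch 998 work-items in total with a work-group size of 499, the two forms would look like this (the kernel names ndr and hier are just placeholders):

// ND-range parallel_for: the first range is the total number of work-items, the second is the work-group size
cgh.parallel_for<class ndr>(nd_range<1>(range<1>(998), range<1>(499)), [=](nd_item<1> it) { /* ... */ });

// Hierarchical parallel_for_work_group: the first range is the number of work-groups, the second is the work-group size,
// so this also launches 2 * 499 = 998 work-items in total
cgh.parallel_for_work_group<class hier>(range<1>(2), range<1>(499), [=](group<1> g) { /* ... */ });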

In your code at line 76, we think you are passing the total number of work-items as the first parameter. This creates (N - (conv_kernel.get_count() - 1)) work-groups with a default work-group size, which launches many more work-items than needed and goes out of the bounds of your buffers.

076 : cgh.parallel_for_work_group<class convl>(range<1>(N - (conv_kernel.get_count() - 1)), [=](group<1> g) {

Please try replacing line 76 with the lines below in your code:

range<1> num_groups(2);
range<1> group_size((N - (conv_kernel.get_count() - 1)) / 2);
cgh.parallel_for_work_group<class convl>(num_groups, group_size, [=](group<1> g) {

The lines above launch a total of (N - (conv_kernel.get_count() - 1)) work-items, i.e. 998 work-items according to your code.
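For reference, here is a minimal sketch of how the call could look inside your command group, reusing num_groups and group_size from above and the accessors a_a, conv_kernel_a, a_conv_cache and result_a already defined in your submit. Indexing by the item's global id is one possible way to keep every access inside the buffers, not the only one:

cgh.parallel_for_work_group<class convl>(num_groups, group_size, [=](group<1> g) {
    g.parallel_for_work_item([&](h_item<1> i) {
        // Cache the convolution kernel in local memory; the local id stays inside conv_kernel_a
        if (i.get_local_id(0) < conv_kernel_a.get_count())
            a_conv_cache[i.get_local_id(0)] = conv_kernel_a[i.get_local_id(0)];
    });
    // An implicit barrier happens here
    g.parallel_for_work_item([&](h_item<1> i) {
        // One output element per work-item, addressed by its global id (0 .. 997)
        size_t gid = i.get_global_id(0);
        result_a[gid] = 0;
        for (unsigned int j = 0; j < convL; ++j)
            result_a[gid] += a_a[gid + j] * a_conv_cache[j];
    });
});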

 

Please let us know if you still face any issues in running the code. 

 

Regards

Goutham

GouthamK_Intel
Moderator

Hi Nyan,

Could you confirm if the issue is resolved?

Please let us know if you face any issues with the code.

 

Regards

Goutham

 

 

GouthamK_Intel
Moderator

Hi Nyan,

Please let us know if you are still facing the issue. 

If your issue is resolved, please confirm whether we can close the thread. 

 

Thanks

Goutham

GouthamK_Intel
Moderator

Hi Nyan,

We are closing this thread.

Please feel free to raise a new thread in case of any further issues.

 

Regards

Goutham
