I am getting
Access violation reading location 0x000001C29702D780
inside the following part of the code, which uses parallel_for_work_item:
cgh.parallel_for_work_group<class convl>(range<1>(N - (conv_kernel.get_count() - 1)), [=](group<1> g) {
    g.parallel_for_work_item([&](h_item<1> i) {
        // Cache the convolution kernel in local memory
        if (i.get_local_id(0) < conv_kernel_a.get_count())
            a_conv_cache[i.get_local_id(0)] = conv_kernel_a[i.get_global_id()];
    });
    // An implicit barrier happens here
    g.parallel_for_work_item([&](h_item<1> i) {
        result_a[i.get_global_id()] = 0;
        // The convolution
        for (unsigned int j = 0; j < convL; ++j)
            result_a[i.get_global_id()] += a_a[i.get_local_id(0) + j] * a_conv_cache[j];
    });
});
The whole code is as follows. It was working with beta-03; after updating to beta-05, this error appears.
//==============================================================
// Copyright © 2019 Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================
#include <CL/sycl.hpp>
#include <array>
#include <iostream>
#include <CL/sycl/intel/fpga_extensions.hpp>
#include <numeric>
#include <chrono>

using namespace cl::sycl;
using namespace std;

constexpr size_t N = 1000;
using Type = int;

class custom_selector : public device_selector {
public:
    custom_selector() : device_selector() {}
    int operator()(const device& device) const override {
        string device_name = device.get_info<info::device::name>();
        auto type = device.get_info<info::device::device_type>();
        if (device_name.find("Intel") != string::npos) // Selecting Intel Device
            if (type == info::device_type::gpu) return 100;
        return -1;
    }
};

int main() {
    cpu_selector selector;
    //default_selector selector;
    //custom_selector selector;
    queue myQueue(selector);
    std::cout << "Device Name: " << myQueue.get_device().get_info<info::device::name>() << std::endl;

    const std::vector<Type> conv_kernel_init = { 1, 2, 4 };
    buffer<Type, 1> conv_kernel{ conv_kernel_init.begin(), conv_kernel_init.end() };

    // Use the fact it is initialized to 0 and just allocate what is needed
    const size_t M = N - (conv_kernel_init.size() - 1);
    const size_t convL = conv_kernel_init.size();
    buffer<Type, 1> gold_result(range<1>{M});

    auto t_start = std::chrono::high_resolution_clock::now();

    std::vector<int> data(N);
    std::iota(std::begin(data), std::end(data), 0);
    buffer<Type, 1> a(data.data(), range<1>{N});

    myQueue.submit([&](handler& cgh) {
        auto a_a = a.get_access<cl::sycl::access::mode::read>(cgh);
        auto conv_kernel_a = conv_kernel.get_access<cl::sycl::access::mode::read>(cgh);
        auto gold_result_a = gold_result.get_access<cl::sycl::access::mode::write>(cgh);
        cgh.parallel_for<class gold>(cl::sycl::range<1>{M}, [=](cl::sycl::id<1> idx) {
            for (unsigned int j = 0; j < convL; ++j)
                gold_result_a[idx] += a_a[idx + j] * conv_kernel_a[j];
        });
    });

    auto t_end = std::chrono::high_resolution_clock::now();
    double elapsed_time_ms = std::chrono::duration<double, std::milli>(t_end - t_start).count();
    std::cout << "CPU time " << elapsed_time_ms << std::endl;

    t_start = std::chrono::high_resolution_clock::now();
    queue myQueue_gpu(selector);
    buffer<Type> result{ M };

    myQueue_gpu.submit([&](handler& cgh) {
        auto a_a = a.get_access<cl::sycl::access::mode::read>(cgh);
        auto conv_kernel_a = conv_kernel.get_access<cl::sycl::access::mode::read>(cgh);
        // A cache for conv_kernel in local memory
        accessor<decltype(conv_kernel_a)::value_type, 1, access::mode::read_write, access::target::local>
            a_conv_cache{ conv_kernel.get_count(), cgh };
        auto result_a = result.get_access<cl::sycl::access::mode::write>(cgh);
        //std::cerr << N - (conv_kernel.get_count() - 1) << std::endl;
        // work-group parallel
        cgh.parallel_for_work_group<class convl>(range<1>(N - (conv_kernel.get_count() - 1)), [=](group<1> g) {
            g.parallel_for_work_item([&](h_item<1> i) {
                // Cache the convolution kernel in local memory
                if (i.get_local_id(0) < conv_kernel_a.get_count())
                    a_conv_cache[i.get_local_id(0)] = conv_kernel_a[i.get_global_id()];
            });
            // An implicit barrier happens here
            g.parallel_for_work_item([&](h_item<1> i) {
                result_a[i.get_global_id()] = 0;
                // The convolution
                for (unsigned int j = 0; j < convL; ++j)
                    result_a[i.get_global_id()] += a_a[i.get_local_id(0) + j] * a_conv_cache[j];
            });
        });
    });

    //t_end = std::chrono::high_resolution_clock::now();
    //elapsed_time_ms = std::chrono::duration<double, std::milli>(t_end - t_start).count();
    //std::cout << "GPU time " << elapsed_time_ms << std::endl;

    /*auto a_result = result.get_access<access::mode::read>();
    auto gold_result_chk = gold_result.get_access<access::mode::read>();
    // Verify the result
    for (unsigned int i = 0; i < M; ++i)
        if (gold_result_chk[i] != a_result[i])
            std::cout << "The result not equal at: " << i << std::endl;*/

    return 0;
}
Hi Nyan,
Thanks for reaching out to us!
Could you please provide more details about your environment, such as the OS version?
Please also attach a screenshot of the error you are facing so that we can investigate your issue further.
Regards
Goutham
My OS is Windows 10, and I run the program in VS2019. Thank you.
Hi Nyan,
We were able to reproduce the error you are facing.
We will escalate this to the concerned team.
Thanks
Goutham
Hi Nyan,
We have escalated this to the concerned team.
Regards
Goutham
Hi Nyan,
Apologies for the delay in the response.
After debugging your code, we found an error in how the kernel is launched, which results in the "Access violation reading location" error.
Like parallel_for, parallel_for_work_group is a member function of the handler class and can only be called inside a command-group scope. It is important to note that the ranges passed to this function specify the number of work-groups and, optionally, the work-group size, not the total number of work-items and group size as with ND-range parallel_for.
In your code at line 76, you are passing the total number of work-items as the first parameter. This creates (N - (conv_kernel.get_count() - 1)) work-groups of the default work-group size, which launches far more work-items than needed and reads past the end of your buffers.
076 : cgh.parallel_for_work_group<class convl>(range<1>(N - (conv_kernel.get_count() - 1)), [=](group<1> g) {
Please try replacing line 76 with the following lines:
range<1> num_groups(2);
range<1> group_size((N - (conv_kernel.get_count() - 1)) / 2);
cgh.parallel_for_work_group<class convl>(num_groups, group_size, [=](group<1> g) {
The lines above launch a total of (N - (conv_kernel.get_count() - 1)) work-items, i.e. 998 work-items for your code: 2 work-groups of 499 work-items each.
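For reference, here is a minimal, self-contained sketch of this sizing rule. It is not code from this thread: the kernel name sizing_demo and the sizes (2 groups of 4 work-items covering an 8-element buffer) are made up for illustration, using the same cl::sycl API as above.

#include <CL/sycl.hpp>
#include <iostream>
using namespace cl::sycl;

int main() {
    constexpr size_t total = 8;                   // total work-items we want
    constexpr size_t groups = 2;                  // number of work-groups
    constexpr size_t per_group = total / groups;  // work-items per group

    buffer<int, 1> buf{ range<1>{total} };
    queue q;
    q.submit([&](handler& cgh) {
        auto acc = buf.get_access<access::mode::write>(cgh);
        // First range = NUMBER OF GROUPS, second range = GROUP SIZE;
        // together they launch groups * per_group = 8 work-items.
        cgh.parallel_for_work_group<class sizing_demo>(
            range<1>{groups}, range<1>{per_group}, [=](group<1> g) {
                g.parallel_for_work_item([&](h_item<1> i) {
                    // get_global_id() runs over 0..7, exactly the buffer range
                    acc[i.get_global_id()] = static_cast<int>(i.get_global_id()[0]);
                });
            });
    });
    auto host = buf.get_access<access::mode::read>();
    for (size_t i = 0; i < total; ++i)
        std::cout << host[i] << " ";  // prints: 0 1 2 3 4 5 6 7
    std::cout << std::endl;
    return 0;
}

Passing range<1>{total} as the first argument instead would create 8 work-groups of an implementation-chosen default size, which is the pattern that overran the buffer in your original code.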
Please let us know if you still face any issues in running the code.
Regards
Goutham
Hi Nyan,
Could you confirm whether the issue is resolved?
Please let us know if you face any further issues with the code.
Regards
Goutham
Hi Nyan,
Please let us know if you are still facing the issue.
If your issue is resolved, please confirm whether we can close the thread.
Thanks
Goutham
Hi Nyan,
We are closing this thread.
Please feel free to raise a new thread in case of any further issues.
Regards
Goutham
