I recently migrated a CUDA project to SYCL and encountered different results between debug mode and release mode when running in Visual Studio. After investigating, I found that the difference occurs in the "get_sub_group()" function.
Here's a snippet of code I used for testing:
std::cout << "device name : " << device.get_name() << std::endl; // device name: Intel(R) Arc(TM) A370M Graphics
std::cout << "Supported Sub-group Sizes: ";
for (const auto& s : dev_ct1.get_info<sycl::info::device::sub_group_sizes>()) {
    std::cout << s << " ";
}
std::cout << std::endl; // Supported Sub-group Sizes: 8 16 32
sycl::queue& q = dev_ct1.in_order_queue();
q.submit([&](sycl::handler& cgh) {
    sycl::stream out(1024 * 1024, 256, cgh);
    cgh.parallel_for(
        sycl::nd_range<3>(sycl::range<3>(1, 1, 32) *
                              sycl::range<3>(1, 1, 256),
                          sycl::range<3>(1, 1, 256)),
        [=](sycl::nd_item<3> item_ct1)
            [[intel::reqd_sub_group_size(32)]] {
                // Every work-item prints the size of the sub-group it belongs to.
                out << "Used Sub-group Sizes: " << item_ct1.get_sub_group().get_local_range() << sycl::endl;
            });
});
When running in debug mode (no code optimization), the kernel reports a sub-group size of 16. When running in release mode (optimization level O1 or O2), it reports 32.
Although the required sub-group size is set to 32 with [[intel::reqd_sub_group_size(32)]], the reported size still differs between debug and release modes.
Thank you for your help.
Sincerely