Inquiry Regarding Inconsistent Results Despite [[intel::reqd_sub_group_size(32)]] Specification

-Light-

I recently migrated a CUDA project to SYCL and get different results from debug and release builds when running in Visual Studio. After investigating, I traced the difference to the sub-group size reported by get_sub_group().

Here's a snippet of code I used for testing:


std::cout << "device name : " << device.get_name() << std::endl; // device name : Intel(R) Arc(TM) A370M Graphics
std::cout << "Supported Sub-group Sizes: ";
for (const auto& s : dev_ct1.get_info<sycl::info::device::sub_group_sizes>()) {
    std::cout << s << " ";
}
std::cout << std::endl; // Supported Sub-group Sizes: 8 16 32

sycl::queue& q = dev_ct1.in_order_queue();
q.submit([&](sycl::handler& cgh) {
    // Device-side output stream: 1 MiB buffer in total, up to 256 bytes per work-item.
    sycl::stream out(1024 * 1024, 256, cgh);
    cgh.parallel_for(
        sycl::nd_range<3>(sycl::range<3>(1, 1, 32) * sycl::range<3>(1, 1, 256),
                          sycl::range<3>(1, 1, 256)),
        [=](sycl::nd_item<3> item_ct1) [[intel::reqd_sub_group_size(32)]] {
            // Each work-item prints the size of the sub-group it actually runs in.
            out << "Used Sub-group Sizes: "
                << item_ct1.get_sub_group().get_local_range() << sycl::endl;
        });
});

When running in debug mode (without code optimization), the output is 16. However, when running in release mode (code optimization level of O1 or O2), the output is 32.

Although the kernel requests a sub-group size of 32 with [[intel::reqd_sub_group_size(32)]], the reported size still differs between debug and release builds.

Thank you for your help.

Sincerely
