Intel® oneAPI Base Toolkit
Support for the core tools and libraries within the base toolkit that are used to build and deploy high-performance data-centric applications.

DPC++ subgroup size questions

LaurentPlagne
Novice

Hi,

In the sub_group training notebook on DevCloud, I tried different data types for the collective reduction (sum).

  • I wonder why the sub_group size does not depend on the data element type (it depends on the device): I get the same sub-group size for both float and double.
  • I also wonder about the syntax: get_group_range() returns a number while get_local_range() does not.

Thank you for your help.

Laurent

RahulV_intel
Moderator

Hi @LaurentPlagne,

1. Sub-group size is device specific. For instance, on an Intel integrated GPU the SIMD width can be 8, 16 or 32, which means 8, 16 or 32 work-items are mapped onto a single hardware thread of an Execution Unit (EU). Those work-items share the register space of that hardware thread, whose size is device specific. As long as the sub-group fits within that register space, the element type (for example int or float) does not change the sub-group size.
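
To illustrate why the element type does not enter into it, here is a plain C++ model (not SYCL code) of a work-group being split into sub-groups and reduced. It assumes a hypothetical device SIMD width of 16, held in the made-up constant kDeviceSubGroupSize; the same width is used whether the template is instantiated for int, float or double. On a real device the supported sizes would come from the device itself (in SYCL 2020, get_info<sycl::info::device::sub_group_sizes>()), and the reduction would be done with a group algorithm such as sycl::reduce_over_group rather than by this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical SIMD width of the target device (8, 16 or 32 on Intel iGPUs).
// It is a property of the device and is not derived from the element type.
constexpr std::size_t kDeviceSubGroupSize = 16;

// Sum across the lanes of one sub-group (a sequential model of a sub-group reduction).
template <typename T>
T subgroup_sum(const std::vector<T>& lanes) {
    // A sub-group never holds more work-items than the device SIMD width.
    assert(lanes.size() <= kDeviceSubGroupSize);
    return std::accumulate(lanes.begin(), lanes.end(), T{0});
}

// Split a work-group's values into sub-groups of kDeviceSubGroupSize lanes
// (the last one may be partial) and reduce each sub-group independently.
// The split is identical for int, float and double.
template <typename T>
std::vector<T> subgroup_sums(const std::vector<T>& work_group) {
    std::vector<T> sums;
    for (std::size_t begin = 0; begin < work_group.size(); begin += kDeviceSubGroupSize) {
        std::size_t end = std::min(begin + kDeviceSubGroupSize, work_group.size());
        std::vector<T> lanes(work_group.begin() + static_cast<std::ptrdiff_t>(begin),
                             work_group.begin() + static_cast<std::ptrdiff_t>(end));
        sums.push_back(subgroup_sum(lanes));
    }
    return sums;
}
```

For example, a work-group of 64 float values and a work-group of 64 double values both split into four sub-groups of 16 lanes in this model.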

To check the supported sub-group sizes, or any other information about your device, you can run the 'clinfo' command in a terminal.

For more information, see this article: https://software.intel.com/content/www/us/en/develop/articles/sgemm-for-intel-processor-graphics.html#:~:text=The%20size%20of%20subgroup%20is,the%20Execution%20Unit%20(EU).

2. The get_group_range() function returns the number of work-groups in the ND-range, either for all dimensions at once or for a single dimension, depending on which overload you call.

For example, let's say we have a kernel with 1024 work-items in each work-group and 4 such work-groups. In this case, get_group_range() returns a range(4) object.
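
The number of work-groups is the global range divided by the local range in each dimension. The plain C++ sketch below (not the SYCL API) reproduces that arithmetic: 4096 work-items with 1024 per work-group gives 4 work-groups. The helper group_range and the Range alias are hypothetical stand-ins for what nd_item::get_group_range() reports; SYCL requires each global dimension to be a multiple of the corresponding local one, which the sketch checks with assert.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for sycl::range<Dims>.
template <std::size_t Dims>
using Range = std::array<std::size_t, Dims>;

// Number of work-groups per dimension: global range / local range.
template <std::size_t Dims>
Range<Dims> group_range(const Range<Dims>& global, const Range<Dims>& local) {
    Range<Dims> groups{};
    for (std::size_t d = 0; d < Dims; ++d) {
        // The local size must be non-zero and divide the global size evenly.
        assert(local[d] != 0 && global[d] % local[d] == 0);
        groups[d] = global[d] / local[d];
    }
    return groups;
}
```

The same helper works for 2-D and 3-D ranges, returning one work-group count per dimension.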

Now consider a work-group with work-items in all three dimensions (say 4 in each of the x, y and z dimensions). Calling get_local_range() here returns a range(4,4,4) object. If you want the number of work-items in a work-group along one specific dimension (say x), call get_local_range(0); pass 1 or 2 for the other dimensions.

The same concept applies to get_group_range() when the ND-range has work-groups in more than one dimension.
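
The overload pattern itself can be modelled in plain C++. The hypothetical LocalInfo struct below is not sycl::nd_item; it only mimics the return types of the two get_local_range() overloads: the no-argument version returns every dimension at once, and the int overload returns a single dimension as a size_t.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for sycl::nd_item<3>, reduced to its local range.
// It is NOT the SYCL class; it only mimics the two overloads' return types.
struct LocalInfo {
    std::array<std::size_t, 3> local;  // work-items per work-group in dims 0, 1, 2

    // Like get_local_range(): all dimensions at once (a range-like object).
    std::array<std::size_t, 3> get_local_range() const { return local; }

    // Like get_local_range(int dimension): one dimension as a plain number.
    std::size_t get_local_range(int dimension) const {
        assert(dimension >= 0 && dimension < 3);
        return local[static_cast<std::size_t>(dimension)];
    }
};
```

Overload resolution picks the version from the call site: item.get_local_range() yields the whole range, while item.get_local_range(1) yields one number.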

To answer your question: get_group_range() and get_local_range() do not necessarily return an integer; depending on the overload, they return either a range object or a size_t. Both functions are overloaded, as listed below.

range<dimensions> get_local_range() const : Return a SYCL range representing all dimensions of the local range. This local range may have been provided by the programmer, or chosen by the SYCL runtime.

size_t get_local_range(int dimension) const : Return the size of the local range in the dimension specified by the dimension parameter.

range<dimensions> get_group_range() const : Return a range representing the number of work-groups in the nd_range.

size_t get_group_range(int dimension) const : Return the number of work-groups in the specified dimension (that element of the group range).

For more information, refer to the Khronos SYCL 1.2.1 specification.

Hope this helps.

--Rahul

LaurentPlagne
Novice
RahulV_intel
Moderator

No problem, @LaurentPlagne.

Could you let me know if I can close this thread from my end?

--Rahul

LaurentPlagne
Novice

Hi Rahul,

The thread can be closed (I checked the solution box on your previous reply).

Thank you again.

RahulV_intel
Moderator

Thanks for the confirmation.


Intel will no longer monitor this thread. However, this thread will remain open for community participation.

