OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
Announcements
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

Best practices for handling variable sub_group sizes?

allanmac1
Beginner
964 Views

I wrote the following back in May 2015:

It would be great to get some sort of compile-time indication or guarantee that a sub_group of a certain size was selected, in order to close off code paths written for other sizes.

If only the compiler can determine the subgroup size, your codebase depends on a particular subgroup size, and changes to the code can change the subgroup size the compiler selects, then it becomes a circular mess.

There is always the option of probing the subgroup size at runtime with get_sub_group_size(), but I would caution against that approach: it may be portable, but it's probably not performant.

But if get_sub_group_size() were a compile-time constant resolved early enough to eliminate dead code, it would provide both portability and performance.

It would be great to hear how Intel actually implements this super useful extension.

Is there any new info on how and when sub_group sizes are determined in Intel Processor Graphics OpenCL, and on whether a sub_group size can ever be an optimizable compile-time constant (perhaps it can't be if get_sub_group_size() and get_max_sub_group_size() are not always equal)?

As implied above, Intel GEN appears to be unique in allowing the compiler to determine the "warp" (SIMD) width at compile time.

A small change to a kernel might switch it from SIMD8 to SIMD16 sub_groups.

My experience so far is that this flexibility forces the defensive assumption that SIMD8 is the widest sub_group that can be relied upon, and that various work-item ID calculations have to be performed explicitly, e.g. "get_local_id(0) & 7" instead of "get_sub_group_local_id()".

Some tips and tricks on this subject would be appreciated!

 

2 Replies
Timothy_B_Intel
Employee

Indeed.

https://www.khronos.org/registry/cl/extensions/intel/cl_intel_required_subgroup_size.txt

There is a minimum HW requirement, so I am not sure on which platforms it will be available.

 

allanmac1
Beginner

Aww yiss!  That looks perfect.

 
