Best practices for handling variable width sub_group sizes?

allanmac1 · 08-24-2016

I wrote the following back in May 2015:

It would be great to get some sort of compile-time indication or guarantee that a certain size subgroup was selected in order to close off certain code paths.

If only the compiler can determine the subgroup size, your codebase depends on a certain subgroup size, and variations in the code change the subgroup size, then ... it becomes a circular mess.

There is always the option of probing the subgroup size at runtime with get_sub_group_size() but I would caution against that approach. It may be portable but it's probably not performant.

But if get_sub_group_size() were a compile-time constant, resolved early enough to snip dead code, then it would provide both portability and performance.

It would be great to hear how Intel actually implements this super useful extension.

Is there any new info on how and when sub_group sizes are determined in Intel Processor Graphics OpenCL? And can a sub_group size ever be an optimizable compile-time constant? (Maybe it can't if get_sub_group_size() and get_max_sub_group_size() are not always equal.)

As implied above, Intel GEN appears to be unique in allowing the compiler to determine the "warp" (SIMD) width at compile time.

A small change to a kernel might result in switching from SIMD8 to SIMD16 sub groups.

My experience so far is that this flexibility forces a defensive assumption that SIMD8 is the widest sub_group that can be relied upon, and that various work-item id calculations have to be performed explicitly: e.g. "get_local_id(0) & 7" instead of "get_sub_group_local_id()".
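
To make the SIMD8 assumption concrete, here is a small host-side C sketch (plain C, not device code) that emulates the two lane-index calculations for a 1-D work-group. The function names are made up for illustration, and the in-order packing of work-items into sub-groups is an assumption, since the mapping is implementation-defined. It only shows that "get_local_id(0) & 7" agrees with the sub-group lane index when the compiler picks SIMD8 and stops agreeing once it picks SIMD16.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side emulation only. Assumes a 1-D work-group whose work-items are
   packed into sub-groups in order of local id (implementation-defined in
   general, so treat this as a model, not a guarantee). */

/* Lane index as get_sub_group_local_id() would report it under that model. */
static size_t emulated_sub_group_local_id(size_t local_id, size_t sub_group_size)
{
    assert(sub_group_size != 0);
    return local_id % sub_group_size;
}

/* Lane index under the defensive SIMD8 assumption: get_local_id(0) & 7. */
static size_t simd8_lane(size_t local_id)
{
    return local_id & 7;
}

/* Counts the work-items in one work-group where the two indices disagree. */
static size_t count_lane_mismatches(size_t work_group_size, size_t sub_group_size)
{
    size_t mismatches = 0;
    for (size_t lid = 0; lid < work_group_size; ++lid) {
        if (emulated_sub_group_local_id(lid, sub_group_size) != simd8_lane(lid))
            ++mismatches;
    }
    return mismatches;
}
```

Calling count_lane_mismatches(64, 8) gives zero mismatches under this model, while count_lane_mismatches(64, 16) reports a mismatch for the upper half of every SIMD16 sub-group. That is why code written around the explicit mask keeps working as "virtual" SIMD8 groups, but cannot be mixed with the real sub-group functions unless the width is known.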

Some tips and tricks on this subject would be appreciated!

Timothy_B_Intel · 08-29-2016

Indeed.

https://www.khronos.org/registry/cl/extensions/intel/cl_intel_required_subgroup_size.txt

There is a minimum HW requirement, so I am not sure on which platforms it is available.
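
For reference, a rough sketch of how the attribute from that extension might be applied, written as a host-side C snippet that carries the kernel source as a string. The kernel name lane_ids, the macro REQD_SUB_GROUP_SIZE and the helper function are hypothetical; only the intel_reqd_sub_group_size attribute comes from the extension. It assumes the device lists cl_intel_required_subgroup_size in its extension string and supports the requested width.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stringify helpers so the required width is written in exactly one place. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical: the single sub-group width this code base is written for. */
#define REQD_SUB_GROUP_SIZE 8

/* Masks such as (id & (N - 1)) only work for power-of-two widths. */
static_assert((REQD_SUB_GROUP_SIZE & (REQD_SUB_GROUP_SIZE - 1)) == 0,
              "lane masks assume a power-of-two sub-group size");

/* OpenCL C source. The attribute asks the compiler to build this kernel
   with the stated sub-group size instead of choosing one itself. */
static const char lane_ids_kernel_source[] =
    "__attribute__((intel_reqd_sub_group_size(" STR(REQD_SUB_GROUP_SIZE) ")))\n"
    "__kernel void lane_ids(__global uint *out)\n"
    "{\n"
    "    /* Width is pinned, so the lane index is always in [0, N). */\n"
    "    out[get_global_id(0)] = get_sub_group_local_id();\n"
    "}\n";

/* Length in bytes, excluding the terminating NUL, as a host API would want it. */
static size_t lane_ids_kernel_source_length(void)
{
    return strlen(lane_ids_kernel_source);
}
```

The string would then go through clCreateProgramWithSource and clBuildProgram as usual. With the width pinned in the source, code paths written for other widths can be compiled out on the kernel side rather than probed at runtime.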

allanmac1 · 08-29-2016

Aww yiss! That looks perfect.