allanmac1 · 01-06-2017

It seems to me that GEN might benefit more from detecting "subgroup uniform" values than other architectures because of its unique register file architecture and instruction set.

Are there any GEN idioms that you've discovered that nudge/help the compiler so it's able to determine that variables used by a subgroup are actually scalars (subgroup uniform)?

For example, would an idiom like this:

kernel void foo(...)
{
  uint const sg_id = get_sub_group_id();

  if (sg_id == get_sub_group_id())
  {
    // rest of kernel
  }
}

or an idiom like this (perhaps better):

kernel void foo(...)
{
  if (sub_group_all(true))
  {
    // rest of kernel
  }
}

... help the compiler determine that the subgroups are running "in isolation" and therefore any future function involving get_sub_group_id() (or similar) would be uniform?

I suspect this hasn't been implemented but it might be a useful idiom for both performance and reducing register pressure.
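To make "subgroup uniform" concrete, here is a minimal host-side C model (plain C, not GEN or OpenCL kernel code). It assumes subgroups are formed from consecutive linear local IDs, which OpenCL leaves implementation-defined; model_sub_group_id and is_subgroup_uniform are hypothetical helpers invented for this sketch, not part of any API.

```c
#include <stdbool.h>
#include <stddef.h>

/* ASSUMPTION: a subgroup is a run of consecutive linear local IDs.
   OpenCL leaves this mapping implementation-defined. */
static unsigned model_sub_group_id(unsigned local_linear_id,
                                   unsigned sub_group_size)
{
    return local_linear_id / sub_group_size;
}

/* Returns true if every lane of each subgroup holds the same value,
   i.e. the value could in principle be kept as one scalar per subgroup
   instead of one copy per lane. values[i] is the value held by the
   work item with linear local ID i. */
static bool is_subgroup_uniform(const unsigned *values, size_t count,
                                unsigned sub_group_size)
{
    for (size_t i = 0; i < count; ++i) {
        size_t first_lane = i - (i % sub_group_size);
        if (values[i] != values[first_lane]) {
            return false;
        }
    }
    return true;
}
```

Under that assumption, an array filled with model_sub_group_id(i, 8) passes the check, while an array filled with the per-lane index i does not. The open question in this thread is whether the compiler can prove the first case on its own.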

Jeffrey_M_Intel1 · 01-10-2017

So far I have not been able to find anything that exactly fits. However, we will keep this in mind for future documentation and features.

For now, would it help at all to set up "subgroup uniform" values using SLM or possibly images to take advantage of hardware shared within subgroups?

allanmac1 · 01-11-2017

Thanks for taking a look...

My workaround is to simply launch subgroup-wide workgroups (in this case, 8-item workgroups).

That works really well on Skylake, but it might not be a long-term solution, and because of the local memory granularity rules I'm unable to exploit all 64 KB of local memory per subslice.

I would rather launch two workgroups with 28 SIMD8 subgroups each, give each subgroup access to ~1170 bytes of local memory (64 KB split across 56 subgroups), and let each subgroup run independently.
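The local memory arithmetic behind this can be sketched in plain host-side C. The helper names are hypothetical, and the allocation granularity is left as a parameter because the exact Skylake rule is not stated in this thread; the 1024-byte value in the note below is a placeholder assumption only.

```c
#include <stdint.h>

/* Bytes of local memory per subgroup when a subslice's local memory is
   split evenly across all resident subgroups. Returns 0 if there are
   no subgroups. */
static uint32_t slm_bytes_per_subgroup(uint32_t slm_bytes,
                                       uint32_t workgroups,
                                       uint32_t subgroups_per_workgroup)
{
    uint32_t subgroups = workgroups * subgroups_per_workgroup;
    return subgroups ? slm_bytes / subgroups : 0;
}

/* Rounds a request down to a multiple of the allocation granularity.
   ASSUMPTION: the granularity is a caller-supplied placeholder, not a
   documented hardware figure. */
static uint32_t slm_round_down(uint32_t bytes, uint32_t granularity)
{
    return granularity ? (bytes / granularity) * granularity : bytes;
}
```

For example, slm_bytes_per_subgroup(64 * 1024, 2, 28) evaluates to 1170. If each single-subgroup workgroup could only be allocated in 1024-byte steps (again, an assumed figure), slm_round_down(1170, 1024) gives 1024, so 56 such workgroups would use 56 KB and leave the rest idle.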

Bouncing data through SLM to help indicate uniformity is an option, but I still think the code generation couldn't possibly be as good as when the compiler actually knows that a sequence is subgroup-isolated.

You could always provide us a GEN assembler! :)

Is there any GEN-friendly idiom for communicating subgroup uniformity?