How to figure out the work group to EU mapping?

Hi everyone:

 

      Is there a way to figure out the work group to EU mapping? More specifically, are there any registers containing EU IDs that a kernel can read during execution?

      Thanks!

Dong

 

3 Replies

As far as I know there is no way to do this in OpenCL. While there are many system queries you can make during initialization to learn about the type of system the code is running on, I have not heard of any way for a kernel to learn the ID of the specific EU it is running on.

What is your final goal?  Perhaps if we knew more about that we could provide more options.


Hi Jeffrey:

     Thanks for your prompt reply.

     My final goal is to figure out how the hardware scheduler assigns work groups to EUs. If the kernel can read the EU ID during execution, that will probably solve my problem.

     This afternoon I found that the SR0 register contains EU IDs. If I can get the address of that register and inline assembly code into the kernel with __asm__(), that might do it. I am not sure yet; I will try to find the address of SR0 and see whether this works.

    Thanks! Other options and thoughts about this are welcome!

Dong 

 


Knowing the EU and hardware thread IDs would be a useful feature.  It's probably way out of spec for current revisions of OpenCL but might be an easy intel_cl extension?

Note this can be done in CUDA by reading the %smid special register (but the IDs aren't always sequential).

My observations so far lead me to believe that a GEN workgroup has its subgroups assigned across each EU in the subslice.

That is, if you have a workgroup with 8 subgroups on a Gen8+ IGP then one subgroup will be assigned to each EU in the subslice.

Your mileage may vary.

If you just want controlled scheduling of work onto a partial subslice, then a workaround for what you're asking for might be to implement your kernel with a "grid-stride loop".  Note that for some strange reason, a subslice has more total EU threads than you can possibly cover with a single workgroup so you may have to use two workgroups to fully cover a subslice.
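For anyone unfamiliar with the term, here is a rough sketch of the grid-stride indexing pattern. It is plain C that emulates the work-items on the CPU rather than a real OpenCL kernel, so that it compiles anywhere. The names scale_work_item and launch_scale are made up for illustration; in an actual OpenCL C kernel, gid and global_size would come from get_global_id(0) and get_global_size(0).

```c
#include <assert.h>
#include <stddef.h>

/* Body of one emulated work-item (hypothetical name). In a real OpenCL C
 * kernel, gid would be get_global_id(0) and global_size would be
 * get_global_size(0). */
static void scale_work_item(size_t gid, size_t global_size,
                            const float *in, float *out,
                            size_t n, float factor)
{
    /* Grid-stride loop: start at this work-item's ID and step by the total
     * number of work-items, so any launch size covers all n elements. */
    for (size_t i = gid; i < n; i += global_size) {
        out[i] = in[i] * factor;
    }
}

/* Hypothetical stand-in for enqueueing an NDRange: runs every work-item
 * one after another on the CPU. A real device runs them concurrently. */
void launch_scale(size_t global_size, const float *in, float *out,
                  size_t n, float factor)
{
    assert(global_size > 0);
    for (size_t gid = 0; gid < global_size; ++gid) {
        scale_work_item(gid, global_size, in, out, n, factor);
    }
}
```

The point of the pattern is that the launch size is decoupled from the problem size: you can pick a global size equal to just one or two workgroups' worth of work-items and every element still gets processed. Which EUs those workgroups land on remains up to the hardware scheduler.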

 
