As far as I know, there is no way to do this in OpenCL. There are many system queries you can make during initialization to learn about the type of system the code is running on, but I have not heard of any way for a kernel to learn the ID of the specific EU it is running on.
What is your final goal? Perhaps if we knew more about that we could provide more options.
Thanks for your prompt reply.
My final goal is to figure out how the hardware scheduler assigns work groups to EUs. If the kernel could read the EU ID during execution, that would probably solve my problem.
This afternoon I found that register SR0 contains the EU ID. If I can get the address of that register and inline assembly code into the kernel with __asm__(), that might do it. I'm not sure yet; I will try to find the address of SR0 and then see whether this works.
Thanks! Other options and thoughts about this are welcome!
Knowing the EU and hardware thread IDs would be a useful feature. It's probably way out of spec for current revisions of OpenCL but might be an easy intel_cl extension?
Note that this can be done in CUDA by reading the %smid special register (although the IDs aren't always sequential).
My observations so far lead me to believe that a GEN workgroup has its subgroups spread across the EUs in the subslice.
That is, if you have a workgroup with 8 subgroups on a Gen8+ IGP then one subgroup will be assigned to each EU in the subslice.
Your mileage may vary.
If you just want controlled scheduling of work onto a partial subslice, then a workaround for what you're asking for might be to implement your kernel with a "grid-stride loop": launch a fixed, small number of work items and have each one step through the data with a stride equal to the total number of work items launched. Note that, for some strange reason, a subslice has more total EU threads than you can possibly cover with a single workgroup, so you may have to use two workgroups to fully cover a subslice.
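In case the "grid-stride loop" idea is unfamiliar, here is a minimal host-side sketch in plain C that emulates it. The names scale_work_item and launch_emulated, and the doubling operation itself, are made up for illustration; in a real OpenCL kernel gid would come from get_global_id(0) and stride from get_global_size(0), as the comment shows. The point is only that the launch size is chosen independently of the data size n.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Roughly equivalent OpenCL C kernel (illustrative only):
 *
 *   kernel void scale(global const float *in, global float *out, uint n)
 *   {
 *       for (size_t i = get_global_id(0); i < n; i += get_global_size(0))
 *           out[i] = 2.0f * in[i];
 *   }
 */

/* Body of one emulated work item: start at its own global ID and
 * step through the data by the total number of work items launched. */
static void scale_work_item(size_t gid, size_t stride,
                            const float *in, float *out, size_t n)
{
    for (size_t i = gid; i < n; i += stride) {
        out[i] = 2.0f * in[i];
    }
}

/* Hypothetical stand-in for an NDRange launch: runs `global_size`
 * work items one after another on the host. */
static void launch_emulated(size_t global_size,
                            const float *in, float *out, size_t n)
{
    assert(global_size > 0); /* a zero stride would never terminate */
    for (size_t gid = 0; gid < global_size; ++gid) {
        scale_work_item(gid, global_size, in, out, n);
    }
}
```

Calling launch_emulated(4, in, out, 10) covers all 10 elements with only 4 work items: work item 0 handles elements 0, 4 and 8, work item 1 handles 1, 5 and 9, and so on. On the device you would pick the global size to match however many subgroups you want resident, rather than the problem size.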