Here's a pie-in-the-sky request for enhancement... :)
It would be useful if we could create OpenCL/SPIR-* kernels that map one work-item to a single EU hardware thread and expose that thread's entire 4KB GRF.
Add to this a "Gen Native" OpenCL extension that exposed register-regionable explicit SIMD operations.
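To make "register-regionable" concrete, here is a rough host-side C sketch of how I understand a Gen source region `<VertStride; Width, HorzStride>` to pick elements out of the GRF. It assumes the 4KB per thread is laid out as 128 registers of 32 bytes. The names `grf_t` and `region_read_u32` are made up for illustration and are not part of any Intel API or of the proposed extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 128 registers x 32 bytes = 4 KB of GRF per EU thread. */
enum { GRF_REGS = 128, GRF_REG_BYTES = 32 };

/* Hypothetical model of one thread's register file as flat, byte-addressable storage. */
typedef struct {
    uint8_t bytes[GRF_REGS * GRF_REG_BYTES];
} grf_t;

/*
 * Hypothetical helper: gather exec_size 32-bit elements starting at
 * register `reg`, element `subreg`, using region <vstride; width, hstride>.
 * Strides are in elements. Channel i reads element
 *   (i / width) * vstride + (i % width) * hstride
 * relative to the region origin.
 */
static void region_read_u32(const grf_t *grf, int reg, int subreg,
                            int vstride, int width, int hstride,
                            int exec_size, uint32_t *out)
{
    assert(width > 0);
    size_t base = (size_t)reg * GRF_REG_BYTES + (size_t)subreg * sizeof(uint32_t);
    for (int i = 0; i < exec_size; i++) {
        size_t elem = (size_t)(i / width) * (size_t)vstride
                    + (size_t)(i % width) * (size_t)hstride;
        size_t off = base + elem * sizeof(uint32_t);
        assert(off + sizeof(uint32_t) <= sizeof grf->bytes); /* stay inside the 4 KB */
        memcpy(&out[i], &grf->bytes[off], sizeof(uint32_t));
    }
}
```

With this model, `<8;8,1>` reads eight contiguous dwords, `<0;1,0>` broadcasts a single dword to every channel, and `<8;4,2>` reads every other dword across two rows, all without any explicit shuffle instruction. That is the kind of operand addressing a "native" kernel could expose directly.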
It's not portable, but Intel is shipping hundreds of millions (?) of very capable Gen IGPs per year, so why not expose a little more of the architecture's power, given that Gen is close to ubiquitous?
Without this, it feels like Gen is burying one of its most distinctive and powerful features.
With this "native" kernel type, developers could focus on how to explicitly utilize the gigantic GRF: