My suite of kernels, compiled to binaries with the .6094 driver on Win10/x64, takes almost twice as long to execute as the same kernels compiled with .6025.
Compiling on .6025 and executing on .6094 shows no regression.
Compiling on .6094 and executing on either .6094 or .6025 shows the huge performance drop, so the regression appears to come from the compiler rather than the runtime.
Inspection of the assembly produced by .6094 shows long sequences of MOV instructions that I believe are unnecessary.
I wish there were a better way to report performance regressions (and reproducers) than here or the GitHub issues page (which is very quiet).
The .6136 driver is also spilling a LOT of registers on kernels that compile without any spills on .6025.
I'm using __attribute__((intel_reqd_sub_group_size(8))), so each work-item should have plenty of registers available.
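For illustration, here is a minimal sketch of a kernel decorated this way. The kernel name `scale`, its arguments, and its body are hypothetical placeholders and are not taken from the affected suite. It assumes the device supports the cl_intel_required_subgroup_size extension.

```c
// Hypothetical example kernel; NOT one of the kernels from the affected suite.
// Assumes the device reports the cl_intel_required_subgroup_size extension.

// Ask the compiler to build this kernel with a sub-group size of 8 (SIMD8).
__attribute__((intel_reqd_sub_group_size(8)))
__kernel void scale(__global const float *in,
                    __global float *out,
                    const float k)
{
    // One work-item per element; the host must launch exactly as many
    // work-items as there are elements, since there is no bounds check here.
    const size_t gid = get_global_id(0);
    out[gid] = in[gid] * k;
}
```

A kernel this small would not be expected to spill on any driver. It only shows where the attribute goes.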
Today's new driver (.6194) exhibits the same regression.
As noted above, .6025 works perfectly.
All kernels listed here are decorated with intel_reqd_sub_group_size(8).
Let me know who I can send the kernels to.
Thanks for the detail. If your reproducer source is privileged, it can be submitted confidentially through the Intel Service Center. I can route it to the devs from there. For OpenCL, I recommend marking it as Media Server Studio/Media SDK related or Intel System Studio related.