I am experiencing performance issues with Intel OpenCL drivers when calling clCreateKernel().
Attached is a test program (using Boost Compute) with reference values taken from various OpenCL devices.
In these measurements, the Intel drivers are by far the slowest, by a factor of 10 to 600 compared to other vendors.
Please let me know which further details you need, or where I can file a bug report, to get this fixed.
On the GPU side at least, I think we've fixed this problem with our latest internal drivers. This won't help you right now since the optimization isn't in our latest public drivers, but it will be in the next major driver release, so stay tuned.
In the meantime, recent public drivers for recent GPUs do have an optimization for clCreateKernel, but it requires a slightly different pattern than the one used by your app. Basically, if you measure ( clCreateKernel + clReleaseKernel ) x 10000, vs. just clCreateKernel x 10000, you should see better performance. This is a pattern that we've seen used by OpenCV, for example.
Note that to see an improvement you may need newer drivers than the ones in your report, which are a bit old. I tested on an HD Graphics 5500 device with drivers 22.214.171.12463, if that helps.
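The two measurement patterns mentioned above can be compared with a small timing harness. The following is only a minimal sketch, not the code from the attached test: the function names time_create_only and time_create_release are made up for illustration, and the create/release callables are placeholders for whatever wraps clCreateKernel and clReleaseKernel in your application.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Pattern A: call create() n times and keep every handle alive while timing.
// Handles are released only after the clock has stopped.
template <typename Create, typename Release>
double time_create_only(Create create, Release release, std::size_t n) {
    using clock = std::chrono::steady_clock;
    std::vector<decltype(create())> handles;
    handles.reserve(n);
    const auto t0 = clock::now();
    for (std::size_t i = 0; i < n; ++i) handles.push_back(create());
    const auto t1 = clock::now();
    for (auto& h : handles) release(h);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Pattern B: create and immediately release in every iteration, so at most
// one handle is alive at any time. Both calls are inside the timed region.
template <typename Create, typename Release>
double time_create_release(Create create, Release release, std::size_t n) {
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        auto h = create();
        release(h);
    }
    const auto t1 = clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Hypothetical usage with OpenCL (assumes an already built cl_program named
// `program` containing a kernel called "my_kernel"; both names are made up):
//
//   auto create  = [&] { return clCreateKernel(program, "my_kernel", nullptr); };
//   auto release = [](cl_kernel k) { clReleaseKernel(k); };
//   double ms_a = time_create_only(create, release, 10000);
//   double ms_b = time_create_release(create, release, 10000);
```

Both functions return elapsed milliseconds. Whether pattern B is actually faster depends on the device and driver version, so the numbers are only meaningful when both patterns are run on the same setup.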
Thank you for the information regarding the GPU drivers.
What about the CPU side? I am actually more interested in this. The Xeon in particular showed really bad results here, compared for example to the open source CPU backend Oclgrind.
So what is the status / roadmap on this issue for the CPU?
My application is spending more than 10% of its execution time in the clCreateKernel function on the Xeon CPU with the current driver version.