Hello,
While many materials about OpenCL focus on the data-parallel programming model, it's important to keep in mind that OpenCL also supports a task-parallel programming model, which is better suited to compute devices with fewer but relatively strong compute units.
That being said, we believe the Intel OpenCL SDK can offer good performance on the CPU for data-parallel workloads as well. Probably the best way is to try it out yourself and see which of the Intel solutions best fits your needs.
To answer your specific question, the thread launch time between OpenMP and Intel's OpenCL should be quite similar, but you probably meant to ask about execution overhead and not the actual thread launch. OpenCL semantics require error checking and other overheads that aren't present in OpenMP. However, these do not scale with the size of the workload, but rather are constant per call to clEnqueueNDRangeKernel.
You can measure exactly how big a penalty is incurred by comparing a wallclock measurement to the execution time reported for the kernel's event via the clGetEventProfilingInfo API - just make sure the wallclock measurement captures all of the execution (for example, by calling clFinish before stopping the timer), since clEnqueueNDRangeKernel is asynchronous.
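As a rough sketch of the arithmetic behind that comparison (not code from the SDK): the helper names below are hypothetical, and plain uint64_t stands in for the cl_ulong nanosecond values that clGetEventProfilingInfo returns for CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. It assumes the command queue was created with CL_QUEUE_PROFILING_ENABLE and that the wallclock interval was closed only after clFinish (or clWaitForEvents) returned.

```c
#include <stdint.h>

/* Hypothetical helpers: they only do the arithmetic and make no OpenCL calls.
 * start_ns / end_ns are assumed to be the values read with
 * clGetEventProfilingInfo for CL_PROFILING_COMMAND_START and
 * CL_PROFILING_COMMAND_END on the event returned by the enqueue call. */
uint64_t kernel_time_ns(uint64_t start_ns, uint64_t end_ns)
{
    return end_ns - start_ns;
}

/* Overhead = wallclock time around enqueue + clFinish, minus the kernel time.
 * Clamped at zero in case timer resolution makes the wallclock value smaller. */
uint64_t enqueue_overhead_ns(uint64_t wall_ns, uint64_t kernel_ns)
{
    return wall_ns > kernel_ns ? wall_ns - kernel_ns : 0;
}

/* Share of the wallclock time that was spent on overhead (0.0 to 1.0). */
double overhead_fraction(uint64_t wall_ns, uint64_t kernel_ns)
{
    if (wall_ns == 0) {
        return 0.0;
    }
    return (double)enqueue_overhead_ns(wall_ns, kernel_ns) / (double)wall_ns;
}
```

Because the overhead is constant per call, its share shrinks as the kernel gets longer. With made-up numbers, a 20 microsecond overhead is about 2% of a 1 ms kernel but only about 0.2% of a 10 ms kernel.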
For more information, you can also check the optimization guide.
Thanks,
Doron Singer