
Harald_S_ · 08-31-2016

I run a program (process A) with two threads on my Intel Xeon CPU E5-1620 v2. One thread (1) starts an OpenCL application that uses the CPU as its device; the other (2) does some calculations.

I noticed that the performance of thread 2 suffers while thread 1 runs the OpenCL application.

So I concluded that the OpenCL application run by thread 1 starts a new process on the CPU (process B), and that processes A and B are then scheduled against each other by the operating system, which is why the performance of thread 2 suffers.

I could not find any documentation that confirms this conclusion.

Is my conclusion correct, and more importantly, is there any documentation about it?

Jeffrey_M_Intel1 · 09-04-2016

The CPU implementation is automatically parallelized with Intel Threading Building Blocks (TBB). This is one of its advantages: you get the sophisticated multi-threading capabilities of this rich library for free.
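On Linux, one rough way to test the process-versus-thread question is to count the entries in /proc/self/task (one per thread of the current process) before and after creating the OpenCL context and running a first kernel. If the count grows while getpid() stays the same, the runtime has added worker threads to process A rather than starting a separate process B. The sketch below assumes Linux (the question does not name the OS) and uses a plain pthread as a stand-in for a runtime worker thread; `count_threads` and `demo_thread_delta` are hypothetical helper names, not part of any OpenCL or TBB API.

```c
#include <dirent.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Count the threads of the calling process by listing /proc/self/task.
   Linux-specific; returns -1 if the directory cannot be opened. */
static int count_threads(void)
{
    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL)
        return -1;

    int n = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.')   /* skip "." and ".." */
            n++;
    }
    closedir(dir);
    return n;
}

/* Stand-in for a runtime worker thread: blocks until told to exit. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_cond = PTHREAD_COND_INITIALIZER;
static int demo_done = 0;

static void *idle_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    while (!demo_done)
        pthread_cond_wait(&demo_cond, &demo_lock);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Start one worker thread and return how many threads it added to this
   process; *pid_out receives getpid() while the worker is alive.
   Returns -1 if the thread could not be created. */
static int demo_thread_delta(pid_t *pid_out)
{
    int before = count_threads();

    pthread_mutex_lock(&demo_lock);
    demo_done = 0;
    pthread_mutex_unlock(&demo_lock);

    pthread_t worker;
    if (pthread_create(&worker, NULL, idle_worker, NULL) != 0)
        return -1;

    int during = count_threads();
    *pid_out = getpid();   /* same PID: no new process was created */

    pthread_mutex_lock(&demo_lock);
    demo_done = 1;
    pthread_cond_signal(&demo_cond);
    pthread_mutex_unlock(&demo_lock);
    pthread_join(worker, NULL);

    return during - before;
}
```

In a real check you would call `count_threads()` once before `clCreateContext` and again after the first kernel has finished (for example after `clFinish`). Threads created this way belong to process A and are scheduled by the OS alongside thread 2.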

If you run the CapsBasic sample (a platform/device capabilities viewer), you will see something like this for your OpenCL CPU implementation:

CL_DEVICE_TYPE_CPU[0]
CL_DEVICE_NAME: Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz
CL_DEVICE_AVAILABLE: 1
...
CL_DEVICE_MAX_COMPUTE_UNITS: 4

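For this processor, it means OpenCL will schedule work across all 4 compute units by default. (On the i5-5300U these are the 4 hardware threads of its 2 Hyper-Threaded cores, since the CPU device counts logical processors as compute units.)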

For the CPU implementation, it is possible to restrict OpenCL to a subset of the cores through "device fission": https://software.intel.com/en-us/articles/opencl-device-fission-for-cpu-performance
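A minimal sketch of device fission with the OpenCL 1.2 call `clCreateSubDevices`, assuming the CPU `cl_device_id` has already been obtained from `clGetDeviceIDs`. It reads `CL_DEVICE_MAX_COMPUTE_UNITS` and creates a single sub-device with `CL_DEVICE_PARTITION_BY_COUNTS` that leaves `reserved` compute units free for other host threads such as thread 2. The helper names and the `USE_OPENCL` guard are illustrative assumptions; the OpenCL part compiles only with `-DUSE_OPENCL` and an OpenCL SDK (header `<CL/cl.h>`, linked with `-lOpenCL`).

```c
#include <stddef.h>

/* Hypothetical helper: how many compute units to give the OpenCL
   sub-device when `reserved` units are kept for other host threads.
   Always leaves at least one unit for OpenCL. */
static unsigned opencl_unit_count(unsigned total_units, unsigned reserved)
{
    if (total_units <= reserved)
        return 1;
    return total_units - reserved;
}

#ifdef USE_OPENCL
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>

/* Create one CPU sub-device that uses all but `reserved` compute units.
   Returns NULL on failure. Requires OpenCL 1.2 (clCreateSubDevices). */
static cl_device_id create_partial_cpu_device(cl_device_id cpu,
                                              unsigned reserved)
{
    cl_uint units = 0;
    if (clGetDeviceInfo(cpu, CL_DEVICE_MAX_COMPUTE_UNITS,
                        sizeof(units), &units, NULL) != CL_SUCCESS)
        return NULL;

    /* Partition by counts: one sub-device with N compute units. */
    cl_device_partition_property props[] = {
        CL_DEVICE_PARTITION_BY_COUNTS,
        (cl_device_partition_property)opencl_unit_count(units, reserved),
        CL_DEVICE_PARTITION_BY_COUNTS_LIST_END,
        0
    };

    cl_device_id sub = NULL;
    cl_uint created = 0;
    if (clCreateSubDevices(cpu, props, 1, &sub, &created) != CL_SUCCESS
        || created != 1)
        return NULL;
    return sub;   /* release with clReleaseDevice(sub) when done */
}
#endif
```

The context and command queue would then be created on the returned sub-device instead of the full CPU device. Partitioning by counts limits how many compute units OpenCL uses; it does not by itself choose which physical cores those are.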

Of course, another way to get more control over which cores are used is to move the kernel code into TBB or OpenMP instead.
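If the kernel is moved to OpenMP, the number of worker threads can be capped per loop with the `num_threads` clause. The sketch assumes a simple element-wise kernel; `scale_add` is a hypothetical name, not code from this thread. It builds with `-fopenmp`; without that flag the pragma is ignored and the loop runs serially with the same result. TBB would be the C++ route; OpenMP is used here only to keep the example in C.

```c
#include <stddef.h>

/* Element-wise kernel ported from OpenCL to OpenMP: out[i] = a[i] * b[i] + c.
   num_threads (must be >= 1) caps how many OpenMP threads run this loop,
   so another host thread (thread 2 in the question) can keep some cores. */
static void scale_add(const float *a, const float *b, float *out,
                      ptrdiff_t n, float c, int num_threads)
{
    (void)num_threads;   /* unused when compiled without OpenMP */
    #pragma omp parallel for num_threads(num_threads)
    for (ptrdiff_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + c;
}
```

Which cores the OpenMP threads run on can then be steered with the standard `OMP_PLACES` and `OMP_PROC_BIND` environment variables, for example `OMP_PLACES=cores OMP_PROC_BIND=close`.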

CPU as OpenCL device running in a separate process?