CPU as OpenCL device running in a separate process?

Harald_S_ — Wed, 31 Aug 2016 09:41:07 GMT

I run a program (process A) with two threads on my Intel Xeon CPU E5-1620 v2. One thread (1) starts an OpenCL application that uses the CPU as its device; the other (2) does some calculations.

I noticed that the performance of thread 2 suffers while the OpenCL application of thread 1 is executing.

So I concluded that the OpenCL application run by thread 1 starts a new process on the CPU (process B), and that processes A and B get scheduled by the operating system. Because of this, the performance of thread 2 suffers.

I could not find any documentation that confirms my conclusion.

Is my conclusion correct and, more importantly, is there any documentation about it?


Jeffrey_M_Intel1 — Sun, 04 Sep 2016 20:54:48 GMT

The CPU implementation is automatically parallelized by Intel Threading Building Blocks (TBB), so kernels run on TBB worker threads inside your own process rather than in a separate process. This is one of the advantages of using it -- you get access to the sophisticated multi-threading capabilities of this rich library for free.

If you run the CapsBasic sample (platform/device capabilities viewer) you will see something like this for your OpenCL CPU implementation:

CL_DEVICE_TYPE_CPU[0]
CL_DEVICE_NAME: Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz
CL_DEVICE_AVAILABLE: 1
...
CL_DEVICE_MAX_COMPUTE_UNITS: 4

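For this processor, that means the OpenCL runtime will schedule work across all 4 compute units (the logical processors the CPU exposes) by default, so it competes for them with any other threads in your program.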

For the CPU implementation, it is possible to use only a subset of the cores through "device fission": https://software.intel.com/en-us/articles/opencl-device-fission-for-cpu-performance

Of course, another option for more control over which cores are used is simply to move the kernel code into TBB or OpenMP instead.
