I run a program (process A) on my Intel Xeon CPU E5-1620 v2 with two threads. One thread (1) starts an OpenCL application that uses the CPU as its device; the other (2) does some calculations.
I noticed that the performance of thread 2 suffers while the OpenCL application started by thread 1 is running.
So I concluded that the OpenCL application run by thread 1 starts a new process on the CPU (process B), and that processes A and B get scheduled by the operating system. Because of this, the performance of thread 2 suffers.
I could not find any documentation that confirms my conclusion.
Is my conclusion correct and, more importantly, is there documentation about it?
The CPU implementation is automatically parallelized by Intel Threading Building Blocks (TBB). This is one of the advantages of using it -- you get access to the sophisticated multi-threading capabilities of this rich library for free.
If you run the CapsBasic sample (platform/device capabilities viewer) you will see something like this for your OpenCL CPU implementation:
CL_DEVICE_TYPE_CPU[0]
CL_DEVICE_NAME: Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz
CL_DEVICE_AVAILABLE: 1
...
CL_DEVICE_MAX_COMPUTE_UNITS: 4
For this processor, it means OpenCL will by default schedule work across all 4 hardware threads the device reports.
For the CPU implementation it is possible to restrict execution to a subset of cores through "device fission": https://software.intel.com/en-us/articles/opencl-device-fission-for-cpu-performance.
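You can also query this value programmatically rather than running CapsBasic. The following is a minimal sketch, assuming the first platform returned exposes a CPU device; error handling is omitted for brevity:

```c
/* Sketch: query how many compute units (hardware threads) the OpenCL
 * CPU device exposes, i.e. how many threads the runtime will use by
 * default. Requires an installed OpenCL CPU runtime to actually run. */
#include <CL/cl.h>
#include <stdio.h>

int main(void) {
    cl_platform_id platform;
    cl_device_id cpu;
    cl_uint units = 0;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &cpu, NULL);
    clGetDeviceInfo(cpu, CL_DEVICE_MAX_COMPUTE_UNITS,
                    sizeof(units), &units, NULL);
    printf("CL_DEVICE_MAX_COMPUTE_UNITS: %u\n", units);
    return 0;
}
```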
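The device-fission approach from the linked article can be sketched as below. This is a hedged example, not the article's exact code: it assumes an OpenCL 1.2 CPU driver, the sub-device size (3 compute units) is an illustrative value chosen to leave one unit free for the host thread, and error handling is abbreviated:

```c
/* Sketch: partition the CPU device so the OpenCL runtime uses only 3
 * compute units, leaving one free for the rest of the host program.
 * Assumes an OpenCL 1.2 CPU runtime; error handling abbreviated. */
#include <CL/cl.h>
#include <stdio.h>

int main(void) {
    cl_platform_id platform;
    cl_device_id cpu;
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &cpu, NULL);

    /* Ask for one sub-device containing 3 compute units. */
    const cl_device_partition_property props[] = {
        CL_DEVICE_PARTITION_BY_COUNTS, 3,
        CL_DEVICE_PARTITION_BY_COUNTS_LIST_END, 0
    };
    cl_device_id subdev;
    cl_uint n = 0;
    cl_int err = clCreateSubDevices(cpu, props, 1, &subdev, &n);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "clCreateSubDevices failed: %d\n", err);
        return 1;
    }

    /* Create the context on the sub-device instead of the full CPU
     * device; kernels enqueued here only occupy the 3 partitioned units. */
    cl_context ctx = clCreateContext(NULL, 1, &subdev, NULL, NULL, &err);
    /* ... build program and enqueue kernels as usual ... */
    clReleaseContext(ctx);
    clReleaseDevice(subdev);
    return 0;
}
```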
Of course another option to have more control over which cores are used is to just move the kernel code into TBB or OpenMP instead.
