Can multiple host threads execute kernels concurrently with the Intel SDK for OpenCL? I have heard that kernels (commands) from different command queues will be executed concurrently on the device. Is that true? Also, is "Device Fission" now supported on the GPU with the Intel OpenCL driver? That might be another way to achieve concurrent execution.
I use: Intel Core i7, Intel HD Graphics 4600, Intel SDK for OpenCL.
Thanks,
Lingzhi
Lingzhi,
Kernels cannot be executed concurrently on the GPU device using current production drivers. The Device Fission feature is available on the OpenCL CPU device only: see https://software.intel.com/en-us/articles/opencl-device-fission-for-cpu-performance.
Robert
Will support for concurrent kernels be added to the IGP drivers at some point?
allanmac,
We are collecting requirements and use cases for concurrent kernel execution. Please let me know what yours are and I will forward them to our product team. They are hesitant to add that functionality at the moment due to a lack of demand and realistic use cases.
OK. Here's my use case:
I have an advanced pipeline of kernels that are designed to run concurrently. Inter-kernel dependencies are currently managed by the kernel-launching logic and kernel-completion callbacks, but at some point I may move this work onto the OpenCL event system if that further reduces system latency.
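For readers unfamiliar with this pattern, here is a minimal sketch of host-side dependency tracking driven by completion callbacks. It is plain Python with no OpenCL calls, and it is not the poster's actual code: the `Pipeline` class, the kernel names, and the dependency graph are all hypothetical. In a real host program the callback would typically be registered on an event (for example with `clSetEventCallback`), and the launch would be a real enqueue rather than a list append.

```python
from collections import deque


class Pipeline:
    """Toy scheduler: a kernel is launched once all of its inputs have completed."""

    def __init__(self, deps):
        # deps maps each kernel name to the list of kernels it waits on.
        self.pending = {k: len(parents) for k, parents in deps.items()}
        self.dependents = {k: [] for k in deps}
        for k, parents in deps.items():
            for parent in parents:
                self.dependents[parent].append(k)
        # Kernels with no inputs are ready immediately.
        self.ready = deque(k for k, n in self.pending.items() if n == 0)
        self.launch_order = []

    def on_complete(self, kernel):
        # Stand-in for a kernel-completion callback: release waiting kernels.
        for child in self.dependents[kernel]:
            self.pending[child] -= 1
            if self.pending[child] == 0:
                self.ready.append(child)

    def run(self):
        while self.ready:
            kernel = self.ready.popleft()
            self.launch_order.append(kernel)  # stand-in for enqueueing the kernel
            self.on_complete(kernel)          # completion is simulated immediately
        return self.launch_order


# Hypothetical four-kernel pipeline: two middle stages both depend on "load".
deps = {
    "load": [],
    "filter": ["load"],
    "histogram": ["load"],
    "merge": ["filter", "histogram"],
}
order = Pipeline(deps).run()
print(order)
```

In this toy graph, `filter` and `histogram` become ready at the same moment. That is exactly the point where concurrent kernel execution on the device could help, and where a device that runs one kernel at a time has to serialize them.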
Some of the kernels are computationally intense. Others are not. All run for short durations (from microseconds to at most a few milliseconds).
I don't care about presenting enough work to the IGP for it to reach its peak clock speed since I always have the option to make that happen by queuing up more work for the IGP.
But I do care about latency... which is why I really want concurrent kernels.
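To make the latency argument concrete, here is a small worked example in Python. The durations are invented for illustration, and the "concurrent" case is an idealized bound that assumes kernels do not slow each other down, which real hardware sharing execution units would not guarantee.

```python
def serial_latencies(durations_us):
    """Completion time of each kernel when they run back to back in submission order."""
    finished = []
    clock = 0.0
    for d in durations_us:
        clock += d
        finished.append(clock)
    return finished


def concurrent_latencies(durations_us):
    """Idealized case: every kernel starts at t=0 and runs at full speed."""
    return [float(d) for d in durations_us]


# Hypothetical durations in microseconds: one long kernel followed by two short ones.
durations = [2000.0, 50.0, 10.0]
serial = serial_latencies(durations)
concurrent = concurrent_latencies(durations)

for d, s, c in zip(durations, serial, concurrent):
    print(f"kernel of {d:>6.0f} us: serial {s:>6.0f} us, idealized concurrent {c:>6.0f} us")
```

Under these assumptions the 10 µs kernel completes after 2060 µs when serialized behind the others, versus 10 µs in the idealized concurrent case. Total throughput is unchanged; only the latency of the short kernels improves.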
---
That being said, I understand why the smaller IGPs probably aren't going to benefit much from concurrent kernel execution.
But a double or triple-slice IGP seems like it would be a good environment for concurrent kernel execution. :)
allanmac,
In the short term, we have nested parallelism in OpenCL 2.0 (kernels launching other kernels), which should improve the latency situation. For more on nested parallelism, see my article: https://software.intel.com/en-us/articles/gpu-quicksort-in-opencl-20-using-nested-parallelism-and-work-group-scan-functions
You can also watch short videos on nested parallelism here:
- https://software.intel.com/en-us/videos/implementing-sierpi-ski-carpet-in-opencl-20
- https://software.intel.com/en-us/videos/gpu-quicksort-in-opencl-20
I will forward your input to our product team.
