
Concurrent Kernel Execution

Lingzhi_S_
Beginner
161 Views

Can host threads execute kernels concurrently with the Intel SDK for OpenCL? I have heard that kernels (commands) from different command queues are executed concurrently on the device. Is that true? Also, is "Device Fission" now supported on the GPU with the Intel OpenCL driver? That might be another way to achieve this.

My setup: Intel Core i7, Intel HD Graphics 4600, Intel SDK for OpenCL.

THX,

Lingzhi

Robert_I_Intel
Employee

Lingzhi,

Kernels cannot be executed concurrently on the GPU device using current production drivers. The Device Fission feature is available on the OpenCL CPU device only: see https://software.intel.com/en-us/articles/opencl-device-fission-for-cpu-performance.

Robert

allanmac1
Beginner

Will support for concurrent kernels be added to the IGP drivers at some point?

Robert_I_Intel
Employee

allanmac,

We are collecting requirements and use cases for concurrent kernel execution. Please let me know what yours are, and I will forward them to our product team. The team is hesitant to add this functionality at the moment because of a lack of demand and realistic use cases.

allanmac1
Beginner

OK.  Here's my use case:

I have an advanced pipeline of kernels that are designed to run concurrently. Inter-kernel dependencies are currently managed by the kernel-launching logic and kernel-completion callbacks, but at some point I may hand this work off to the OpenCL event system if that further reduces system latency.
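
For readers following along, the host-side bookkeeping described above could look roughly like the sketch below. It is a minimal illustration in plain C under stated assumptions, not the poster's actual code: `Pipeline`, `KernelNode`, `launch_kernel`, and `on_kernel_complete` are hypothetical names, and the launch is simulated so the example runs without an OpenCL runtime. In a real host program, `launch_kernel` would presumably wrap `clEnqueueNDRangeKernel`, and `on_kernel_complete` would be called from a callback registered with `clSetEventCallback` for `CL_COMPLETE`.

```c
#include <assert.h>
#include <string.h>

#define MAX_KERNELS 8

/* Hypothetical stand-in for one kernel in the pipeline. */
typedef struct {
    int pending_deps;             /* prerequisites that have not finished yet */
    int dependents[MAX_KERNELS];  /* indices of kernels waiting on this one */
    int num_dependents;
    int launched;                 /* nonzero once the kernel has been launched */
} KernelNode;

typedef struct {
    KernelNode nodes[MAX_KERNELS];
    int count;
    int launch_order[MAX_KERNELS]; /* records the order of launches */
    int num_launched;
} Pipeline;

void on_kernel_complete(Pipeline *p, int idx);

void pipeline_init(Pipeline *p, int count) {
    assert(count >= 0 && count <= MAX_KERNELS);
    memset(p, 0, sizeof *p);
    p->count = count;
}

/* Declare that kernel `after` must not start until kernel `before` is done. */
void pipeline_add_dependency(Pipeline *p, int before, int after) {
    assert(before >= 0 && before < p->count);
    assert(after >= 0 && after < p->count);
    KernelNode *b = &p->nodes[before];
    assert(b->num_dependents < MAX_KERNELS);
    b->dependents[b->num_dependents++] = after;
    p->nodes[after].pending_deps++;
}

/* Simulated launch: records the kernel and completes it immediately.
 * A real implementation would enqueue the kernel and return; the
 * completion callback would fire later, possibly on another thread. */
void launch_kernel(Pipeline *p, int idx) {
    p->nodes[idx].launched = 1;
    p->launch_order[p->num_launched++] = idx;
    on_kernel_complete(p, idx);
}

/* Completion handler: release every dependent whose last prerequisite
 * has just finished. */
void on_kernel_complete(Pipeline *p, int idx) {
    KernelNode *n = &p->nodes[idx];
    for (int i = 0; i < n->num_dependents; i++) {
        int d = n->dependents[i];
        if (--p->nodes[d].pending_deps == 0) {
            launch_kernel(p, d);
        }
    }
}

/* Launch every kernel that has no prerequisites. */
void pipeline_start(Pipeline *p) {
    for (int i = 0; i < p->count; i++) {
        if (p->nodes[i].pending_deps == 0 && !p->nodes[i].launched) {
            launch_kernel(p, i);
        }
    }
}
```

Because completion is simulated synchronously here, the launch order is deterministic; with real asynchronous callbacks the counters would need atomic updates or a lock. Moving this work onto the OpenCL event system would mean passing each prerequisite's event in the `event_wait_list` argument of `clEnqueueNDRangeKernel` instead of tracking counters on the host.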

Some of the kernels are computationally intense.  Others are not.  All run for short durations (from microseconds to at most a few milliseconds).

I don't care about presenting enough work to the IGP for it to reach its peak clock speed since I always have the option to make that happen by queuing up more work for the IGP.

But I do care about latency... which is why I really want concurrent kernels.

---

That being said, I understand why the smaller IGPs probably aren't going to benefit much from concurrent kernel execution.

But a double or triple-slice IGP seems like it would be a good environment for concurrent kernel execution. :)

Robert_I_Intel
Employee

allanmac,

In the short term, we have nested parallelism in OpenCL 2.0 (kernels launching other kernels), which should improve the latency situation. For more on nested parallelism, see my article: https://software.intel.com/en-us/articles/gpu-quicksort-in-opencl-20-using-nested-parallelism-and-work-group-scan-functions

You can also watch short videos on nested parallelism here:

I will forward your input to our product team.
