OpenCL on multicore processor

NMark3 · 04-09-2018

Hi everyone!

I'm new here and my first question is very basic. I haven't been able to find a clear answer anywhere. I have some experience working with OpenCL on an FPGA platform, but now I'm trying to implement the same algorithm on a CPU platform (Core).

My question is: Can I implement an OpenCL application only on CPU cores (without using GPUs)? My goal is to run an application in a way similar to a multithreaded program, but using OpenCL. In this configuration, one CPU core should be the host, and the other cores should be devices. Is this possible at all? Or is OpenCL built on the assumption that the CPU acts as the host and GPU cores act as devices, so that it cannot work any other way?

Thanks in advance!

Michael_C_Intel1 · 04-10-2018

Hi Opcopc,

My question is: Can I implement an OpenCL application only on CPU cores (without using GPUs)?

A: Yes.

In this configuration, one CPU core should be the host, and other cores should be devices. Is this possible at all?

A: The best answer is "kind of", or "it depends". If you dump clinfo-style output against an Intel-based OCL implementation, you'll see each core mapped to an OCL compute unit, and the entire CPU presents itself as a single OCL device. If you have a host thread active while kernels are active, a naive OCL runtime will gang-schedule over the "host" core as well.

In the NDRange launch for your kernel, be cognizant of how many kernel instances (work-items) you are launching at the same time if you are concerned about residency. I can't think of an immediate mechanism to hard-affinitize kernels to particular cores with the current runtimes. However, if folks like yourself start commenting on this thread with use cases where it would help, I would put together a feature request recommendation for the OCL dev teams.

The simple approach could be to launch (cores - 1) work-items and observe compute residency with a CPU utilization monitor. You could then hard-affinitize your "host" thread to the idle core.
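As a rough sketch of that suggestion, assuming Linux with glibc (sched_getaffinity / sched_setaffinity): the helper names kernel_work_items and pin_host_thread_to_last_cpu are hypothetical, and picking the highest-numbered allowed CPU as the "idle" core is an assumption made for illustration. In practice you would pick whichever core your utilization monitor shows as idle. This only pins the host thread; it does not constrain the OCL runtime's own worker threads.

```cpp
#include <sched.h>   // sched_getaffinity, sched_setaffinity, CPU_* macros (Linux/glibc)

// Number of work-items to launch so that one logical CPU is left for the host.
// Never returns less than 1.
long kernel_work_items(long logical_cpus)
{
    return logical_cpus > 1 ? logical_cpus - 1 : 1;
}

// Pin the calling (host) thread to the highest-numbered CPU it is currently
// allowed to run on. Assumption: that CPU is the one left idle by the kernels.
// Returns the chosen CPU index, or -1 on failure.
int pin_host_thread_to_last_cpu()
{
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    int target = -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &allowed))
            target = cpu;            // remember the last allowed CPU
    if (target < 0)
        return -1;

    cpu_set_t one;
    CPU_ZERO(&one);
    CPU_SET(target, &one);
    if (sched_setaffinity(0, sizeof(one), &one) != 0)
        return -1;
    return target;
}
```

The CPU count passed to kernel_work_items could come from sysconf(_SC_NPROCESSORS_ONLN) or from CL_DEVICE_MAX_COMPUTE_UNITS. One thing to watch: on Linux a new thread inherits the affinity mask of the thread that created it, so when you call the pinning helper relative to creating the OCL context may matter.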

I recommend looking at these clGetDeviceInfo parameters:

CL_DEVICE_MAX_COMPUTE_UNITS

CL_DEVICE_MAX_SAMPLERS

CL_DEVICE_MAX_WORK_GROUP_SIZE

CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS

CL_DEVICE_MAX_WORK_ITEM_SIZES
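To illustrate how those values might feed into a launch, here is a minimal sketch that clamps a requested work-group size to the device limits and pads the global size to a multiple of it (OpenCL 1.x requires the global size to be evenly divisible by the local size). DeviceLimits, LaunchSize and plan_1d_launch are hypothetical names, not part of any OCL API; in a real program the two limits would be filled in by clGetDeviceInfo calls.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical holder for values reported by clGetDeviceInfo.
struct DeviceLimits {
    std::size_t max_work_group_size;   // CL_DEVICE_MAX_WORK_GROUP_SIZE
    std::size_t max_work_item_size_x;  // CL_DEVICE_MAX_WORK_ITEM_SIZES[0]
};

// Global and local sizes for a one-dimensional NDRange launch.
struct LaunchSize {
    std::size_t global;
    std::size_t local;
};

// Clamp the requested local size to the device limits, then round the
// global size up to the next multiple of the local size.
LaunchSize plan_1d_launch(const DeviceLimits& lim,
                          std::size_t work_items,
                          std::size_t requested_local)
{
    std::size_t local = std::min({requested_local,
                                  lim.max_work_group_size,
                                  lim.max_work_item_size_x});
    if (local == 0)
        local = 1;                                   // avoid division by zero
    std::size_t groups = (work_items + local - 1) / local;  // ceiling division
    return LaunchSize{groups * local, local};
}
```

Because the global size may be padded past the real item count, the kernel would need to compare get_global_id(0) against the real count and return early for the extra work-items.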

Separately, you could set up more than one context on an OCL device, though this might be somewhat unwieldy to manage.

One of the major goals of the Intel OCL CPU runtime implementations is bring-up and modeling of an application. When porting from FPGA to CPU, developers need to be cognizant of any FPGA device-specific extensions and perhaps remove them from kernels.

The most consistent place to get an Intel OCL for CPU implementation is the Intel® SDK for OpenCL™ Applications (portal linked here).

Note: Currently, the SDK has an implementation in the subpackage intel-sdk-opencl-codebuilder-intel-cpu-exp-2017.0-7.0.0.2568.x86_64.rpm; SRB5.0 for Linux has one in the intel-opencl-cpu-r5.0-63503.x86_64 subpackage.