OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU

How to know the number of compute units used when using a CPU as an OpenCL device

LSolis
Beginner

I am running a program using Intel OpenCL 1.2. My OpenCL device is a CPU:

[lvs@eredmithrim CapsBasic]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Model name:            Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
Stepping:              3
CPU MHz:               3501.000
BogoMIPS:              7007.99
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-3

And regarding the OpenCL runtime available:

[lvs@eredmithrim CapsBasic]$ ./CapsBasic 
Number of available platforms: 1
Platform names:
    [0] Intel(R) OpenCL [Selected]
Number of devices available for each type:
    CL_DEVICE_TYPE_CPU: 1
    CL_DEVICE_TYPE_GPU: 0
    CL_DEVICE_TYPE_ACCELERATOR: 0

*** Detailed information for each device ***

CL_DEVICE_TYPE_CPU[0]
    CL_DEVICE_NAME: Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
    CL_DEVICE_AVAILABLE: 1
    CL_DEVICE_VENDOR: Intel(R) Corporation
    CL_DEVICE_PROFILE: FULL_PROFILE
    CL_DEVICE_VERSION: OpenCL 1.2 (Build 57)
    CL_DRIVER_VERSION: 1.2.0.57
    CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2 
    CL_DEVICE_MAX_COMPUTE_UNITS: 4
    CL_DEVICE_MAX_CLOCK_FREQUENCY: 3500
    CL_DEVICE_MAX_WORK_GROUP_SIZE: 8192
    CL_DEVICE_ADDRESS_BITS: 64
    CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
    CL_DEVICE_MAX_MEM_ALLOC_SIZE: 4125402112
    CL_DEVICE_GLOBAL_MEM_SIZE: 16501608448
    CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 131072
    CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 262144
    CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 64
    CL_DEVICE_LOCAL_MEM_SIZE: 32768
    CL_DEVICE_PROFILING_TIMER_RESOLUTION: 1
    CL_DEVICE_IMAGE_SUPPORT: 1
    CL_DEVICE_ERROR_CORRECTION_SUPPORT: 0
    CL_DEVICE_HOST_UNIFIED_MEMORY: 1
    CL_DEVICE_EXTENSIONS: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 1
    CL_DEVICE_NATIVE_VECTOR_WIDTH_INT: 8
    CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG: 4
    CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT: 8
    CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE: 4
[lvs@eredmithrim CapsBasic]$ 

 

My application has four kernels and each of them has several workgroups.

I would like to know how many compute units this program actually uses. The only information I can see above is the maximum number, and I suspect CL_DEVICE_MAX_COMPUTE_UNITS is just an upper bound, so the number of compute units actually in use may be different.

I wonder if there is a way to control the number of compute units or if this is a runtime-based decision. Any comments on this?

Any info or pointers are appreciated.

Leonardo

 

2 Replies
Jeffrey_M_Intel1
Employee

The CPU implementation is written on top of Threading Building Blocks (TBB). By default it uses all of the physical cores in your machine, 4 in your case.

You can restrict it to fewer cores with device fission.

https://software.intel.com/en-us/articles/opencl-device-fission-for-cpu-performance

 

Tamer_Assad
Innovator

Hi Leonardo,

 

You can query your OpenCL device using "clGetDeviceInfo()".

Multiple calls to "clGetDeviceInfo()", each passing a different value for the "cl_device_info" parameter, can give you all the information you need. Look up:

CL_DEVICE_MAX_WORK_GROUP_SIZE

CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS

CL_DEVICE_MAX_WORK_ITEM_SIZES

 

For a specific kernel you are setting up, you can use "clGetKernelWorkGroupInfo()". Depending on your query, the following are valid values for the "cl_kernel_work_group_info" parameter:

CL_KERNEL_WORK_GROUP_SIZE

CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE

 

You can choose the work-group size, within the device limits reported by the queries above, when you enqueue the kernel with "clEnqueueNDRangeKernel()".
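
To tie this back to the compute-unit question: a work-group runs on a single compute unit, so the number of work-groups in a launch caps how many compute units that launch can keep busy. The plain C sketch below shows the arithmetic. The function names are made up for illustration, and the result is only an upper bound, since the runtime decides the actual scheduling.

```c
#include <stddef.h>

/* Round n up to the next multiple of local (local must be > 0).
   OpenCL 1.2 requires the global size to be divisible by the local size. */
size_t round_up_global(size_t n, size_t local)
{
    return (n + local - 1) / local * local;
}

/* Number of work-groups a launch produces in one dimension. */
size_t work_group_count(size_t global, size_t local)
{
    return global / local;
}

/* Upper bound on compute units one launch can occupy at a time:
   at most one unit per work-group, and never more than the device has. */
size_t max_busy_units(size_t groups, size_t compute_units)
{
    return groups < compute_units ? groups : compute_units;
}
```

For example, 1000 work-items with a local size of 64 round up to a global size of 1024, giving 16 work-groups, which is enough to occupy all 4 compute units of the device above. A launch with only 2 work-groups could occupy at most 2. When the global size is rounded up, the kernel needs a bounds check so the extra work-items do nothing.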

 

Best regards,

Tamer Assad
