I am running a program using Intel OpenCL 1.2. My OpenCL device is a CPU:
[lvs@eredmithrim CapsBasic]$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Model name:            Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
Stepping:              3
CPU MHz:               3501.000
BogoMIPS:              7007.99
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              6144K
NUMA node0 CPU(s):     0-3
And regarding the OpenCL runtime available:
[lvs@eredmithrim CapsBasic]$ ./CapsBasic
Number of available platforms: 1
Platform names:
    [0] Intel(R) OpenCL [Selected]
Number of devices available for each type:
    CL_DEVICE_TYPE_CPU: 1
    CL_DEVICE_TYPE_GPU: 0
    CL_DEVICE_TYPE_ACCELERATOR: 0

*** Detailed information for each device ***

CL_DEVICE_TYPE_CPU[0]
    CL_DEVICE_NAME: Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
    CL_DEVICE_AVAILABLE: 1
    CL_DEVICE_VENDOR: Intel(R) Corporation
    CL_DEVICE_PROFILE: FULL_PROFILE
    CL_DEVICE_VERSION: OpenCL 1.2 (Build 57)
    CL_DRIVER_VERSION: 1.2.0.57
    CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.2
    CL_DEVICE_MAX_COMPUTE_UNITS: 4
    CL_DEVICE_MAX_CLOCK_FREQUENCY: 3500
    CL_DEVICE_MAX_WORK_GROUP_SIZE: 8192
    CL_DEVICE_ADDRESS_BITS: 64
    CL_DEVICE_MEM_BASE_ADDR_ALIGN: 1024
    CL_DEVICE_MAX_MEM_ALLOC_SIZE: 4125402112
    CL_DEVICE_GLOBAL_MEM_SIZE: 16501608448
    CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 131072
    CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 262144
    CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 64
    CL_DEVICE_LOCAL_MEM_SIZE: 32768
    CL_DEVICE_PROFILING_TIMER_RESOLUTION: 1
    CL_DEVICE_IMAGE_SUPPORT: 1
    CL_DEVICE_ERROR_CORRECTION_SUPPORT: 0
    CL_DEVICE_HOST_UNIFIED_MEMORY: 1
    CL_DEVICE_EXTENSIONS: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 1
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 1
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 1
    CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 1
    CL_DEVICE_NATIVE_VECTOR_WIDTH_INT: 8
    CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG: 4
    CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT: 8
    CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE: 4
[lvs@eredmithrim CapsBasic]$
My application has four kernels and each of them has several workgroups.
I would like to know how many compute units this program is actually using. The only information I can see above is the maximum number of them, but I think CL_DEVICE_MAX_COMPUTE_UNITS is just an upper bound and the number of compute units actually used may be different.
I wonder whether there is a way to control the number of compute units, or whether this is decided by the runtime. Any comments on this?
Any info or pointers are appreciated.
Leonardo
The CPU implementation is written on top of Threading Building Blocks (TBB). By default it will use the number of physical cores in your machine -- 4 in your case.
You can control this behavior to use fewer cores with device fission.
https://software.intel.com/en-us/articles/opencl-device-fission-for-cpu-performance
Hi Leonardo,
You can query your OpenCL device using "clGetDeviceInfo()".
Calling "clGetDeviceInfo()" several times with different values for the "cl_device_info" parameter gives you all the information you need. Look up:
CL_DEVICE_MAX_WORK_GROUP_SIZE
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS
CL_DEVICE_MAX_WORK_ITEM_SIZES
For a specific kernel that you are setting up, you can use "clGetKernelWorkGroupInfo()". Depending on your query, the following are valid values for the "cl_kernel_work_group_info" parameter:
CL_KERNEL_WORK_GROUP_SIZE
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE
You can choose the work-group size, within the device and kernel limits reported by the queries above, when you enqueue the kernel with "clEnqueueNDRangeKernel()" (through its local_work_size argument).
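As a small sketch of how the queried limits can feed into the sizes passed to "clEnqueueNDRangeKernel()": the two helpers below are plain C with no OpenCL calls, and their names (pick_local_size, round_up_global) are hypothetical. They assume you have already obtained CL_KERNEL_WORK_GROUP_SIZE and CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE for your kernel. The rounding reflects the OpenCL 1.2 rule that the global size must be evenly divisible by the local size when a local size is given.

```c
#include <stddef.h>

/* Hypothetical helper: clamp the requested local size to the kernel's limit
   (CL_KERNEL_WORK_GROUP_SIZE) and round it down to the preferred multiple
   (CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE) when that is possible. */
static size_t pick_local_size(size_t requested, size_t kernel_max, size_t multiple)
{
    size_t local = requested < kernel_max ? requested : kernel_max;
    if (multiple > 0 && local >= multiple)
        local -= local % multiple;
    return local > 0 ? local : 1;
}

/* Hypothetical helper: pad the number of work-items up to the next multiple
   of the local size, so the global size divides evenly into work-groups. */
static size_t round_up_global(size_t work_items, size_t local)
{
    return ((work_items + local - 1) / local) * local;
}
```

For example, with a kernel limit of 8192 and a preferred multiple of 8, a requested local size of 100 becomes 96, and 1000 work-items are padded to a global size of 1056. The kernel then needs a bounds check so that the padded work-items do nothing.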
Best regards,
Tamer Assad
