OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
Announcements
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

Which fine grain SVM features are supported in the current Gen8 driver?

allanmac1
Beginner

I don't have Broadwell hardware in front of me yet, so can you tell me which fine-grain SVM capabilities are supported in the latest driver on Gen8 devices? Just FINE_GRAIN_BUFFER?

If FINE_GRAIN_SYSTEM is supported, can an 8-16 GB host address space be shared?

The OpenCL 2.0 SVM article does a nice job summarizing the capability bits. Can you list which are supported in the .4080 driver and which might eventually be supported? (A minimal query sketch follows the list below.)

  • CL_DEVICE_SVM_COARSE_GRAIN_BUFFER for coarse-grained buffer SVM
  • CL_DEVICE_SVM_FINE_GRAIN_BUFFER for fine-grained buffer SVM
  • CL_DEVICE_SVM_FINE_GRAIN_SYSTEM for fine-grained system SVM
  • CL_DEVICE_SVM_ATOMICS for atomics support
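For concreteness, here is roughly how I check the bits at runtime (a minimal sketch, assuming device is a valid cl_device_id obtained via clGetDeviceIDs):

#include <CL/cl.h>
#include <stdio.h>

/* Sketch: query and print a device's SVM capability bits.
   Assumes `device` is a valid cl_device_id. */
static void print_svm_caps(cl_device_id device)
{
    cl_device_svm_capabilities caps = 0;
    cl_int err = clGetDeviceInfo(device, CL_DEVICE_SVM_CAPABILITIES,
                                 sizeof(caps), &caps, NULL);
    if (err != CL_SUCCESS)
        return; /* pre-OpenCL-2.0 runtimes may reject this query */

    if (caps & CL_DEVICE_SVM_COARSE_GRAIN_BUFFER) printf("coarse-grain buffer SVM\n");
    if (caps & CL_DEVICE_SVM_FINE_GRAIN_BUFFER)   printf("fine-grain buffer SVM\n");
    if (caps & CL_DEVICE_SVM_FINE_GRAIN_SYSTEM)   printf("fine-grain system SVM\n");
    if (caps & CL_DEVICE_SVM_ATOMICS)             printf("SVM atomics\n");
}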

[Attachment: svm.png]

Robert_I_Intel
Employee

Hi Allan,

It depends on the Broadwell hardware you will have:

If it has HD 5300 graphics, then only coarse-grain buffer SVM is supported.

If it has HD 5500 graphics or above, then both coarse-grain buffer SVM and fine-grain buffer SVM (with or without atomics) are supported.

I cannot comment on fine-grain system SVM capabilities, since I haven't seen any production systems that support it yet.
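If your device does report fine-grain buffer SVM, allocating one would look roughly like this (a sketch, assuming context is a valid cl_context on such a device; same includes as the snippet above):

/* Sketch: allocate a fine-grain SVM buffer, optionally with atomics.
   Assumes the device reported CL_DEVICE_SVM_FINE_GRAIN_BUFFER
   (and CL_DEVICE_SVM_ATOMICS if the atomics flag is passed). */
void *p = clSVMAlloc(context,
                     CL_MEM_READ_WRITE |
                     CL_MEM_SVM_FINE_GRAIN_BUFFER |
                     CL_MEM_SVM_ATOMICS, /* drop if atomics are unsupported */
                     64 << 20,           /* size: 64 MB, for example */
                     0);                 /* 0 = let the runtime pick alignment */
if (p != NULL) {
    /* With fine-grain buffer SVM the host can read and write p directly;
       no clEnqueueSVMMap/clEnqueueSVMUnmap round trip is needed. */
    clSVMFree(context, p);
}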

allanmac1
Beginner

Thanks, that's helpful.

The follow-up Broadwell question: what are the current memory allocation limits (maximum single allocation and total) for clCreateBuffer() and clSVMAlloc()?

I'm planning out an IGP demo that could potentially benefit from a large heap.
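For reference, this is the query I would use to size the heap (a sketch, assuming device is a valid cl_device_id; same includes as above):

/* Sketch: query the per-allocation and total global memory limits. */
cl_ulong max_alloc = 0, global_mem = 0;
clGetDeviceInfo(device, CL_DEVICE_MAX_MEM_ALLOC_SIZE,
                sizeof(max_alloc), &max_alloc, NULL);
clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_SIZE,
                sizeof(global_mem), &global_mem, NULL);
printf("max single alloc: %llu MB, global mem: %llu MB\n",
       (unsigned long long)(max_alloc >> 20),
       (unsigned long long)(global_mem >> 20));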

 

Robert_I_Intel
Employee

Allan,

I managed to get hold of a colleague with a Broadwell system, and it reports the following: global memory size is ~1.6 GB, but individual buffer allocations max out at ~400 MB (I believe this is on a system with 4 GB of RAM):

Using platform: Intel(R) Corporation and device: Intel(R) HD Graphics 5500.

OpenCL Device info:

CL_DEVICE_VENDOR                   : Intel(R) Corporation
CL_DEVICE_NAME                     : Intel(R) HD Graphics 5500
CL_DRIVER_VERSION                  : 10.18.15.9999
CL_DEVICE_PROFILE                  : FULL_PROFILE
CL_DEVICE_VERSION                  : OpenCL 2.0
CL_DEVICE_OPENCL_C_VERSION         : OpenCL C 2.0 ( using USC )
CL_DEVICE_MAX_COMPUTE_UNITS        : 24
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS : 3
CL_DEVICE_MAX_WORK_ITEM_SIZES      : (256, 256, 256)
CL_DEVICE_MAX_WORK_GROUP_SIZE      : 256
CL_DEVICE_MEM_BASE_ADDR_ALIGN      : 1024
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE : 128
CL_DEVICE_MAX_CLOCK_FREQUENCY      : 900
CL_DEVICE_IMAGE2D_MAX_WIDTH        : 16384
CL_DEVICE_GLOBAL_MEM_SIZE          : 1669927680
CL_DEVICE_LOCAL_MEM_SIZE           : 65536
CL_DEVICE_MAX_MEM_ALLOC_SIZE       : 417481920

allanmac1
Beginner

OK, thanks for providing such detailed answers.  I'll try it out on my HD 6000 NUC when it arrives.

I'm surprised to see that the max workgroup size has been reduced to 256 from Haswell's 512. 

Is this a permanent limit going forward?  

Haswell was 512:

[Attachment: hd4600.png]

Robert_I_Intel
Employee

Allan,

Here is the answer from our compute architect:

You can compute the workgroup size from the # of threads in a subslice. BDW dropped the number of EUs per subslice to 8 from 10.

HSW: 10 EUs/subslice * 7 threads * simd8 = 560, rounded down to the nearest power of two: 512.

BDW: 8 EUs/subslice * 7 threads * simd8 = 448, rounded down to the nearest power of two: 256.

Not sure about the permanence of this change, though.
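In any case, rather than hardcoding 512 or 256, you can pick the work-group size at runtime (a sketch, assuming kernel and device are valid handles):

/* Sketch: choose a portable work-group size at run time instead of
   hardcoding Haswell's 512 or Broadwell's 256. */
size_t dev_max = 0, krn_max = 0;
clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                sizeof(dev_max), &dev_max, NULL);
clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                         sizeof(krn_max), &krn_max, NULL);
size_t local_size = (krn_max < dev_max) ? krn_max : dev_max;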

allanmac1
Beginner

I asked because, for some kernels, launching one 448-item workgroup (vs. two 224-item workgroups) can avoid an extra round of off-slice communication while taking advantage of faster intra-subslice / intra-workgroup communication through SLM. Not a huge deal.

OK, thanks for answering my long stream of questions.  Pumped to get the HD 6000 in my paws on Monday. :)
