
Concurrent memory accesses between Intel CPU cores and HD graphics cores

Hi,

I have tried to use both the Intel CPU cores and the HD Graphics cores simultaneously with the Intel OpenCL SDK. The first thing I tried was a simple memory copy kernel, to see whether transfers from global to private memory (and vice versa) can proceed simultaneously on the CPU cores and the HD Graphics cores. Here are the relevant parts of my source code.

    #define FLOAT float

    __kernel void assign(__global FLOAT *x, __global FLOAT *y)
    {
        size_t idx = get_global_id(0);
        y[idx] = x[idx];
    }

    ...
    // -------------- Time measurement START --------------
    // Enqueue NDRange to CPU
    clEnqueueNDRangeKernel(cqCommandQueue_cpu, ckKernel[1], 1, NULL, &GWS2, &LWS2, 0, NULL, &ev_list[0]);
    clFlush(cqCommandQueue_cpu);

    // Enqueue NDRange to GPU
    clEnqueueNDRangeKernel(cqCommandQueue_gpu, ckKernel[0], 1, NULL, &GWS, &LWS, 0, NULL, &ev_list[1]);
    clFlush(cqCommandQueue_gpu);

    // Wait for both kernels to finish
    err = clWaitForEvents(2, ev_list);
    CheckErr(err);
    // -------------- Time measurement STOP --------------
    ...

The program above copies the vector x into the vector y in parallel. Assume x and y are vectors of length n. Since each element assignment is independent of the others, I thought that simple load balancing between the Intel CPU cores and the HD Graphics cores would be possible.

Unfortunately, the two kernels run almost serially, judging by the times I measured for the CPU cores and the HD Graphics cores separately. Assigning k elements on the CPU alone takes T_c seconds, and assigning the remaining n-k elements on the HD Graphics alone takes T_g seconds. I expected the program above to take about max(T_c, T_g) seconds, which would indicate concurrent assignments. Instead it takes about T_c + T_g seconds, which means the kernel execution is almost serial.

My conclusion is that the Intel CPU cores and the HD Graphics cores share global memory bandwidth, so the copies cannot proceed in parallel. But I am not sure that it is really impossible. Could anyone comment on whether it is impossible or not?

Thanks in advance.

* Test machine
CPU: i7-3770K (HD Graphics 4000)
OS: Windows 7 SP1 64-bit, VS2012
SDK: Intel OpenCL SDK 2013
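The comparison between max(T_c, T_g) and T_c + T_g in the question can be turned into a single number. The following is only a minimal sketch: `overlap_ratio` is a hypothetical helper, not part of OpenCL or the Intel SDK, and it assumes the three times were measured the same way (for example with a wall-clock timer around the enqueue and wait calls).

```c
#include <assert.h>

/* Hypothetical helper (not an OpenCL API): estimates how much the CPU and
 * GPU kernels overlapped.
 *   t_c    : time with only the CPU kernel running
 *   t_g    : time with only the GPU kernel running
 *   t_both : time with both kernels enqueued together
 * Returns 1.0 when t_both == max(t_c, t_g) (fully concurrent) and
 * 0.0 when t_both == t_c + t_g (effectively serial). */
double overlap_ratio(double t_c, double t_g, double t_both)
{
    assert(t_c >= 0.0 && t_g >= 0.0 && t_both >= 0.0);

    double shorter = (t_c < t_g) ? t_c : t_g;
    if (shorter <= 0.0)
        return 0.0; /* one side did no work, so there is nothing to overlap */

    /* Time saved relative to serial execution, as a fraction of the most
     * that could have been saved (the shorter of the two runs). */
    double r = (t_c + t_g - t_both) / shorter;

    /* Clamp, since measurement noise can push the value out of range. */
    if (r < 0.0) r = 0.0;
    if (r > 1.0) r = 1.0;
    return r;
}
```

With the behaviour described above (t_both close to T_c + T_g), this ratio would come out close to 0.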


Yes, the CPU and the HD Graphics share memory bandwidth. If they run simultaneously, neither of them can reach the maximum bandwidth it gets when running alone. Looking at this positively, each of them alone can use all or most of the bandwidth. Using both together makes sense if you expect the cores and the GPU to do some computation on the data, not only memory transfers.

Regards, Anat
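The point made in the reply can be illustrated with a rough timing model. This is a simplified sketch under stated assumptions, not a measured result or an Intel-provided formula: it assumes each device alone can saturate the same shared bandwidth `bw`, that compute and memory traffic overlap perfectly within a device, and all function and parameter names are hypothetical.

```c
#include <assert.h>

static double max2(double a, double b) { return a > b ? a : b; }

/* Assumed model: a device running alone is limited either by its compute
 * time or by the time needed to move its bytes over the memory bus. */
double time_alone(double compute_s, double bytes, double bw_bytes_per_s)
{
    assert(bw_bytes_per_s > 0.0);
    return max2(compute_s, bytes / bw_bytes_per_s);
}

/* Lower bound on the time when CPU and GPU run together on a shared bus:
 * neither device can finish faster than it would alone, and the combined
 * traffic cannot move faster than the shared bandwidth allows. */
double time_together(double compute_c, double bytes_c,
                     double compute_g, double bytes_g,
                     double bw_bytes_per_s)
{
    double alone_c = time_alone(compute_c, bytes_c, bw_bytes_per_s);
    double alone_g = time_alone(compute_g, bytes_g, bw_bytes_per_s);
    double bus     = (bytes_c + bytes_g) / bw_bytes_per_s;
    return max2(max2(alone_c, alone_g), bus);
}
```

Under these assumptions, a pure copy (zero compute time) gives a combined time equal to T_c + T_g, matching what was observed in the question, while kernels with enough computation per byte approach max(T_c, T_g).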