Solved: Hello, Robert Loffe.

zhang_f_ · ‎05-16-2015

Hi,

I am using OpenCL on an Intel® Core™ i7-4980HQ processor.

The problem is:

When the CPU and GPU write to the same cache line, the results are sometimes wrong.

Both the CPU and the GPU run OpenCL kernels.

Is there any mechanism to solve this, such as atomic operations?

Hope for your reply,

Thanks!

Robert_I_Intel · ‎05-18-2015

Hi Zhang,

Sure, there are atomics. Check this link out: https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/atomicFunctions.html

zhang_f_ · ‎06-09-2015

Hello, Robert Loffe.

Thanks for your reply, but I do not think memory is kept consistent between the CPU and GPU.

Question 1:

For example, I have a buffer int a[1024] with all elements set to 0. I use a CPU kernel to increment the odd elements (a[1], a[3], a[5], ...) and a GPU kernel to decrement the even elements (a[0], a[2], a[4], ...). This is repeated in a for loop 100 times. Because the elements are in the same cache line, I cannot get correct results. But if the two devices process elements in different cache lines, the result is right.

I tried atomic_add and atomic_sub, but the results are still wrong. Does Intel not implement cache coherency between the CPU and GPU? Why? The only way I can handle this is to call clFinish() between the CPU kernel and the GPU kernel, which makes the execution serial.

Question 2:

Does the GPU support the double type? When I use double, clBuildProgram fails.

I have added "#pragma OPENCL EXTENSION cl_khr_fp64 : enable" to the cl file.

Thanks! Hope for your reply.

Robert_I_Intel · ‎06-09-2015

Hi Zhang,

1. Atomics in OpenCL 1.2 synchronize reads and writes to global memory between different work-items within a kernel; they do not synchronize reads and writes between the host and the device. For that you will need a 5th Generation Intel(R) processor (Broadwell) and OpenCL 2.0.

Cache coherency in OpenCL 1.2 is guaranteed only at synchronization points, which typically means at the end of kernel execution.
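
As an illustration of keeping the two devices off the same cache lines, here is a minimal plain-C sketch, not OpenCL host code. It assumes each device gets its own buffer (even elements in one, odd elements in the other) and that the two halves are merged back only after both kernels have finished, that is, at a synchronization point. The helper names split_even_odd and merge_even_odd are hypothetical and not part of any OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the even-indexed elements of a[0..n) into
   even[] and the odd-indexed elements into odd[].
   even[] must hold (n + 1) / 2 ints and odd[] must hold n / 2 ints. */
static void split_even_odd(const int *a, size_t n, int *even, int *odd)
{
    assert(a != NULL && even != NULL && odd != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (i % 2 == 0)
            even[i / 2] = a[i];
        else
            odd[i / 2] = a[i];
    }
}

/* Hypothetical helper: inverse of split_even_odd. Interleaves the two
   halves back into a[0..n). Intended to run only after both devices
   have finished, so neither half is still being written. */
static void merge_even_odd(int *a, size_t n, const int *even, const int *odd)
{
    assert(a != NULL && even != NULL && odd != NULL);
    for (size_t i = 0; i < n; ++i)
        a[i] = (i % 2 == 0) ? even[i / 2] : odd[i / 2];
}
```

With a[1024] split this way, the GPU kernel would decrement every element of even[] and the CPU kernel would increment every element of odd[], each in its own buffer, and the merge would run once after both command queues have finished.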

2. The GPU currently does not support cl_khr_fp64 (the hardware is there, but support is not enabled in software). Right now this extension is supported only on the CPU device.
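
To check for double support before building, one option is to look for cl_khr_fp64 in the space-separated extension string that clGetDeviceInfo returns for CL_DEVICE_EXTENSIONS. The sketch below covers only the string check, so it compiles without the OpenCL headers; has_extension is a hypothetical helper name, not an OpenCL function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` appears as a whole
   space-separated token in `ext_list` (the format used by the
   CL_DEVICE_EXTENSIONS string). A plain substring match is not enough,
   because one extension name can be a prefix of another. */
static bool has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t len = strlen(name);
    if (len == 0)
        return false;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token)
            return true;
        p += 1; /* keep scanning after a partial match */
    }
    return false;
}
```

If has_extension reports false for the GPU device, a kernel that uses double should be expected to fail in clBuildProgram on that device, and the same source may still build for the CPU device.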
