OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.

How to perform Synchronization on Integrated GPU?

zhang_f_
Beginner

Hi,

I am using OpenCL on an Intel® Core™ i7-4980HQ processor.

The problem is:

When the CPU and GPU write to the same cache line, the results are sometimes wrong.

Both the CPU and the GPU run OpenCL kernels.

Is there any mechanism to solve this, such as atomic operations?

Looking forward to your reply,

Thanks!

Robert_I_Intel
Employee

Hi Zhang,

Sure, there are atomics. Check this link out: https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/atomicFunctions.html

zhang_f_
Beginner

Hello, Robert Loffe.

Thanks for your reply, but I do not think memory is kept consistent between the CPU and GPU.

Question 1:

For example, I have a buffer int a[1024] with all elements initialized to 0. A CPU kernel increments the odd elements (a[1], a[3], a[5], ...) and a GPU kernel decrements the even elements (a[0], a[2], a[4], ...). This is repeated 100 times in a for loop. Because the elements the two devices touch share cache lines, I do not get correct results. If the two devices process elements in different cache lines, the results are correct.

I used atomic_add and atomic_sub, but the results are still wrong. Does Intel not implement cache coherency between the CPU and GPU? Why? The only workaround I have found is to call clFinish() between the CPU kernel and the GPU kernel, which makes them run serially.
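
The symptom in Question 1 is consistent with lost updates when two devices each write back a whole cache line. What follows is a toy model in plain C, not OpenCL and not a description of how the actual hardware behaves. It assumes a 64-byte line (16 ints), that each device works on a private copy of a line, and that a modified copy is written back whole with no merging. The names lost_updates, owner_interleaved, and owner_per_line are made up for illustration.

```c
#include <assert.h>
#include <string.h>

#define N 1024        /* buffer length used in the question */
#define LINE_INTS 16  /* assumption: 64-byte cache line, 4-byte int */

enum { CPU = 0, GPU = 1 };

/* Interleaved ownership: GPU updates even elements, CPU updates odd ones. */
int owner_interleaved(int i) { return (i % 2 == 0) ? GPU : CPU; }

/* Per-line ownership: each cache line is updated by exactly one device. */
int owner_per_line(int i) { return ((i / LINE_INTS) % 2 == 0) ? GPU : CPU; }

/*
 * Simulates one unsynchronized round over a zero-initialized buffer.
 * The CPU adds 1 to its elements, the GPU subtracts 1 from its elements.
 * Returns the number of elements that end up with the wrong value.
 */
int lost_updates(int *mem, int (*owner)(int))
{
    assert(mem && owner);
    int wrong = 0;
    for (int base = 0; base < N; base += LINE_INTS) {
        int copy[2][LINE_INTS];
        int dirty[2] = {0, 0};

        /* Both devices fetch the line before either one writes it back. */
        memcpy(copy[CPU], mem + base, sizeof copy[CPU]);
        memcpy(copy[GPU], mem + base, sizeof copy[GPU]);

        /* Each device updates only the elements it owns, in its own copy. */
        for (int k = 0; k < LINE_INTS; k++) {
            int dev = owner(base + k);
            copy[dev][k] += (dev == CPU) ? 1 : -1;
            dirty[dev] = 1;
        }

        /* Modified copies are written back whole: CPU first, then GPU. */
        for (int dev = CPU; dev <= GPU; dev++)
            if (dirty[dev])
                memcpy(mem + base, copy[dev], sizeof copy[dev]);

        /* Compare against the result a coherent memory would give. */
        for (int k = 0; k < LINE_INTS; k++) {
            int expected = (owner(base + k) == CPU) ? 1 : -1;
            if (mem[base + k] != expected)
                wrong++;
        }
    }
    return wrong;
}
```

Under these assumptions, calling lost_updates on a zeroed buffer with owner_interleaved loses every CPU increment, because the GPU's write-back overwrites the line with stale odd elements. With owner_per_line nothing is lost, which matches the observation that results are correct when the devices work in different cache lines.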

 

Question 2:

Does the GPU support the double type? When I use double, clBuildProgram fails.

I have added "#pragma OPENCL EXTENSION cl_khr_fp64 : enable" to the .cl file.

 

Thanks! Looking forward to your reply.

Robert_I_Intel
Employee

Hi Zhang,

1. Atomics in OpenCL 1.2 are for synchronizing reads and writes to global memory from different work-items within a kernel, not for synchronizing host and device reads and writes. For that you will need a 5th Generation Intel® processor (Broadwell) and OpenCL 2.0.

Cache coherency in OpenCL 1.2 is only at synchronization points, which typically means at the end of kernel execution.

2. The GPU currently does not support cl_khr_fp64 (the hardware is there, but support is not enabled in software). This extension is supported only on the CPU device right now.
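
Since support for cl_khr_fp64 differs between the CPU and GPU devices, it can help to check for it before building a kernel that uses double. The sketch below assumes you have already obtained the device's space-separated extension string, for example by querying CL_DEVICE_EXTENSIONS with clGetDeviceInfo. The helper name has_extension is hypothetical, and the code is plain C so that it does not depend on an OpenCL runtime.

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 if `name` appears as a whole token in the space-separated
 * extension list `ext_list`, and 0 otherwise. A plain strstr() is not
 * enough, because one extension name can be a prefix of another.
 */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list && name);
    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the list start or right after a space. */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        /* The match must end at the list end or right before a space. */
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p += len;  /* keep searching after this partial match */
    }
    return 0;
}
```

If has_extension(extensions, "cl_khr_fp64") returns 0 for a device, the host code can fall back to a float version of the kernel or run the double-precision kernel on the CPU device instead.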
