OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

Tip for people porting from CUDA: __syncthreads() != barrier(CLK_LOCAL_MEM_FENCE)


(The following is based on some recent experiments on a GEN8 IGP.)

FYI -- one gotcha to watch out for when porting from CUDA to an Intel IGP is that the OpenCL barrier()/work_group_barrier() operation doesn't tolerate work items or subgroups exiting early. Every work item in the work group that is executing the kernel has to reach the barrier.

For example, if a subgroup returns early and the remaining work items then synchronize in a barrier(), your kernel is going to hang on the IGP.
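A minimal sketch of one way to avoid the hang, assuming a 1D launch whose global size has been padded past the real element count. The kernel name `neighbor_sum`, its arguments, and the variable `guarded_kernel_src` are hypothetical and only serve the illustration. The kernel is shown as a host-side C string, the form in which it would be handed to clCreateProgramWithSource(). Out-of-range work items do not return; they skip the global loads and stores but still reach the barrier.

```c
/* Hypothetical OpenCL C kernel source, embedded as a host-side C string.
   Instead of returning early, out-of-range work items are masked with
   `active`, so every work item in the work group reaches barrier(). */
static const char *guarded_kernel_src =
    "__kernel void neighbor_sum(__global const float *in,\n"
    "                           __global float *out,\n"
    "                           __local float *tile,\n"
    "                           const uint n)\n"
    "{\n"
    "    const size_t gid = get_global_id(0);\n"
    "    const size_t lid = get_local_id(0);\n"
    "    const size_t lsz = get_local_size(0);\n"
    "    const bool active = gid < n;\n"
    "\n"
    "    /* CUDA habit to avoid here: if (!active) return; */\n"
    "    tile[lid] = active ? in[gid] : 0.0f;\n"
    "    barrier(CLK_LOCAL_MEM_FENCE); /* reached by all work items */\n"
    "\n"
    "    if (active)\n"
    "        out[gid] = tile[lid] + tile[(lid + 1) % lsz];\n"
    "}\n";
```

The padded work items load a neutral 0.0f so the local tile is fully initialized; whether 0.0f is the right filler depends on what the real kernel computes.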

Early exit of some threads (work items) at the end of a grid is a pretty common use case in CUDA.

Fortunately, OpenCL 2.0 has a feature that doesn't exist in CUDA and that might help you work around this issue: Non-Uniform Work Groups.
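As a rough host-side sketch of the difference, the two helpers below (hypothetical names, not part of any OpenCL API) compute the launch sizes for both approaches. Without non-uniform work groups, the global size has to be rounded up to a multiple of the local size and the kernel has to mask the padding. With them, assuming the program is built with -cl-std=CL2.0 and without -cl-uniform-work-group-size, the global size can stay at the real element count and the last work group is simply smaller.

```c
#include <stddef.h>

/* Uniform work groups: round the global size up to a multiple of the
   work-group size. The kernel must then mask the padded work items
   (without returning before a barrier). */
static size_t padded_global_size(size_t n, size_t local_size)
{
    return ((n + local_size - 1) / local_size) * local_size;
}

/* Non-uniform work groups: the global size stays at n, and the last
   work group holds only the remainder. Returns the size of that last
   group. Assumes n > 0 and local_size > 0. */
static size_t last_group_size(size_t n, size_t local_size)
{
    size_t rem = n % local_size;
    return rem == 0 ? local_size : rem;
}
```

For 1000 elements and a local size of 64, the padded launch runs 1024 work items, while the non-uniform launch runs exactly 1000, with a final work group of 40. Inside the kernel, get_local_size() reports the actual size of the current work group, and get_enqueued_local_size() reports the size that was requested at enqueue time.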


1 Reply

Thanks for this report. We will look into getting this topic covered in the documentation.
