Tip for people porting from CUDA: __syncthreads() != barrier(CLK_LOCAL_MEM_FENCE)

allanmac1 · 07-11-2016

(The following is based on some recent experiments on a GEN8 IGP.)

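FYI -- one gotcha to watch out for when porting from CUDA to an Intel IGP is that the OpenCL barrier()/work_group_barrier() operation doesn't tolerate work items or subgroups exiting early. The OpenCL specification requires every work item in a work group to reach the barrier, so a work item that has already returned can never arrive.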

For example, if a subgroup returns early and the remaining work items synchronize in a barrier(), your kernel is going to hang on the IGP.

Early exit of some threads (work items) at the end of a grid is a pretty common pattern in CUDA, typically to skip the extra threads when the problem size isn't a multiple of the block size.
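Here is a minimal OpenCL C sketch of the hang described above, along with one common way around it that works without OpenCL 2.0: keep every work item alive until after the barrier and guard only the global-memory accesses. The kernel names and the neighbor-sum computation are hypothetical, chosen only to force a trip through local memory. `tile` is assumed to be a `__local` buffer holding one float per work item, sized from the host with clSetKernelArg.

```c
// Hang-prone: the CUDA-style bounds check ported as an early return.
__kernel void neighbor_sum_bad(__global const float* in,
                               __global float* out,
                               const uint n,
                               __local float* tile)
{
  const size_t gid = get_global_id(0);
  const size_t lid = get_local_id(0);

  if (gid >= n)
    return;                          // these work items never reach the barrier

  tile[lid] = in[gid];
  barrier(CLK_LOCAL_MEM_FENCE);      // the remaining work items may wait here forever

  const size_t next = lid + 1;
  out[gid] = tile[lid] + (next < get_local_size(0) ? tile[next] : 0.0f);
}

// Safer: out-of-range work items stay alive through the barrier.
__kernel void neighbor_sum_safe(__global const float* in,
                                __global float* out,
                                const uint n,
                                __local float* tile)
{
  const size_t gid = get_global_id(0);
  const size_t lid = get_local_id(0);
  const int in_range = gid < n;

  // Every work item fills its slot; out-of-range ones store a neutral 0.
  tile[lid] = in_range ? in[gid] : 0.0f;

  barrier(CLK_LOCAL_MEM_FENCE);      // now reached by every work item in the group

  // Only in-range work items write results.
  if (in_range) {
    const size_t next = lid + 1;
    out[gid] = tile[lid] + (next < get_local_size(0) ? tile[next] : 0.0f);
  }
}
```

The same rule applies to every barrier in a kernel. Any condition that surrounds a barrier must evaluate the same way for all work items in the group.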

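Fortunately, OpenCL 2.0 has a feature that doesn't exist in CUDA, and it might help you work around this issue: Non-Uniform Work Groups. The global size no longer has to be a multiple of the work-group size, so the runtime can launch a smaller final work group instead of padding the grid with extra work items that would otherwise need to exit early.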
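The following sketch shows how a 1-D NDRange is split under each model. Under OpenCL 2.0, with the program built with `-cl-std=CL2.0` and not with `-cl-uniform-work-group-size`, a global size of 1000 with a local size of 256 is accepted. The last group then really has fewer work items: `get_local_size(0)` returns the smaller size, while `get_enqueued_local_size(0)` returns the requested one. OpenCL 1.x rejects that launch with `CL_INVALID_WORK_GROUP_SIZE`, so the usual fallback is to pad the global size. The helper names below are hypothetical, and they do host-side arithmetic only.

```c
#include <stddef.h>

/* How a 1-D NDRange splits into work groups when non-uniform work groups
   are allowed (OpenCL 2.0): full groups of `local` work items, plus one
   smaller trailing group if `global` is not a multiple of `local`. */
typedef struct {
  size_t full_groups;  /* groups of exactly `local` work items */
  size_t tail_size;    /* size of the last, smaller group (0 if none) */
} ndrange_split;

/* Assumes local > 0. */
ndrange_split split_ndrange(size_t global, size_t local)
{
  ndrange_split s;
  s.full_groups = global / local;
  s.tail_size = global % local;
  return s;
}

/* OpenCL 1.x fallback: round the global size up to a multiple of `local`.
   The extra work items must be guarded in the kernel, not returned early
   ahead of a barrier. Assumes local > 0. */
size_t padded_global(size_t global, size_t local)
{
  return (global + local - 1) / local * local;
}
```

For example, split_ndrange(1000, 256) yields 3 full groups plus a tail group of 232 work items. padded_global(1000, 256) is 1024, which adds 24 padding work items that an OpenCL 1.x kernel has to keep alive through every barrier.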

Jeffrey_M_Intel1 · 07-13-2016

Thanks for this report. We will see how to get updates on this topic into the documentation.