OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

barrier(CLK_LOCAL_MEM_FENCE): weird deadlock in a local-array reduction inside a kernel

lcljesse

The code runs fine when the program is created with clCreateProgramWithSource, but deadlocks when it is created with clCreateProgramWithBinary. The work-group size is (8,8). I can reproduce the deadlock on Intel graphics with Windows 8.1 + HD 4600/530 and with Windows 10 + HD 630.

local int SumAlpha[16*16/4];  // 64 ints: one slot per work-item of the 8x8 work-group
int index = 1*get_local_id(0) + 16/2*get_local_id(1);  // flattened local id, 0..63
......
SumAlpha[index] = sum_alpha;

barrier(CLK_LOCAL_MEM_FENCE);
// do the reduction in local memory
for(int i = (16*16/8); i > 0; i >>= 1) 
{
    if(index < i) 
    {
        SumAlpha[index] += SumAlpha[index + i]; 
    }
    barrier(CLK_LOCAL_MEM_FENCE); // after unrolling the loop, the deadlock shows up at i == 8
}

The command I use to create the SPIR binary is:

"..\vendors\OpenCL\Intel\bin\ioc32.exe -cmd=build -input="xxx" -device=gpu -spir32="FileName%" -bo="-cl-std=CL1.2 -I %ProjetcDir%\cl"

I also tried '-cl-opt-disable' and got the same deadlock. I also tried the following equivalent form of the loop body, and it deadlocks as well:

int a = SumAlpha[index];
int b = SumAlpha[index + i];
SumAlpha[index] = a + b;

I can avoid the reduction by using the dot() built-in instead, but I would like to understand what is going on. Did I miss something? Or is there a way to dump the Gen assembly for both build paths so I can compare them? Many thanks.
