OpenCL* for CPU

Potential OpenCL compiler/implementation issue

gu__sheng

Hi,

I ran into an issue when using work_group_all()/sub_group_all(), so I reduced it to the kernel below.

// kernel start

kernel void entry()
{
    int id = (int)get_global_id(0);

    bool end = false;
    int cnt = 0;
    bool end2 = false;  // always has the same value across the whole work-group/sub-group
    while (1)
    {
        if (end2) break;

        if (cnt==0)
        {
            //  First iteration
            end = id==0;    //  end is true only for the first work item (id 0)
        }
        else
        {
            //  Second iteration
            end = true;     //  end is now true for every work item
        }

        //  end2 should be false on the first iteration and true on the second
        //  end2 should have the same value across the whole sub-group/work-group
#if 1
        end2 = sub_group_all(end?1:0)!=0;
#else
        end2 = work_group_all(end?1:0)!=0;
#endif

#if 1
        if ((id&0xff)<=1)
        {
            printf("id = %d, cnt=%d, end = %d, end2 = %d\n", id, cnt, end?1:0, end2?1:0);
        }
#endif

        cnt++;
    }
}

// kernel end

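The kernel hangs. The output shows that cnt never reaches 2, yet the kernel never finishes, regardless of whether I use work_group_all() or sub_group_all(). In the output below, work items 512 and 513 keep printing cnt=1 with end2 = 1 instead of leaving the loop.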

id = 0, cnt=0, end = 1, end2 = 0
id = 1, cnt=0, end = 0, end2 = 0
id = 512, cnt=0, end = 0, end2 = 0
id = 513, cnt=0, end = 0, end2 = 0
id = 512, cnt=1, end = 1, end2 = 1
id = 513, cnt=1, end = 1, end2 = 1
id = 512, cnt=1, end = 1, end2 = 1
id = 513, cnt=1, end = 1, end2 = 1
id = 512, cnt=1, end = 1, end2 = 1
id = 513, cnt=1, end = 1, end2 = 1
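
For comparison, here is a minimal host-side sketch in plain C of the control flow I expect from the kernel. It is only a model under stated assumptions: simulate_group, MAX_ITEMS and MAX_ITERATIONS are hypothetical names of mine, the work items of one group run in lockstep, and sub_group_all()/work_group_all() is modeled as a logical AND over the whole group. It does not use the OpenCL runtime.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host-side model, not part of the OpenCL API. */
enum { MAX_ITEMS = 1024, MAX_ITERATIONS = 16 };

/* Emulates one group of n work items whose global ids start at first_id.
 * Returns the number of loop iterations executed before every work item
 * breaks, or -1 if the loop has not ended after MAX_ITERATIONS. */
int simulate_group(int first_id, int n)
{
    assert(n > 0 && n <= MAX_ITEMS);

    bool end[MAX_ITEMS];
    bool end2 = false;   /* uniform across the group by construction */
    int cnt = 0;

    while (cnt < MAX_ITERATIONS) {
        if (end2)
            return cnt;

        /* Per-work-item predicate, same as in the kernel. */
        for (int i = 0; i < n; i++)
            end[i] = (cnt == 0) ? (first_id + i == 0) : true;

        /* Model of the group-wide "all" reduction (assumed semantics). */
        bool all = true;
        for (int i = 0; i < n; i++)
            all = all && end[i];
        end2 = all;

        cnt++;
    }
    return -1;
}
```

Under these assumptions, simulate_group(0, 16) and simulate_group(512, 16) both return 2: the loop body runs for cnt=0 and cnt=1, and then every work item breaks. That is the behavior I expect from the real kernel, and it does not match the repeated cnt=1 lines above.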

The number of work items is always a power of 2 and greater than 512.

On the CPU, the kernel spins in an infinite loop, and I can kill the process from the OS. On the GPU, using work_group_all() hangs the entire OS.

I tried the following two OpenCL compiler versions, with the same result:

Intel(R) SDK for OpenCL(TM) - Offline Compiler, version 8.0.0.171

Intel(R) SDK for OpenCL(TM) - offline compiler command line, version 7.0.0.3993

 

Thanks,
Tango