Hi,
I ran into a problem when using work_group_all()/sub_group_all(), so I reduced it to the kernel below.
// kernel start
kernel void entry()
{
    int id = (int)get_global_id(0);
    bool end = false;
    int cnt = 0;
    bool end2 = false; // always the same value for the whole work-group/sub-group
    while (1)
    {
        if (end2) break;
        if (cnt == 0)
        {
            // First iteration
            end = id == 0; // end is only true for the first work item
        }
        else
        {
            // Second iteration
            end = true; // end is now always true
        }
        // end2 should be false after the first iteration and true after the second,
        // and it has the same value for the whole sub-group/work-group
#if 1
        end2 = sub_group_all(end ? 1 : 0) != 0;
#else
        end2 = work_group_all(end ? 1 : 0) != 0;
#endif
#if 1
        if ((id & 0xff) <= 1)
        {
            printf("id = %d, cnt=%d, end = %d, end2 = %d\n", id, cnt, end?1:0, end2?1:0);
        }
#endif
        cnt++;
    }
}
// kernel end
The execution simply hangs. The output shows that cnt never reaches 2, yet the kernel never finishes. This happens whether I use work_group_all() or sub_group_all().
id = 0, cnt=0, end = 1, end2 = 0
id = 1, cnt=0, end = 0, end2 = 0
id = 512, cnt=0, end = 0, end2 = 0
id = 513, cnt=0, end = 0, end2 = 0
id = 512, cnt=1, end = 1, end2 = 1
id = 513, cnt=1, end = 1, end2 = 1
id = 512, cnt=1, end = 1, end2 = 1
id = 513, cnt=1, end = 1, end2 = 1
id = 512, cnt=1, end = 1, end2 = 1
id = 513, cnt=1, end = 1, end2 = 1
My global work size is always a power of 2 and larger than 512.
On the CPU, the kernel loops forever, but I can still kill it from the OS. On the GPU, using work_group_all() hangs the entire OS.
I tried the following two OpenCL compiler versions, with the same result:
Intel(R) SDK for OpenCL(TM) - Offline Compiler, version 8.0.0.171
Intel(R) SDK for OpenCL(TM) - offline compiler command line, version 7.0.0.3993
Thanks,
Tango