Hi,
I ran into an issue when using work_group_all()/sub_group_all(), so I reduced it to the kernel below.
// kernel start
kernel void entry()
{
    int id = (int)get_global_id(0);
    bool end = false;
    int cnt = 0;
    bool end2 = false; // always of the same value for the whole work/sub group
    while (1) {
        if (end2)
            break;
        if (cnt == 0) {
            // First loop
            end = id == 0; // end is only true for first work item
        } else {
            // Second loop
            end = true; // end is always true now
        }
        // end2 will be false at first loop, and true at second loop
        // end2 will be of the same value for whole sub_group/work_group
#if 1
        end2 = sub_group_all(end ? 1 : 0) != 0;
#else
        end2 = work_group_all(end ? 1 : 0) != 0;
#endif
#if 1
        if ((id & 0xff) <= 1) {
            printf("id = %d, cnt=%d, end = %d, end2 = %d\n", id, cnt, end ? 1 : 0, end2 ? 1 : 0);
        }
#endif
        cnt++;
    }
}
// kernel end
The kernel hangs. The output shows that cnt never reaches 2, yet the kernel never finishes, regardless of whether I use work_group_all() or sub_group_all().
id = 0, cnt=0, end = 1, end2 = 0
id = 1, cnt=0, end = 0, end2 = 0
id = 512, cnt=0, end = 0, end2 = 0
id = 513, cnt=0, end = 0, end2 = 0
id = 512, cnt=1, end = 1, end2 = 1
id = 513, cnt=1, end = 1, end2 = 1
id = 512, cnt=1, end = 1, end2 = 1
id = 513, cnt=1, end = 1, end2 = 1
id = 512, cnt=1, end = 1, end2 = 1
id = 513, cnt=1, end = 1, end2 = 1
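For comparison, the behavior I expect can be modeled in plain host-side C instead of OpenCL. This is only a rough sketch: SUB_GROUP_SIZE, MAX_ITERS, model_sub_group_all() and model_loop_iterations() are made-up names for this illustration, the sub-group size of 8 is an assumption (the real size is device-dependent), and the model says nothing about how a real device schedules work items.

```c
#include <assert.h>
#include <stdbool.h>

#define SUB_GROUP_SIZE 8 /* assumed for illustration; device-dependent in reality */
#define MAX_ITERS 16     /* safety cap so the model itself can never hang */

/* Host-side model of sub_group_all(): nonzero only if the predicate
   is nonzero for every work item in the sub-group. */
static int model_sub_group_all(const int *pred, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        if (!pred[i])
            return 0;
    }
    return 1;
}

/* Runs the kernel's loop for the sub-group whose first global id is
   first_id and returns the final value of cnt. end2 and cnt are uniform
   across the sub-group, so one copy of each is enough in this model. */
static int model_loop_iterations(int first_id)
{
    int pred[SUB_GROUP_SIZE];
    bool end2 = false;
    int cnt = 0;

    while (cnt < MAX_ITERS) {
        if (end2)
            break;
        /* Each lane computes its own "end", exactly as in the kernel. */
        for (int lane = 0; lane < SUB_GROUP_SIZE; lane++) {
            int id = first_id + lane;
            bool end = (cnt == 0) ? (id == 0) : true;
            pred[lane] = end ? 1 : 0;
        }
        end2 = model_sub_group_all(pred, SUB_GROUP_SIZE) != 0;
        cnt++;
    }
    return cnt;
}
```

Under this model, model_loop_iterations(0) and model_loop_iterations(512) both return 2, so I would expect every work item to leave the loop with cnt == 2 after the second pass.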
The number of work items is always a power of 2 and greater than 512.
When running on the CPU, the kernel loops forever, but I can kill the process through the OS. When running on the GPU with work_group_all(), it hangs the whole OS.
I tried the following two OpenCL compiler versions, with the same result:
Intel(R) SDK for OpenCL(TM) - Offline Compiler, version 8.0.0.171
Intel(R) SDK for OpenCL(TM) - offline compiler command line, version 7.0.0.3993
Thanks,
Tango