- 신규로 표시
- 북마크
- 구독
- 소거
- RSS 피드 구독
- 강조
- 인쇄
- 부적절한 컨텐트 신고
Hi!
Found and refined new strange cause for freezing OpenCL kernel.
This is absolutely minimal demonstration:
__kernel void freeze(
__global int *a,
__global int *b,
int parity )
{
__global int* c = (parity>0) ? a : b;
uint id = (uint) get_global_id(0);
while (id<256) {
c[id] = -1;
id += (uint) get_local_size(0);
}
}
(The ZIP archive with complete code of this minimal demonstration is attached to the message)
Conditions:
1. Intel HD Graphics 4600 GPU (while all others I checked -- Intel CPU, NVIDIA GPU, AMD GPU -- run fine)
2. work group size: starting from 32
3. kernel like this:
- ternary operator choosing from one of two global arrays of the same type
- while-loop to write into this array (size 256 can be anything else)
4. the condition of the operator is met (in this code, parity is greater than zero)
Then the program locks, several seconds later the computer freezes completely so you need to press Reset.
-----------------
I hope there are no dumb errors:
- both arrays exist,
- their size is large enough: is 65536*sizeof(cl_int)
- and you can explicitly use either of them in writing: a[id]=-1 and b[id]=-1 work OK. so, only access to c[] in looped manner cause the freezing.
- the initial (rather complex) program works well on many other devices.
P.S.
if printed, the pointer 'c' looks OK:
if (id==0) printf(" a=%p b=%p c=%p\n", a, b, c);
it prints, for example:
a=0x100000000 b=0x200000000 c=0x100000000
or
a=0x100000000 b=0x200000000 c=0x200000000
The code with printf() does not freeze. But I don't need any printf() in the production code :)
Looks like bu... well... imperfection :)
And I don't see any restriction on ternary operator in documentation.
Best Regard,
Petr
링크가 복사됨
- 신규로 표시
- 북마크
- 구독
- 소거
- RSS 피드 구독
- 강조
- 인쇄
- 부적절한 컨텐트 신고
Hi!
some news and statistics on more GPU devices...
HD Graphics 4600, Windows 7 x64 (latest driver win64_15.36.34.4889.exe)
- the problem is here, starting from workgroup size 32.
HD Graphics 4000, Windows 7 x64 (different computer, latest driver win64_15.33.46.4885.exe)
- no problem with any group size 1..256
Iris Pro 620, Ubuntu 17.04 (driver: beignet)
- the problem is here for any workgroup size: 1,8,16,32,64,256
Luckily the computer does not freeze: the program can be stopped with Ctrl+C
and in dmesg we have the following lines
[ 42.006853] [drm] GPU HANG: ecode 8:0:0x85ddfffb, reason: Hang on rcs0, action: reset
[ 42.006907] drm/i915: Resetting chip after gpu hang
-----
Updated ZIP archive in attachment: config file for easier choosing platform/device/workgroup-size
- 신규로 표시
- 북마크
- 구독
- 소거
- RSS 피드 구독
- 강조
- 인쇄
- 부적절한 컨텐트 신고
Hi Petr,
Due to hardware limitations the compiler for the Intel HD Graphics 4600 GPU had to emit special code to handle the case where the buffer being read from or written to cannot be statically determined, and particularly when different work items in a work group may be accessing different buffers - we call these "divergent pointer" cases. It's possible you've encountered a bug in this case.
Is it possible to restructure your code slightly to eliminate the divergent pointer? For example, something like:
__kernel void freeze(
__global int *a,
__global int *b,
int parity )
{
uint id = (uint) get_global_id(0);
while (id<256) {
if(parity>0)
a[id] = -1;
else
b[id] = -1;
id += (uint) get_local_size(0);
}
}
As an added bonus, this code should execute faster, too.
- 신규로 표시
- 북마크
- 구독
- 소거
- RSS 피드 구독
- 강조
- 인쇄
- 부적절한 컨텐트 신고
Hi Ben,
thanks for your answer.
Yes, I modified my code to get rid of this type of operation, by joining these two arrays into one:
__global int *c = a + arraySize*parity;
It works OK. (though maybe it also can look divergent for compiler?..)
if-else version works OK. (I did not use it because it's simply does not look good in rather complex code with many accesses to these arrays, but who knows... maybe will change later)
Thanks for explanation!