OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
Announcements
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be ask in the FPGA Intel® High Level Design forum.

Intel HD Graphics 4600 GPU + ternary operator on pointers + while-loop = freeze

Petr_F__Kartsev
Beginner
701 Views

Hi!

Found and refined new strange cause for freezing OpenCL kernel.

This is absolutely minimal demonstration:

__kernel void freeze(
    __global int *a,
    __global int *b,
    int parity )
{
    __global int* c = (parity>0) ? a : b;
    uint id = (uint) get_global_id(0);
    while (id<256) {
        c[id] = -1;
        id += (uint) get_local_size(0);
    }
}

(The ZIP archive with complete code of this minimal demonstration is attached to the message)

Conditions:

1. Intel HD Graphics 4600 GPU (while all others I checked -- Intel CPU, NVIDIA GPU, AMD GPU -- run fine)

2. work group size: starting from 32

3. kernel like this:

  •   ternary operator choosing from one of two global arrays of the same type
  •   while-loop to write into this array (size 256 can be anything else)

4. the condition of the operator is met (in this code, parity is greater than zero)

 

Then the program locks, several seconds later the computer freezes completely so you need to press Reset.

-----------------

I hope there are no dumb errors:

  •  both arrays exist,
  •  their size is large enough: is 65536*sizeof(cl_int)
  •  and you can explicitly use either of them in writing: a[id]=-1 and b[id]=-1 work OK. so, only access to c[] in looped manner cause the freezing.
  • the initial (rather complex) program works well on many other devices.

P.S.
if printed, the pointer 'c' looks OK:

  if (id==0) printf(" a=%p b=%p c=%p\n", a, b, c);

it prints, for example:
 

     a=0x100000000 b=0x200000000 c=0x100000000
    or
     a=0x100000000 b=0x200000000 c=0x200000000

The code with printf() does not freeze. But I don't need any printf() in the production code :)

Looks like bu... well... imperfection :)
And I don't see any restriction on ternary operator in documentation.

 

Best Regard,

Petr

0 Kudos
3 Replies
Petr_F__Kartsev
Beginner
701 Views

Hi!

some news and statistics on more GPU devices...

HD Graphics 4600, Windows 7 x64 (latest driver win64_15.36.34.4889.exe)
 - the problem is here, starting from workgroup size 32.


HD Graphics 4000, Windows 7 x64 (different computer, latest driver win64_15.33.46.4885.exe)
 - no problem with any group size 1..256


Iris Pro 620, Ubuntu 17.04 (driver: beignet)
 - the problem is here for any workgroup size: 1,8,16,32,64,256
Luckily the computer does not freeze: the program can be stopped with Ctrl+C
and in dmesg we have the following lines
[   42.006853] [drm] GPU HANG: ecode 8:0:0x85ddfffb, reason: Hang on rcs0, action: reset
[   42.006907] drm/i915: Resetting chip after gpu hang

 

-----
Updated ZIP archive in attachment: config file for easier choosing platform/device/workgroup-size

 

0 Kudos
Ben_A_Intel
Employee
701 Views

Hi Petr,

Due to hardware limitations the compiler for the Intel HD Graphics 4600 GPU had to emit special code to handle the case where the buffer being read from or written to cannot be statically determined, and particularly when different work items in a work group may be accessing different buffers - we call these "divergent pointer" cases.  It's possible you've encountered a bug in this case.

Is it possible to restructure your code slightly to eliminate the divergent pointer?  For example, something like:

__kernel void freeze(
    __global int *a,
    __global int *b,
    int parity )
{
    uint id = (uint) get_global_id(0);
    while (id<256) {
        if(parity>0)
            a[id] = -1;
        else
            b[id] = -1;
        id += (uint) get_local_size(0);
    }
}

As an added bonus, this code should execute faster, too.

0 Kudos
Petr_F__Kartsev
Beginner
701 Views

Hi Ben,
thanks for your answer.

Yes, I modified my code to get rid of this type of operation, by joining these two arrays into one:

__global int *c    = a + arraySize*parity;

It works OK. (though maybe it also can look divergent for compiler?..)

if-else version works OK. (I did not use it because it's simply does not look good in rather complex code with many accesses to these arrays, but who knows... maybe will change later)

Thanks for explanation!

0 Kudos
Reply