Handling "SISD kernels"

lolxdfly · ‎05-12-2020

From the driver's point of view, what is the difference between these two kernels?

kernel void clmain(global const unsigned int* in, global unsigned int* out)
{
    unsigned int id = get_global_id(0);

    if(id == 0)
    {
        for(int i = 0; i < 32; i++)
        {
            out[i] = in[i];
        }
    }
}

kernel void clmain(global const unsigned int* in, global unsigned int* out)
{
    unsigned int id = get_global_id(0);
    out[id] = in[id];
}

I know that the first should be sequential and the second parallel. What does this mean for the driver?

The first kernel takes twice as much space for instructions, apparently because the loop is unrolled. Is there anything the driver has to do to keep the program SIMD-compliant? The first kernel's instructions should effectively be SISD, since each instruction is only allowed to affect one data element. Does the driver have to mask the SIMD lanes somehow? The only masking I know of is for the last work-group, so that it fits the global work size. Does the first kernel even run on the other execution units, given that all of the work is packed into the first work-item?
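To make the masking question concrete, here is how I picture the first kernel executing on one SIMD hardware thread, written as a plain C emulation. This is only a sketch of my mental model: the lane width `SIMD_WIDTH`, the function name `run_first_kernel_simd`, and the bitmask representation are assumptions I made up for illustration, not the behavior of any particular hardware or driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIMD_WIDTH 8   /* assumed number of lanes per hardware thread */
#define COUNT      32  /* loop trip count from the first kernel */

/* Hypothetical emulation of one SIMD hardware thread running the first
 * kernel. base_id is the global id assigned to lane 0 of this thread.
 * Returns the mask of lanes that entered the if(id == 0) block. */
static uint32_t run_first_kernel_simd(uint32_t base_id,
                                      const unsigned int *in,
                                      unsigned int *out)
{
    assert(in != NULL && out != NULL);

    /* Every lane evaluates the condition; only matching lanes stay enabled. */
    uint32_t exec_mask = 0;
    for (uint32_t lane = 0; lane < SIMD_WIDTH; lane++) {
        uint32_t id = base_id + lane;
        if (id == 0)
            exec_mask |= 1u << lane;
    }

    /* If no lane is enabled, the whole block is skipped. */
    if (exec_mask == 0)
        return exec_mask;

    /* The loop body is issued once per iteration for the whole thread,
     * but only enabled lanes perform the store. */
    for (int i = 0; i < COUNT; i++) {
        for (uint32_t lane = 0; lane < SIMD_WIDTH; lane++) {
            if (exec_mask & (1u << lane))
                out[i] = in[i];
        }
    }
    return exec_mask;
}
```

In this model, calling the function with `base_id` 0 leaves only lane 0 enabled and copies all 32 elements, while any thread with a nonzero `base_id` has an empty mask and writes nothing. What I cannot tell is whether such a mask comes purely from the compiled kernel's control flow or whether the driver has to set something up at dispatch time.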

Background: I am writing my own bare-metal driver for IGPs. With it, the first kernel results in a GPU hang, while the second kernel runs fine.