OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be asked in the FPGA Intel® High Level Design forum.

handling "SISD kernels"

lolxdfly

What is the difference between these two kernels from the driver's point of view?

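// Kernel 1 ("SISD" variant): only work-item 0 does any work. It copies
// all 32 elements in a loop while every other work-item skips the branch.
kernel void clmain(global const unsigned int* in, global unsigned int* out)
{
    unsigned int id = get_global_id(0);

    if(id == 0)
    {
        for(int i = 0; i < 32; i++)
        {
            out[i] = in[i];
        }
    }
}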
// Kernel 2 (data-parallel variant): each work-item copies exactly one element.
kernel void clmain(global const unsigned int* in, global unsigned int* out)
{
    unsigned int id = get_global_id(0);
    out[id] = in[id];
}

I know that the first kernel runs sequentially and the second in parallel. What does this mean for the driver?
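To make the "sequential vs. parallel" distinction concrete, here is a minimal serial C emulation of both kernels. It is only an illustration: the loop over `id` stands in for the work-items the GPU would run concurrently, and the function names and the `work` counter array are hypothetical, added just to record how many elements each work-item copies.

```c
#include <stddef.h>

/* Serial CPU emulation of the two kernels (illustration only).
 * Each iteration of the `id` loop plays the role of one work-item
 * with get_global_id(0) == id; work[id] counts the copies it performs. */

/* Kernel 1: only work-item 0 enters the branch and copies all 32 elements.
 * Assumes `in` and `out` hold at least 32 elements. */
static void emulate_kernel1(const unsigned *in, unsigned *out,
                            int *work, size_t global_size)
{
    for (size_t id = 0; id < global_size; id++) {
        work[id] = 0;
        if (id == 0) {
            for (int i = 0; i < 32; i++) {
                out[i] = in[i];
                work[id]++;
            }
        }
    }
}

/* Kernel 2: every work-item copies exactly one element. */
static void emulate_kernel2(const unsigned *in, unsigned *out,
                            int *work, size_t global_size)
{
    for (size_t id = 0; id < global_size; id++) {
        out[id] = in[id];
        work[id] = 1;
    }
}
```

Called with a global size of 32, both functions produce identical `out` arrays, but kernel 1 records 32 copies for work-item 0 and none for the others, while kernel 2 records one copy per work-item.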

The first kernel needs twice as much instruction space as the second, apparently because the loop gets unrolled. Is there anything the driver has to do to keep the program SIMD-compliant? For the first kernel the instructions should effectively be SISD, since each instruction may affect only one data element. Does the driver have to mask the SIMD units somehow? The only masking I know of is the one that makes the last work-group fit the global work size. Does the first kernel even run on other execution units, given that all of the work is packed into the first work-item?
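The masking asked about here can be modelled in a few lines. This is a conceptual sketch, not how any particular driver or compiler does it. It assumes that work-items are packed into hardware threads of `simd_width` lanes (SIMD8, SIMD16 and SIMD32 are the usual widths on Intel GPUs), that the last, partial thread gets a dispatch mask disabling its unused lanes, and that a divergent `if` is executed as the AND of that mask with the per-lane condition. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit k set in a mask => lane k of the SIMD thread is enabled. */

/* Dispatch mask of hardware thread `thread` when `global_size` work-items are
 * split into threads of `simd_width` lanes. Only the last thread can be
 * partial; its unused upper lanes are switched off. */
static uint32_t dispatch_mask(uint32_t global_size, uint32_t simd_width,
                              uint32_t thread)
{
    assert(simd_width >= 1 && simd_width <= 32); /* mask must fit in 32 bits */
    uint32_t first_id = thread * simd_width;
    if (first_id >= global_size)
        return 0;                              /* thread would not be launched */
    uint32_t active = global_size - first_id;  /* work-items left for this thread */
    if (active > simd_width)
        active = simd_width;
    return active == 32 ? 0xFFFFFFFFu : ((1u << active) - 1u);
}

/* Execution mask inside kernel 1's `if (id == 0)`: the dispatch mask
 * ANDed with the per-lane result of the comparison. */
static uint32_t branch_mask_id_eq_0(uint32_t global_size, uint32_t simd_width,
                                    uint32_t thread)
{
    uint32_t mask = dispatch_mask(global_size, simd_width, thread);
    uint32_t cond = 0;
    for (uint32_t lane = 0; lane < simd_width; lane++) {
        uint32_t id = thread * simd_width + lane;  /* get_global_id(0) of this lane */
        if (id == 0)
            cond |= 1u << lane;
    }
    return mask & cond;
}
```

In this model, kernel 1 with a global size of 32 at SIMD8 gives thread 0 the mask 0x01 inside the `if`, and threads 1 to 3 the mask 0, so the loop body runs with a single enabled lane. A global size of 30 would instead give the last thread a dispatch mask of 0x3F, which corresponds to the partial-work-group masking mentioned above.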

Background: I am writing my own bare-metal driver for integrated GPUs (IGPs). With this driver, the first kernel causes a GPU hang, while the second kernel runs fine.
