OpenCL* for CPU

How should I view global and local work sizes

Brigden__Marion
Beginner

I've been using OpenCL for a little while now for hobby purposes. I was wondering if someone could explain how I should view global and local work sizes. I've been playing around with them for a bit, but I can't seem to wrap my head around them.

I have this piece of code. The kernel is launched with a global work size of 8 and a local work size of 4:

    __kernel void foo(__global int *bar)
    {    
        bar[get_global_id(0)] = get_local_id(0);
    }

The result in bar looks like this:

{0, 1, 2, 3, 0, 1, 2, 3}

I know this happens because of the work sizes I've used, but I can't seem to wrap my head around how I should view it.

Does this mean that there are 4 threads working locally and 8 globally, so I have 4 * 8 threads running in total? And if so, what makes those 4 working locally special?

Or does this mean the main body of the kernel just has two counters, one local and one global? But what is the point of that?

I know I might be a bit vague and my question might seem dumb, but I don't know how I should view this or how I can use it more optimally.

Ben_A_Intel
Employee

Hi Marion!

These slides are from 2011, but I think they do a good job of communicating these concepts. I'd suggest starting with slides 2 and 6 and then going from there:

https://www.khronos.org/assets/uploads/developers/library/2011-siggraph-opencl-bof/OpenCL-BOF-OpenCL...

In short, the "global work size" describes the ND-Range iteration space, and the "local work size" describes how the work-items in that iteration space are grouped together into work-groups. Grouping work-items provides additional execution model guarantees that enable work-items in the same work-group to synchronize execution via barriers and to communicate via shared local memory. The grouping isn't important for "embarrassingly parallel" kernels, but it is important for algorithms that require cooperation among work-items.
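To make the relationship between the two sizes concrete, here is a minimal sketch in plain C that emulates on the host how a 1-D ND-Range hands out IDs to the kernel from the question. It does not use the OpenCL runtime. The function name simulate_foo is hypothetical, and the sketch assumes no global offset and a global size that is an exact multiple of the local size. On a real device the work-items may run in parallel and in any order; the nested loops only illustrate the ID arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the OpenCL API: emulates how a 1-D
 * ND-Range assigns IDs and runs the body of foo() once per work-item. */
void simulate_foo(size_t global_size, size_t local_size, int *bar)
{
    /* Assumption: the global size is a multiple of the local size,
     * as in the 8 / 4 case from the question. */
    assert(local_size > 0 && global_size % local_size == 0);

    /* Number of work-groups, e.g. 8 / 4 = 2. */
    size_t num_groups = global_size / local_size;

    for (size_t group_id = 0; group_id < num_groups; ++group_id) {
        for (size_t local_id = 0; local_id < local_size; ++local_id) {
            /* With no global offset:
             * global_id = group_id * local_size + local_id */
            size_t global_id = group_id * local_size + local_id;

            /* Same statement as the kernel body:
             * bar[get_global_id(0)] = get_local_id(0); */
            bar[global_id] = (int)local_id;
        }
    }
}
```

Calling simulate_foo(8, 4, bar) on an 8-element array fills it with 0, 1, 2, 3, 0, 1, 2, 3, one entry per work-item. So there are 8 work-items in total, split into 2 work-groups of 4, rather than 4 * 8. The local ID is simply a work-item's position inside its own work-group.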

These aren't dumb questions at all, so feel free to follow up if something is still unclear. Thanks!
