OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
Announcements
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be ask in the FPGA Intel® High Level Design forum.
1722 Discussions

How global and local work sizes are distributed in this function?

Naing__Nyan
Beginner
1,021 Views

This is from a sample program for OpenCL programming. I am confused about how global and local work size are computed. They are computed based on the image size.

Image size is 1920 x 1080 (w x h).

What I assumed is global_work_size[0] and global_work_size[1] are grids on image.

But now global_work_size is {128, 1088}.

Then local_work_size[0] and local_work_size[1] are grids on global_work_size. local_work_size is {128, 32}.

But total groups, num_groups = 34, but it is not 128 x 1088.

Max workgroup_size available at device is 4096.

How is the image distributed into such global and local work group sizes?

They are calculated in the following function.

 

    clGetKernelWorkGroupInfo(histogram_rgba_unorm8, device, CL_KERNEL_WORK_GROUP_SIZE, sizeof(size_t), &workgroup_size, NULL);
    {
        size_t  gsize[2];
        int     w;

        if (workgroup_size <= 256)
        {
            gsize[0] = 16;//workgroup_size is formed into row & col
            gsize[1] = workgroup_size / 16;
        }
        else if (workgroup_size <= 1024)
        {
            gsize[0] = workgroup_size / 16;
            gsize[1] = 16;
        }
        else
        {
            gsize[0] = workgroup_size / 32;
            gsize[1] = 32;
        }

        local_work_size[0] = gsize[0];
        local_work_size[1] = gsize[1];

        w = (image_width + num_pixels_per_work_item - 1) / num_pixels_per_work_item;//to include all pixels, num_pixels_per_work_item is added first
        global_work_size[0] = ((w + gsize[0] - 1) / gsize[0]);//col
        global_work_size[1] = ((image_height + gsize[1] - 1) / gsize[1]);//row

        num_groups = global_work_size[0] * global_work_size[1];    
        global_work_size[0] *= gsize[0];
        global_work_size[1] *= gsize[1];
    }    
    err = clEnqueueNDRangeKernel(queue, histogram_rgba_unorm8, 2, NULL, global_work_size, local_work_size, 0, NULL, NULL);
    if (err)
    {
        printf("clEnqueueNDRangeKernel() failed for histogram_rgba_unorm8 kernel. (%d)\n", err);
        return EXIT_FAILURE;
    } 

 

0 Kudos
0 Replies
Reply