Migrating to SYCL
One-stop forum for getting assistance migrating your existing code to SYCL

Question about max group size

ManuelCostanzo2
2,869 Views

Hi everyone! I have a question about migrating this CUDA kernel:

 

kernel<<< blocks, threads >>>(...);

 

DPCT migrates this kernel as:

 

cgh.parallel_for(
sycl::nd_range<3>(sycl::range<3>(1, 1, blocks) *
sycl::range<3>(1, 1, threads),
sycl::range<3>(1, 1, threads)),
[=](sycl::nd_item<3> item_ct1) {
   kernel(...);
});
 
But DPCT alerts:

DPCT1049:55: The workgroup size passed to the SYCL kernel may
exceed the limit. To get the device limit, query
info::device::max_work_group_size. Adjust the workgroup size if
needed.
 
 
So, how can I ask for the maximum blocks and threads ? Thank you so much!
0 Kudos
11 Replies
ShwethaS_Intel
Moderator
2,809 Views

Hi,

 

Thanks for reaching out to us.

 

>> So, how can I ask for the maximum blocks and threads ?

To check supported max work group size or any other info related to your device, you may run 'clinfo' command in your terminal.

 

For more info on the DPCT alert, kindly refer to this link: https://www.intel.com/content/www/us/en/develop/documentation/intel-dpcpp-compatibility-tool-user-guide/top/diagnostics-reference/dpct1049.html#dpct1049_id-dpct1049

 

Regards,

Shwetha

 

 

 

0 Kudos
ManuelCostanzo2
2,806 Views

Hey! Thank you. Using that I know the maximum group size. How can I get the maximum number of threads per group size? I didn't find that

0 Kudos
ShwethaS_Intel
Moderator
2,768 Views

Hi,


"clinfo" command gives all the necessary info related to your device through a command line.

 

Max work group size = max no. of threads allowed in CUDA's block

Max work item size = max no. of threads allowed in CUDA's grid

(Max work item size / Max work group size) = max no. of blocks allowed in CUDA's grid at any given instance

 


The same info can be obtained from the DPC++ API using :

  1. "queue.get_info<device::detail::max_work_group_size>()" - The maximum number of work-items that are permitted in a work-group executing a kernel on a single compute unit.
  2. "queue.get_info<device::detail::max_work_item_sizes>()" - The maximum number of work-items that are permitted in each dimension of the work-group of the nd_range.

 

Thanks & Regards,

Shwetha


0 Kudos
ShwethaS_Intel
Moderator
2,720 Views

Hi,


A gentle remainder to respond.


Regards,

Shwetha


0 Kudos
ManuelCostanzo2
2,700 Views

Hey! Thank you so much for your help.

 

with max_work_group_size I'm getting 1024.

with max_work_item_sizes I'm getting (64, 1024, 1024)

 

So, my kernel looks like this 

cgh.parallel_for(
sycl::nd_range<3>(sycl::range<3>(1, 1, blocks) *
sycl::range<3>(1, 1, threads),
sycl::range<3>(1, 1, threads)),
[=](sycl::nd_item<3> item_ct1) {
   kernel(...);
});
 
if I want to send the maximum blocks, I cant divide 1024/1024, what would be the idea? Sorry
0 Kudos
ManuelCostanzo2
2,700 Views

Hi, thank you so much for your help.

 

max_work_group_size is 1024

max_work_item_sizes is (64, 1024, 1024)

 

With my kernel:

 

cgh.parallel_for(
sycl::nd_range<3>(sycl::range<3>(1, 1, blocks) *
sycl::range<3>(1, 1, threads),
sycl::range<3>(1, 1, threads)),
[=](sycl::nd_item<3> item_ct1) {
   kernel(...);
});
 
How can I do the division that you said ? Because if I cant do 1024/1024. Sorry
0 Kudos
ShwethaS_Intel
Moderator
2,682 Views

Hi,

 

Small correction to our previous understanding.

 

Max work group size = Maximum number of threads allowed per block.

Max work item size    = Maximum number of threads allowed in each dimensions.

 

This implies that at any given moment, the maximum number of threads within the work group shouldn't exceed max work group size.

 

To calculate maximum number of blocks per grid,

Max no. of blocks = Max Threads / threads requested by user.

And Maximum threads can be calculated by,

Max Threads = Max compute unit * max work group size

 

For further more details please refer the code below and attached output snapshot for both GPU and CPU device.

 

 

#include <CL/sycl.hpp>

int main()
{
    sycl::queue q_ct1 = sycl::queue();
    
    auto device = q_ct1.get_device();
    auto max_work_group_size      = device.get_info<cl::sycl::info::device::max_work_group_size>();
    auto max_work_item_dimensions = device.get_info<cl::sycl::info::device::max_work_item_dimensions>();
    auto max_work_item_sizes      = device.get_info<cl::sycl::info::device::max_work_item_sizes>(); 
    auto max_compute_units        = device.get_info<cl::sycl::info::device::max_compute_units>();
    
         
    std::string d_name = device.get_info<cl::sycl::info::device::name>();
    
    std::cout << "Device: " << d_name << std::endl;
    std::cout << "Max work Group size      = " << max_work_group_size << std::endl;
    std::cout << "Max work Item dimensions = " << max_work_item_dimensions << std::endl;
    std::cout << "Max work Item size       = " << max_work_item_sizes[0] << " " << max_work_item_sizes[1] << " " << 
        max_work_item_sizes[2] << std::endl;
    std::cout << "Max Compute Units        = " << max_compute_units << std::endl;
    
    int requested_threads = 256;
    
    int max_threads = max_compute_units * max_work_group_size;
    int max_blocks = max_threads / requested_threads;
    
    std::cout << std::endl;
    std::cout << "Max threads allowed per block = " << max_work_group_size << std::endl;
    std::cout << "Max blocks allowed per grid = " << max_blocks << " (at a given inst, when " << 
                requested_threads << " are requested per block)" << std::endl;
    
    return 0;
}

 

ShwethaS_Intel_0-1643879577207.png

 

 

Thanks & Regards,

Shwetha.

 

0 Kudos
ManuelCostanzo2
2,661 Views

Hi @ShwethaS_Intel, thank you so much for your help.

 

The only doubt I have is about "max_compute_units". In GPUs, is this number related on what we are working? Because I read about this and it seems like is the number of SM in GPUs, so it's not related to threads or blocks, is that right? If so, how could we modify the code to match for both GPU and CPU? 

 

Thank you again

0 Kudos
ShwethaS_Intel
Moderator
2,627 Views

Hi @ManuelCostanzo2 ,

 

>> The only doubt I have is about "max_compute_units".

YES, Max_compute_unit is equivalent to number of SM's in GPU and it is required to calculate the Maximum number of threads.

 

>> how could we modify the code to match for both GPU and CPU? 

It's up to the description of the user, when launching particular threads, query about the device info to verify appropriate number of threads/blocks to be launched and then set the limit, this way the code can be modified for both CPU and GPU.

 

Hope these details will help you to resolve your queries.

 

Thanks & Regards,

Shwetha.

 

 

0 Kudos
ShwethaS_Intel
Moderator
2,581 Views

Hi,


Has the information provided helped?

If this resolves your issue, make sure to accept this as a solution. Thank you!


Regards,

Shwetha.


0 Kudos
ShwethaS_Intel
Moderator
2,543 Views

Hi,

 

I have not heard back from you. This thread will no longer be monitored by Intel.

If you need further assistance, please post a new question.

 

Thanks & Regards,

Shwetha.






0 Kudos
Reply