GPU Compute Software

A question about subgroup application

Scout
Beginner

In the article "Box Blur Filter Using Intel Subgroup Extensions in OpenCL™": https://www.intel.com/content/www/us/en/developer/articles/technical/box-blur-filter-using-intel-sub...

The author gives an example of how to use subgroup shuffle to share data between work-items. In the chapter "OpenCL Application For Box Blur Filter Using Intel Subgroup Extensions", the author says: "The number of times the kernel is dispatched is less; the work item handles more workload as the kernel now computes for 16 pixels."

[Image: Scout_1-1671182249967.png]

But in the pseudocode the article gives, the step is still 1 pixel:

[Image: Scout_2-1671182345487.png]

This means the first work-item calculates 16 pixels, as the picture shows:

[Image: Scout_3-1671182411121.png]

The second work-item also calculates 16 pixels, as shown below:

[Image: Scout_4-1671182459650.png]

That would mean the first and second work-items do a lot of redundant work. Doesn't this waste a lot of FLOPs?

Am I understanding it correctly?

As I understand it, the subgroup extension lets different work-items share data, but a work-item should not do more work than it does in the naive implementation. Otherwise, there might as well be only one work-item per subgroup: a "bigger" work-item that does more work and keeps its data in registers instead of shared local memory to get higher efficiency.

Please correct me if I have this wrong. Thanks a lot!

 

5 Replies
Scout
Beginner

Or perhaps subgroups are not about helping work-items share data at all. Perhaps they really mean that one work-item can load data into registers instead of shared local memory by using the subgroup block access functions. In that case, there would be only one "real" work-item per subgroup.

SeshaP_Intel
Moderator

Hi,


Thank you for posting in Intel Communities.

Have you tried running the code on your end?

Could you please provide the following details so that we can investigate the issue further on our end?

1. Hardware details: graphics card and driver version used.

2. The complete steps you followed to reproduce the issue.


Thanks and Regards,

Pendyala Sesha Srinivas


SeshaP_Intel
Moderator

Hi,


We haven't heard back from you. Could you please provide an update on your issue?


Thanks and Regards,

Pendyala Sesha Srinivas


SeshaP_Intel
Moderator

Hi,


We have not heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question. 


Thanks and Regards,

Pendyala Sesha Srinivas


Scout
Beginner

Hi! Sorry for the late reply. I've figured it out. With the subgroup functions, each work-item does the same work as in the naive implementation, but the work-items in a subgroup can share data. So the workload per work-item does not get bigger. Please close this thread. Thanks a lot!
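
For anyone who lands on this thread later, here is a rough sketch of that conclusion. It is a plain Python simulation, not an OpenCL kernel and not the article's code. Everything in it is an assumption made for illustration: a 1D, 3-tap blur (`RADIUS = 1`), a subgroup of 16 lanes, the hypothetical helper names `blur_naive` and `blur_subgroup`, and a `shuffle` function that only stands in for a real subgroup shuffle such as `intel_sub_group_shuffle`. The halo handling is also simplified: edge pixels are simply made readable to the edge lanes.

```python
SUBGROUP_SIZE = 16  # lanes (work-items) per subgroup; assumed for illustration
RADIUS = 1          # 3-tap horizontal box blur; assumed for illustration


def blur_naive(row, x0):
    """Each work-item reads its whole window from memory by itself."""
    outputs, loads = [], 0
    for lane in range(SUBGROUP_SIZE):
        x = x0 + lane
        window = [row[x + d] for d in range(-RADIUS, RADIUS + 1)]
        loads += len(window)  # every tap is a separate memory read
        outputs.append(sum(window) / len(window))
    return outputs, loads


def blur_subgroup(row, x0):
    """Each work-item loads one pixel, then reads its neighbours' registers."""
    # One memory read per lane: the pixel this lane is responsible for.
    regs = [row[x0 + lane] for lane in range(SUBGROUP_SIZE)]
    # Halo pixels just outside the subgroup, read once (simplified handling).
    left_halo = [row[x0 - RADIUS + i] for i in range(RADIUS)]
    right_halo = [row[x0 + SUBGROUP_SIZE + i] for i in range(RADIUS)]
    loads = SUBGROUP_SIZE + 2 * RADIUS
    shared = left_halo + regs + right_halo

    def shuffle(lane):
        # Stand-in for a subgroup shuffle: fetch the value held by another
        # lane (or a halo slot) without going back to memory.
        return shared[lane + RADIUS]

    outputs = []
    for lane in range(SUBGROUP_SIZE):
        # Still exactly one output pixel per work-item, as in the naive version.
        window = [shuffle(lane + d) for d in range(-RADIUS, RADIUS + 1)]
        outputs.append(sum(window) / len(window))
    return outputs, loads


if __name__ == "__main__":
    row = [float(i) for i in range(32)]
    naive_out, naive_loads = blur_naive(row, 8)
    sub_out, sub_loads = blur_subgroup(row, 8)
    print(naive_out == sub_out, naive_loads, sub_loads)
```

With these assumptions both versions produce the same 16 outputs, one per work-item. The difference is in memory traffic: the naive version issues 48 reads for the subgroup, while the shuffle version issues 18. So the "16 pixels" belong to the subgroup as a whole, and the arithmetic per work-item stays the same.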
