When the workgroup size is set to one, I noticed that the compiler also sets the SIMD width to 1. Why is that the case? Is it possible to put work items from DIFFERENT workgroups into one SIMD operation, e.g. lane 0 for workgroup 0, lane 1 for workgroup 1, etc.?
thanks,
Jeffrey
Jeffrey,
A workgroup can map to a single hardware thread, span multiple threads, or even span EUs, but we don't put two workgroups on the same hardware thread. So the minimum recommended workgroup size is 8 (or (4, 2), (2, 4), (8, 1) or (1, 8) in the 2D case; you can figure out the 3D case yourself :) ).
Hi Robert,
- By "we don't put two workgroups on the same hardware thread", did you mean "we don't put two workgroups on the same SIMD"?
- In modulate_v2_uchar16(), if workgroup_size=32, then each hardware thread processes (32*16) work items, and they come from 16 DIFFERENT workgroups. Am I right?
One hardware thread can execute 8 work items (SIMD8), 16 work items (SIMD16) or 32 work items (SIMD32). If you have a workgroup of size 32 and your code was compiled SIMD8, the workgroup spans 4 hardware threads; compiled SIMD16, it spans 2 hardware threads; compiled SIMD32, a single thread is enough to hold the whole workgroup.
What you cannot do is build your code SIMD32, use a workgroup of size 16, and expect two workgroups to run on the same thread: that doesn't work.
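The arithmetic behind this mapping can be sketched in a few lines of C. This is only an illustration of the rule described above, not driver code: the helper names `threads_per_workgroup` and `idle_lanes` are made up for this example, and the idle-lane count assumes the kernel really was compiled at the stated SIMD width.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of hardware threads one workgroup occupies.
 * Because a hardware thread is never shared between workgroups, a partially
 * filled thread still counts as a whole thread (ceiling division). */
static size_t threads_per_workgroup(size_t wg_size, size_t simd_width)
{
    assert(wg_size > 0 && simd_width > 0);
    return (wg_size + simd_width - 1) / simd_width;
}

/* Hypothetical helper: SIMD lanes that hold no work item of this workgroup.
 * Assumes the kernel was compiled at exactly simd_width. */
static size_t idle_lanes(size_t wg_size, size_t simd_width)
{
    return threads_per_workgroup(wg_size, simd_width) * simd_width - wg_size;
}
```

For a workgroup of 32 this gives 4, 2 and 1 threads for SIMD8, SIMD16 and SIMD32. For a workgroup of 16 compiled SIMD32 it still gives 1 thread, with 16 lanes that a second workgroup cannot fill.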
In modulate_v2_uchar16, your code is built SIMD32, so there are 32 work items on one hardware thread, each work item processing 16 pixels. Don't confuse pixels with work items: one work item occupies one SIMD lane but processes 16 pixels.
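To keep the two quantities apart, here is a small C sketch of the counting for the modulate_v2_uchar16 case. The helper names are hypothetical, and the value of 16 pixels per work item is taken from the description above rather than from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: one work item per SIMD lane, so the number of
 * work items on a hardware thread equals the compiled SIMD width. */
static size_t work_items_per_thread(size_t simd_width)
{
    assert(simd_width > 0);
    return simd_width;
}

/* Hypothetical helper: pixels handled by one hardware thread when every
 * work item processes pixels_per_item pixels (16 for a uchar16 kernel). */
static size_t pixels_per_thread(size_t simd_width, size_t pixels_per_item)
{
    return work_items_per_thread(simd_width) * pixels_per_item;
}
```

With SIMD32 and 16 pixels per work item, one hardware thread holds 32 work items and processes 32*16 = 512 pixels. The 32*16 figure counts pixels, not work items.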
