To get peak performance, must all EUs in a single sub-slice issue the same instruction, or does this only apply within a single EU? At what granularity should I avoid branching?
Thanks and regards,
Biren Doshi
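The conditional mask is per EU thread. Each thread can have 1-32 SIMD lanes, so branching costs extra only when lanes within the same thread take different paths.
This is a finer granularity than a whole EU: separate threads, even on the same EU, can follow different paths. Each EU typically runs 7 threads. The 2 FPUs per EU could in theory be saturated by only 2 threads, but in practice running 7 gives a higher chance of keeping them busy.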
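To make the granularity concrete, here is a minimal sketch of a simplified model, not actual hardware behaviour. It assumes that consecutive work-items are packed into the lanes of one EU thread and that a thread has to run each side of an `if`/`else` that at least one of its lanes needs. The function name `branch_passes` and the sample inputs are made up for illustration.

```python
def branch_passes(conditions, simd_width):
    """Count the branch sides each hardware thread must execute.

    conditions: one boolean per work-item (the outcome of the `if`).
    simd_width: lanes per EU thread (assumed here to be 8, 16, or 32).
    """
    passes = []
    for start in range(0, len(conditions), simd_width):
        lanes = conditions[start:start + simd_width]
        # A thread executes a side of the branch if any of its lanes needs it.
        passes.append(int(any(lanes)) + int(not all(lanes)))
    return passes


# Work-items 0-15 take the branch, 16-31 do not.
uniform_per_thread = [i < 16 for i in range(32)]
# Neighbouring work-items alternate between the two sides.
divergent = [i % 2 == 0 for i in range(32)]

print(branch_passes(uniform_per_thread, 16))  # [1, 1]
print(branch_passes(divergent, 16))           # [2, 2]
```

In this model the first pattern costs one pass per thread at SIMD16 even though the two threads disagree with each other, while the alternating pattern makes every thread run both sides. The same first pattern at an assumed SIMD32 width would land in a single thread and cost two passes, which is why the thread's SIMD width, rather than the EU or sub-slice, is the unit to keep uniform.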
For more info, please see section 5.3.5 "SIMD Code Generation for SPMD Programming Models" in the Gen9 compute architecture documentation: https://software.intel.com/sites/default/files/managed/c5/9a/The-Compute-Architecture-of-Intel-Processor-Graphics-Gen9-v1d0.pdf.