
Same instruction on all 8 EUs?

To get peak performance, do all EUs in a single sub-slice need to issue the same instruction, or is it enough for the same instruction to be issued within a single EU? At what granularity should I avoid branching?

 

Thanks and regards,

Biren Doshi

 


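The conditional (execution) mask is applied per EU thread. Each thread can have 1-32 SIMD lanes.

This is a finer granularity than per EU. Each EU typically runs 7 threads, and each thread can follow its own branch, so you only need to avoid divergence among the lanes of a single thread. Different threads, and therefore different EUs in a sub-slice, can take different paths without paying the masking penalty. The 2 FPUs per EU could in theory be saturated by only 2 threads, but in practice running 7 gives a better chance of keeping them busy.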
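A toy Python model of the point above may help. It assumes a kernel compiled SIMD-16 and hypothetical branch bodies of 10 instructions each; it is not how the Intel compiler or hardware actually schedules or counts cycles. It only illustrates why divergence costs extra inside a thread (both paths run under the mask) but not between threads or EUs.

```python
# Simplified, hypothetical model of SIMD branch execution per EU thread.
# Work-items are packed into threads of SIMD_WIDTH lanes; an if/else is
# executed per thread under an execution mask.

SIMD_WIDTH = 16   # assumption: kernel compiled SIMD-16
THEN_COST = 10    # hypothetical instruction count of the "if" body
ELSE_COST = 10    # hypothetical instruction count of the "else" body


def issued_instructions(conditions, simd_width=SIMD_WIDTH):
    """Count instructions issued across all threads for one if/else.

    `conditions` holds one bool per work-item (lane). A thread whose lanes
    all agree runs only one branch; a thread with mixed lanes runs both
    branches, with the inactive lanes masked off.
    """
    total = 0
    for start in range(0, len(conditions), simd_width):
        lanes = conditions[start:start + simd_width]
        if any(lanes):        # at least one lane takes the "if" path
            total += THEN_COST
        if not all(lanes):    # at least one lane takes the "else" path
            total += ELSE_COST
    return total


if __name__ == "__main__":
    n = 128  # 8 threads of SIMD-16
    cases = {
        "uniform": [True] * n,
        "split per thread": [(i // SIMD_WIDTH) % 2 == 0 for i in range(n)],
        "split per lane": [i % 2 == 0 for i in range(n)],
    }
    for name, conds in cases.items():
        print(name, issued_instructions(conds))
```

In this model the "uniform" and "split per thread" cases issue the same number of instructions (80), while "split per lane" issues twice as many (160), because every thread has to walk both branches.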

For more info, please see section 5.3.5 "SIMD Code Generation for SPMD Programming Models" in the Gen9 compute architecture documentation: https://software.intel.com/sites/default/files/managed/c5/9a/The-Compute-Architecture-of-Intel-Proce...

 
