Knights Corner Architecture

Qiang_L_
Beginner
437 Views

Hi,

I have a question about the Knights Corner architecture. How many Vector Processing Units (VPUs) are there in one physical core? Since the hardware supports 4 threads per core, does that mean there are four VPUs per core? Or is there only one VPU per core, shared by the four threads? I am not familiar with this. Can someone answer?

 

Thank you!

Qiang

1 Solution
TimP
Honored Contributor III
437 Views

There is a single VPU per core, shared by the four hardware threads. The VPU can issue an instruction from any thread on each clock cycle, but it can take an instruction from an individual thread only at an interval of 2 or more cycles, so one thread alone cannot keep it fully busy. 2 threads could keep the VPU running at about 90% of maximum throughput, so if you are creating a simplified model, don't let it become over-optimistic. Current compilers do an excellent job of generating code that is efficient over the full range of threads per core, so there's little incentive to spend time trying to second-guess them.

Hand-coded MKL functions that take full advantage of 4 threads per core use at least one thread for data shuffling rather than for driving the VPU.
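
For anyone who wants to play with the kind of simplified model mentioned above, here is a minimal sketch in Python. It is not a description of the real hardware scheduler: the round-robin thread pick, the function name `vpu_utilization`, and the `min_gap` parameter are all assumptions made for illustration. It models only the issue rule described in this reply (one shared VPU, at most one instruction per cycle, and at least 2 cycles between instructions from the same thread).

```python
def vpu_utilization(num_threads: int, cycles: int = 1000, min_gap: int = 2) -> float:
    """Fraction of cycles in which the single shared VPU issues an instruction.

    Idealized toy model (assumption): every thread always has a vector
    instruction ready, and threads are picked round-robin.
    """
    # Cycle at which each thread last issued; start far enough back that
    # every thread is eligible on cycle 0.
    last_issue = [-min_gap] * num_threads
    issued = 0
    next_thread = 0  # round-robin starting point

    for cycle in range(cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            # A thread may issue only if min_gap cycles have passed
            # since its previous instruction.
            if cycle - last_issue[t] >= min_gap:
                last_issue[t] = cycle
                issued += 1
                next_thread = (t + 1) % num_threads
                break  # the VPU accepts at most one instruction per cycle

    return issued / cycles


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} thread(s): {vpu_utilization(n):.0%} of peak issue rate")
```

In this toy model one thread tops out at half the peak issue rate, and two or more threads reach the full rate. That is exactly the over-optimism warned about above: the roughly 90% figure for 2 threads is lower because real code also stalls on memory, dependencies, and non-vector instructions, none of which this sketch models.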
