
Knights Corner Architecture

Qiang_L_
Beginner
439 views

Hi,

I have a question about the Knights Corner architecture. How many Vector Processing Units (VPUs) are there in one physical core? Since the hardware supports 4 threads per core, does that mean there are four VPUs in each core, or is there only one VPU per core, shared by the four threads? I am not familiar with this. Can someone answer this for me?

 

Thank you!

Qiang

0 Kudos
1 Solution
TimP
Honored Contributor III
439 views

Each core has a single VPU, shared by its four hardware threads. The VPU can issue an instruction from any thread on each clock cycle, but it can take instructions from an individual thread only at intervals of 2 or more cycles. Two threads could keep the VPU running at 90% of maximum throughput, so if you are building a simplified model, don't let it become over-optimistic. Current compilers do an excellent job of generating code that is efficient over the full range of threads per core, so there is little incentive to spend time trying to second-guess them.

Hand-coded MKL functions that take full advantage of 4 threads per core use at least one thread for data shuffling rather than for driving the VPU.
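
For anyone who wants to see the issue rule in numbers, here is a minimal sketch of a toy model, assuming an idealized core with no cache misses, dependency stalls, or data-shuffling threads. The function name `vpu_utilization`, the `min_gap` parameter, and the round-robin thread picker are illustrative assumptions, not a description of the real hardware scheduler.

```python
def vpu_utilization(threads: int, cycles: int = 1000, min_gap: int = 2) -> float:
    """Fraction of cycles in which the shared VPU issues an instruction.

    Toy model (assumption): the VPU issues at most one instruction per cycle,
    and a given thread may issue again only `min_gap` cycles after its
    previous issue. Every thread always has work ready.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")

    next_ready = [0] * threads  # earliest cycle at which each thread may issue
    issued = 0
    start = 0  # round-robin starting point (hypothetical picker policy)

    for cycle in range(cycles):
        for offset in range(threads):
            t = (start + offset) % threads
            if next_ready[t] <= cycle:
                next_ready[t] = cycle + min_gap  # enforce the per-thread gap
                issued += 1
                start = (t + 1) % threads  # next cycle, try the next thread first
                break  # only one issue per cycle on the single VPU

    return issued / cycles


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} thread(s) per core: {vpu_utilization(n):.0%} of peak issue rate")
```

In this toy model a single thread cannot exceed half of the peak issue rate, and two or more threads saturate the VPU. That is exactly the kind of over-optimistic result warned about above: real code with two threads per core lands nearer the 90% figure because of stalls the model ignores.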
