
H__Kamil · ‎05-11-2015

Hi,

I have a question about the VPU. How many physical registers does the MIC VPU have: 32 or 128?

TimP · ‎05-11-2015

Each hardware thread has 32 VPU registers.

H__Kamil · ‎05-11-2015

OK. I understand it this way: the VPU has 128 physical registers, and each thread has its own 32 registers. Am I right?

McCalpinJohn · ‎05-13-2015

Each thread context (aka "logical processor") has its own private set of 32 named vector registers. Since each core supports four thread contexts, there must be (at least) 128 vector registers in each core. This is reasonably clear from the discussion at https://software.intel.com/en-us/articles/intel-xeon-phi-coprocessor-april-2013-developer-webinar-qa-responses (search for "renaming" in the page).
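As a quick check of the arithmetic above, here is a minimal sketch. The counts of 4 thread contexts and 32 named registers come from this thread, and 512 bits is the KNC vector register width. The function names are hypothetical and used only for illustration.

```python
# Figures from the thread: 4 thread contexts per core, each with
# 32 named vector registers. 512 bits is the KNC vector width.
THREAD_CONTEXTS_PER_CORE = 4
NAMED_VREGS_PER_CONTEXT = 32
VREG_WIDTH_BITS = 512


def named_vregs_per_core(contexts=THREAD_CONTEXTS_PER_CORE,
                         regs_per_context=NAMED_VREGS_PER_CONTEXT):
    """Lower bound on the physical vector registers a core must hold."""
    return contexts * regs_per_context


def vreg_state_bytes(num_regs, width_bits=VREG_WIDTH_BITS):
    """Bytes of storage needed for num_regs registers of width_bits each."""
    return num_regs * width_bits // 8


if __name__ == "__main__":
    total = named_vregs_per_core()
    print(total, "named vector registers per core (minimum physical count)")
    print(vreg_state_bytes(total), "bytes of named vector state per core")
```

This gives only a lower bound: it says nothing about whether the hardware holds extra physical registers for renaming.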

With the current Xeon Phi design, it is probably not possible to need more physical registers than named registers, because the product of instruction issue rate and pipeline latency is too small. There might still be register renaming to avoid false conflicts, but I don't see a need for the number of physical registers to be larger than the (aggregate) number of named registers. If register renaming is used for this purpose, I can't see any way to determine whether the renaming is from a single set of 128 physical registers or from four sets of 32 physical registers.

With the doubling of issue rate and the doubling of the number of functional units in the next-generation Xeon Phi, it is possible that more than 32 physical registers will be needed to fully tolerate the pipeline latency for a single thread. In that case it may make more sense to rename from a single pool than from separate pools. It is unlikely that Intel will comment on the implementation at this level of detail, unless it happens to be one of the (few) technology features that they choose to highlight at Hot Chips or some other public event.
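The issue-rate and latency argument above can be turned into a rough back-of-the-envelope estimate: to keep the pipeline busy, a thread needs roughly (issue rate × latency) independent destination registers in flight. The sketch below is only that estimate. The numeric inputs are placeholders chosen for illustration, not measured or documented values for any Xeon Phi generation, and the function names are hypothetical.

```python
import math


def dest_regs_in_flight(issue_per_cycle, latency_cycles):
    """Rough estimate of results in flight: issue rate times latency.

    Counts only destination registers whose results are still in the
    pipeline. Registers that hold long-lived values are not included.
    """
    if issue_per_cycle < 0 or latency_cycles < 0:
        raise ValueError("rate and latency must be non-negative")
    return math.ceil(issue_per_cycle * latency_cycles)


def named_regs_suffice(issue_per_cycle, latency_cycles,
                       live_values, named_regs=32):
    """True if live values plus in-flight results fit in the named registers."""
    needed = live_values + dest_regs_in_flight(issue_per_cycle, latency_cycles)
    return needed <= named_regs


if __name__ == "__main__":
    # Placeholder inputs (assumptions, not vendor figures).
    print(dest_regs_in_flight(0.5, 4))                # low issue rate, short pipeline
    print(dest_regs_in_flight(2, 6))                  # doubled issue rate, longer pipeline
    print(named_regs_suffice(2, 6, live_values=24))   # 24 live values is also a placeholder
```

When the estimate plus the number of live values stays below 32, the compiler can avoid false conflicts simply by picking different register names. Once it exceeds 32, extra physical registers behind the named ones start to pay off.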

TimP · ‎05-14-2015

I thought hardware renaming was done to enable out of order instruction execution. That does tend to be associated with longer pipelines and requirements for more registers.

I have seen MIC applications use 24 VPU registers per thread, but then they don't need more than 2 threads. So it seems 128 registers are enough for KNC.

I find the URL John quoted a bit difficult, as it answers questions about both the coprocessor and the host, and it is outdated in places.

jimdempseyatthecove · ‎05-14-2015

Each hardware thread context on KNC additionally has eight 16-bit mask registers (supporting element widths of 4 and 8 bytes). I do not know whether future generations of AVX-512 will support 32-bit or 64-bit mask registers (to cover the remaining element widths, short and byte).
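The relation between element width and mask width is simple arithmetic: one mask bit per vector lane. A minimal sketch, assuming a 512-bit vector register (the helper name is hypothetical):

```python
VECTOR_WIDTH_BITS = 512  # width of one vector register


def mask_bits_needed(element_bytes, vector_bits=VECTOR_WIDTH_BITS):
    """Number of lanes, hence mask bits, for a given element size."""
    element_bits = element_bytes * 8
    if element_bytes <= 0 or vector_bits % element_bits != 0:
        raise ValueError("element size must evenly divide the vector width")
    return vector_bits // element_bits


if __name__ == "__main__":
    for size in (8, 4, 2, 1):
        print(f"{size}-byte elements need a {mask_bits_needed(size)}-bit mask")
```

So a 16-bit mask covers 4-byte and 8-byte elements, while 2-byte and 1-byte elements would need 32 and 64 mask bits respectively.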

Jim Dempsey

TimP · ‎05-14-2015

According to what I have read, each implementation of AVX-512 must be a superset of AVX-512F. So if that has byte-wise masks, all implementations must.

McCalpinJohn · ‎05-15-2015

Hardware register renaming is done to prevent false conflicts from stalling execution --- see the discussion labelled "anti-dependency" at https://en.wikipedia.org/wiki/Data_dependency, or the discussion of Data Hazards at https://en.wikipedia.org/wiki/Register_renaming . Both of these note that the anti-dependency can be eliminated by a renaming operation. This renaming can be done either in hardware or in software.

False dependencies of this type can occur in either in-order or out-of-order processors. For in-order processors the compiler can often use different register names to avoid the false dependency, but it seems like most compilers assume that the hardware can rename registers, so I don't see this happening very often any more. Most academic computer science references assert that false dependencies are rare in in-order processors, but that depends on lots of assumptions about how long registers are "held" by an instruction. Microarchitectures that are designed to detect and recover from errors at any stage of the pipeline may hold on to registers for a lot longer than a textbook might expect, and this can lead to stalls due to false dependencies.

In any case, for out-of-order processors the false dependencies are much more common because there are a large number of instructions in flight, and all are competing for a limited number of register names. The compiler can still make an effort to remove false dependencies, but it typically does not have enough independent register names to avoid all false conflicts in the out-of-order window.
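The way renaming removes an anti-dependency can be shown with a toy model. This is a sketch only, assuming an unlimited pool of physical registers and instructions given as (destination, sources) tuples. It does not describe how any Intel core implements renaming, and all names in it are hypothetical.

```python
from itertools import count


def war_hazards(instructions):
    """List (i, j, reg) where a later instruction j writes a register
    that an earlier instruction i reads (a write-after-read conflict)."""
    hazards = []
    for i, (_, srcs_i) in enumerate(instructions):
        for j in range(i + 1, len(instructions)):
            dest_j = instructions[j][0]
            if dest_j in srcs_i:
                hazards.append((i, j, dest_j))
    return hazards


def rename(instructions):
    """Give every write a fresh physical register (unlimited pool assumed).

    Sources are read through the current name-to-physical mapping, so
    true (read-after-write) dependencies are preserved.
    """
    mapping = {}                           # named register -> physical register
    fresh = (f"p{i}" for i in count())     # endless supply of physical names
    renamed = []
    for dest, srcs in instructions:
        phys_srcs = []
        for src in srcs:
            if src not in mapping:         # value defined before this sequence
                mapping[src] = next(fresh)
            phys_srcs.append(mapping[src])
        mapping[dest] = next(fresh)        # a new write never reuses a register
        renamed.append((mapping[dest], phys_srcs))
    return renamed


if __name__ == "__main__":
    program = [
        ("r1", ["r2", "r3"]),   # r1 = r2 op r3
        ("r2", ["r4", "r5"]),   # r2 = r4 op r5  (overwrites r2 read above)
        ("r6", ["r1", "r2"]),   # r6 = r1 op r2  (true dependencies)
    ]
    print("before:", war_hazards(program))
    print("after: ", war_hazards(rename(program)))
```

A compiler doing the same job in software would pick an unused named register for the second write instead of `r2`, which only works while unused names remain.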

Intel Xeon Phi - VPU registers