Intel® ISA Extensions

Knights Corner

sirrida
Beginner
852 Views
Intel has finally released the documentation for the instruction set of the announced Xeon Phi, which is the new brand name for Knights Corner. The document can be downloaded from http://software.intel.com/file/m/44500.
It is very interesting to see that this marvelous CPU features most of the instructions present up to the Pentium (incl. x87) but lacks MMX, SSE, AVX, and other things like CMOV. On the other hand, there is a whopping set of 32 (!) registers of 512 (!) bits each and a bunch of pepped-up commands (incl. FMA and scatter/gather) for single/double floats and dword/qword integers, IMHO better than AVX2. The omnipresent vector masks ease odd loops and can do other magic as well (similar to PDEP/PEXT). Swizzling/converting is built into most commands. The only thing I'm missing is direct support for byte and word integer arithmetic, but this is only a minor speed penalty.
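To illustrate what "vector masks ease odd loops" means, here is a minimal sketch in plain Python (an emulation, not Knights Corner code). It assumes 16 lanes per register, i.e. 32-bit elements in a 512-bit register; the names LANES, masked_add and add_arrays are made up for this illustration. The last, partial chunk of an array is handled by the same vector operation with a mask that enables only the remaining lanes, so no separate scalar tail loop is needed.

```python
LANES = 16  # assumed: 512-bit register / 32-bit elements


def masked_add(dst, a, b, mask):
    """Emulate one masked vector add.

    Lane i receives a[i] + b[i] only if bit i of mask is set;
    otherwise the old value dst[i] is kept.
    """
    return [a[i] + b[i] if (mask >> i) & 1 else dst[i] for i in range(LANES)]


def add_arrays(a, b):
    """Add two equally long lists chunk by chunk, masking the odd tail."""
    assert len(a) == len(b)
    n = len(a)
    out = [0] * n
    for base in range(0, n, LANES):
        active = min(LANES, n - base)   # lanes holding real data
        mask = (1 << active) - 1        # low `active` bits set
        pad = [0] * (LANES - active)    # filler for the unused lanes
        va = a[base:base + active] + pad
        vb = b[base:base + active] + pad
        vd = out[base:base + active] + pad
        result = masked_add(vd, va, vb, mask)
        out[base:base + active] = result[:active]
    return out


if __name__ == "__main__":
    # 20 elements: one full chunk of 16 plus a masked tail of 4
    print(add_arrays(list(range(20)), [1] * 20))
```

On real hardware the mask would live in a mask register and the padding would not be needed, since masked-off lanes are simply not written.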
The new "coprocessor" is announced to have at least 50 cores (probably 62) with 4-fold hyperthreading. 248 threads!
Wow! I'm deeply impressed! Congratulations!

I would really like to experiment with this command set, but unfortunately an update of the Intel Software Development Emulator has not (yet?) been announced, let alone made downloadable. Does anybody know more than I do?

Are there any sample programs?

Is it planned that the Knights Corner command set will become part of the mainstream processors?
8 Replies
Bernard
Valued Contributor I
Yes, it is a very impressive chip. But will it be available to the average user (read: programmer)?
I hope that Knights Corner will not be reserved for the high-end HPC market.
My bet is that the price will be comparable to that of an Nvidia Tesla.
gaston-hillar
Valued Contributor I
I've asked about the Emulator update in another forum specific to the Emulator.
They will probably post an answer here when a new Emulator is launched:
Hope it helps. I'm also waiting for the emulator to include the new instruction set, even if it's just for the high-end HPC market. :)
bronxzv
New Contributor II
MIC has a very different ISA than current x86 CPUs; it's not a proper superset (it lacks MMX/SSEx/AVX plus a lot of scalar x86 instructions added after the P5 Pentium), so I suppose it will be another emulator instead of an updated SDE. Otherwise our code would run well on the SDE but crash on actual MIC hardware.
sirrida
Beginner
Since the Knights Corner instruction set does not reuse any instruction, I see no reason why the emulator should not simply be enhanced to emulate Knights Corner as well. It would make sense, however, to have some switches which can actively enable / disable some command sets.
bronxzv
New Contributor II

"have some switches which can actively enable / disable some command sets."

It will amount to disabling dozens of flags in several CPUID leaves; note that even very old instructions like CMOV aren't supported by MIC.

EDIT: I see that SDE already has options of this kind, "-no-avx" and "-no-aes", so it's indeed a possibility.
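As a rough model of what "disabling flags in CPUID leaves" means, here is a small Python sketch. It only manipulates integers standing in for the EDX/ECX values of CPUID leaf 1 and uses a handful of the feature bits defined there (EDX: CMOV = bit 15, MMX = 23, SSE = 25, SSE2 = 26; ECX: AES = 25, AVX = 28). The function names and the sample register values are hypothetical, and this says nothing about how SDE actually implements its options; a real emulator would also have to cover other leaves and reject the hidden instructions themselves.

```python
# A few feature bits reported by CPUID leaf 1 (name -> bit position).
LEAF1_EDX = {"cmov": 15, "mmx": 23, "sse": 25, "sse2": 26}
LEAF1_ECX = {"aes": 25, "avx": 28}


def decode(edx, ecx):
    """Return the set of modelled feature names whose bits are set."""
    names = {n for n, bit in LEAF1_EDX.items() if (edx >> bit) & 1}
    names |= {n for n, bit in LEAF1_ECX.items() if (ecx >> bit) & 1}
    return names


def hide(edx, ecx, disabled):
    """Clear the bits of the named features, like an emulator switch would."""
    for name in disabled:
        if name in LEAF1_EDX:
            edx &= ~(1 << LEAF1_EDX[name])
        elif name in LEAF1_ECX:
            ecx &= ~(1 << LEAF1_ECX[name])
        else:
            raise KeyError(name)
    return edx, ecx


# Hypothetical host values with every modelled feature present.
host_edx = sum(1 << bit for bit in LEAF1_EDX.values())
host_ecx = sum(1 << bit for bit in LEAF1_ECX.values())

# A MIC-like profile: all of these extensions are reported as absent.
mic_edx, mic_ecx = hide(host_edx, host_ecx,
                        ["cmov", "mmx", "sse", "sse2", "aes", "avx"])

if __name__ == "__main__":
    print(sorted(decode(host_edx, host_ecx)))
    print(sorted(decode(mic_edx, mic_ecx)))
```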
Bernard
Valued Contributor I
The Intel HPC solution, i.e. a Xeon CPU and a Xeon Phi co-processor working in tandem, may suffer from memory load/store latency, maybe comparable to the Nvidia solution, i.e. Xeon and Tesla/Quadro (GPGPU). In both cases the co-processor is installed in a PCIe slot and waits in a spin loop for a new task to be sent to it, while the CPU waits for the task to complete. Chipset-level integration of the PCIe controller and the memory controller will reduce the round-trip time for control signals between the two.
Xeon/Quadro could also be faster at displaying graphics data than Xeon/Xeon Phi/dedicated GPU, because of the latency of moving data to and from memory-mapped I/O between CPU, co-processor and GPU. On the other hand, adding a dedicated GPU could in theory be faster than the Xeon/Quadro combo, because it takes the burden of graphics-intensive computation off the Quadro, which otherwise acts as both GPU and GPGPU.