Intel has finally released the instruction set documentation for the announced Xeon Phi, which is the new brand name for Knights Corner. The document can be downloaded from http://software.intel.com/file/m/44500. It is very interesting to see that this marvelous CPU features most of the instructions present up to the Pentium (incl. x87) but lacks MMX, SSE, AVX, and other things like CMOV. On the other hand, there is a whopping set of 32 (!) registers of 512 (!) bits each and a bunch of pepped-up commands (incl. FMA and scatter/gather) for single/double floats and dword/qword integers, IMHO better than AVX2. The omnipresent vector masks ease odd loops and can do other magic as well (similar to PDEP/PEXT). Swizzling/converting is built into most commands. The only thing I'm missing is direct support for byte and word integer arithmetic, but this is only a minor speed penalty. The new "coprocessor" is announced to have at least 50 cores (probably 62) with 4-fold hyperthreading. 248 threads! Wow! I'm deeply impressed! Congratulations!
I would really like to experiment with this command set, but unfortunately an update of the Intel Software Development Emulator has not (yet?) been announced, let alone made available for download. Does anybody know more than I do?
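To make the remark about vector masks and odd loops concrete, here is a minimal sketch in plain C. It is only a scalar model, not real Knights Corner instructions or intrinsics: the names `tail_mask`, `masked_add` and `add_arrays` are made up for illustration, and the 16 lanes simply assume 512 bits divided into 32-bit floats. The idea is that the last, partial block of an array is handled by the same vector operation with some lanes switched off, instead of a separate scalar tail loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 16 /* assumption: 512-bit register / 32-bit float */

/* Hypothetical helper: mask with one bit set per element still to process. */
static uint16_t tail_mask(size_t left)
{
    return left >= LANES ? (uint16_t)0xFFFFu
                         : (uint16_t)((1u << left) - 1u);
}

/* Scalar model of a masked vector add: a lane is read and written
 * only if its mask bit is set; all other lanes are left untouched. */
static void masked_add(float *dst, const float *a, const float *b,
                       uint16_t mask)
{
    for (int lane = 0; lane < LANES; ++lane) {
        if (mask & (1u << lane)) {
            dst[lane] = a[lane] + b[lane];
        }
    }
}

/* Adds n floats block by block. The final partial block reuses
 * masked_add with a shorter mask, so no scalar remainder loop is needed. */
static void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i += LANES) {
        masked_add(dst + i, a + i, b + i, tail_mask(n - i));
    }
}
```

Calling `add_arrays` on 19 elements runs one full block with all 16 lanes enabled and one block with only the lowest 3 lanes enabled, so nothing past the end of the arrays is read or written.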
Are there any sample programs?
Is it planned that the Knights Corner command set will become part of the mainstream processors?
Yes, it is a very impressive chip. But will it be available to the average user (read: programmer)? I hope that Knights Corner will not be reserved for the high-end HPC market. My bet is that the price will be comparable to Nvidia Tesla.
MIC has a very different ISA from current x86 CPUs. It is not a proper superset (it lacks MMX/SSEx/AVX plus a lot of scalar x86 instructions added after the P5 Pentium), so I suppose it will be another emulator instead of an updated SDE; otherwise our code would run well on the SDE but crash on actual MIC hardware.
Since the Knights Corner instruction set does not reuse any instruction, I see no reason why the emulator should not simply be enhanced to emulate Knights Corner as well. It would make sense, however, to have some switches which can actively enable/disable some command sets.
The Intel HPC solution, i.e. a Xeon CPU and a Xeon Phi coprocessor working in tandem, may suffer from memory load-store latency, maybe comparable to the Nvidia solution, i.e. Xeon and Tesla/Quadro (GPGPU), because both coprocessors will be installed in PCIe slots and will be waiting in spin loops for a new task to be sent to them, and the CPU will wait for completion of a task. Chipset-level integration of the PCIe controller and the memory controller will reduce the round-trip time for control signals between the two. Xeon/Quadro could also be faster at displaying graphics data than Xeon/Xeon Phi/dedicated GPU, because of the latency of moving data to/from memory-mapped I/O between the CPU, coprocessor, and GPU. But adding a dedicated GPU could in theory also be faster than the Xeon/Quadro combo, because it can take the burden of graphics-intensive computation off the Quadro, which acts as both GPU and GPGPU.