Hi, I have done some research but did not find the information I need on the use of AES-NI instructions with the new Phi. Specifically, I am looking for information on latency, timing and parallelization of the encryption instructions. For instance, how many threads can use the encryption instructions simultaneously? Does each thread have its own encryption hardware, or is it shared between threads, so that parallel execution incurs extra latency?
All of the execution units on the Xeon Phi are shared by the active threads on a core. A pipelined multi-cycle execution unit can hold instructions from different threads in different pipeline stages at the same time, but in each cycle only one thread can issue a micro-op to each individual execution unit.
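To make the sharing concrete, here is a minimal toy model in Python of one pipelined execution unit shared by several threads. It is only a sketch: the function name `simulate`, the round-robin issue policy, and the 4-cycle latency are assumptions chosen for illustration, not measured Knights Landing values. Each thread runs a chain of dependent operations, and at most one micro-op is issued to the unit per cycle.

```python
def simulate(num_threads, ops_per_thread, latency):
    """Return the cycle at which the last result is available.

    Toy model (assumed, not real hardware): one fully pipelined unit,
    one issue per cycle, round-robin choice among threads whose
    previous (dependent) operation has completed.
    """
    ready_at = [0] * num_threads            # earliest cycle each thread may issue
    remaining = [ops_per_thread] * num_threads
    next_thread = 0                         # round-robin starting point
    cycle = 0
    finish = 0
    while any(remaining):
        for k in range(num_threads):
            t = (next_thread + k) % num_threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                ready_at[t] = cycle + latency   # dependent op must wait for the result
                finish = max(finish, cycle + latency)
                next_thread = (t + 1) % num_threads
                break                       # only one issue per cycle to this unit
        cycle += 1
    return finish


if __name__ == "__main__":
    LATENCY = 4  # hypothetical latency in cycles, for illustration only
    for threads in (1, 2, 4):
        total_ops = threads * 4
        print(threads, "thread(s):", total_ops, "ops in",
              simulate(threads, 4, LATENCY), "cycles")
```

In this model a single thread with a dependent chain leaves most pipeline stages empty, while additional threads fill those stages and raise throughput without getting a private unit each. A single thread can get a similar effect by interleaving several independent AES blocks.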
There is some information on the latency and throughput of the AES instructions on Knights Landing in Agner Fog's wonderful document http://www.agner.org/optimize/instruction_tables.pdf. The information is at the bottom of page 298 in the version that I have (2017-05-02).