- 신규로 표시
- 북마크
- 구독
- 소거
- RSS 피드 구독
- 강조
- 인쇄
- 부적절한 컨텐트 신고
- 신규로 표시
- 북마크
- 구독
- 소거
- RSS 피드 구독
- 강조
- 인쇄
- 부적절한 컨텐트 신고
Additionally:
Timing, in addition to interleaving considerations, also depends on if the entire net is:
a) contained in registers
b) contained in L1 cache
c) contained in L2 cache
d) contained in L3/LL cache
e) permutations of above
f) and most importantly: if all the XNORs are performed in the same bit position within each and between each 512-bit vectors .OR. arbitrarily placed inter/intra 512-bit vectors.
Jim Dempsey
링크가 복사됨
- 신규로 표시
- 북마크
- 구독
- 소거
- RSS 피드 구독
- 강조
- 인쇄
- 부적절한 컨텐트 신고
http://www.agner.org/optimize/instruction_tables.pdf
has some useful timing information. XOR on KNL shows latency of 2, reciprocal throughput 0.5. If you want an XNOR you will have to NOT the result. For performance, you would want to interleave the XOR and NOT with other instruction(s).
Jim Dempsey
- 신규로 표시
- 북마크
- 구독
- 소거
- RSS 피드 구독
- 강조
- 인쇄
- 부적절한 컨텐트 신고
AVX-512 includes XOR operations, see https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=xor&techs=AVX_512 for instance.
You need also to consider the difference between the instruction latency and throughput. (The statement that "the CPU can perform one XOR per cycle" is likely a statement about the throughput, i.e. when you have a lot of them they come out one per cycle, not the latency [the time from a specific one starting to it ending]).
Intel Architecture Code Analyzer can show you throughput of small code-sequences on different Intel micro-architectures if you want to go that deep...
- 신규로 표시
- 북마크
- 구독
- 소거
- RSS 피드 구독
- 강조
- 인쇄
- 부적절한 컨텐트 신고
Additionally:
Timing, in addition to interleaving considerations, also depends on if the entire net is:
a) contained in registers
b) contained in L1 cache
c) contained in L2 cache
d) contained in L3/LL cache
e) permutations of above
f) and most importantly: if all the XNORs are performed in the same bit position within each and between each 512-bit vectors .OR. arbitrarily placed inter/intra 512-bit vectors.
Jim Dempsey