Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

What is int8 and FP16?

GHui
Novice
1,531 Views

I heard about int8 and FP16 from someone, but I don't know what they are.

0 Kudos
4 Replies
TimP
Honored Contributor III
Your web search engine will give plenty of useful answers. We can't guess what you might ask if you were to be specific. Intel platforms which support such data formats will widen them temporarily when performing arithmetic.
GHui
Novice

Do the PMU counters record them? If Intel platforms widen them, do they make use of SSE or AVX? And are int8 and FP16 calculations much faster?

TimP
Honored Contributor III

If there is a speedup, it would come from savings in memory bandwidth.  A limited group of int8 operations would be available in sse? and avx2.
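
To make the bandwidth point concrete, here is a minimal scalar sketch in C. The function names (`sum_int8`, `bytes_as_int8`, `bytes_as_int32`) are made up for illustration. It only shows the idea of storing data as bytes and widening each value for the arithmetic; it is not a statement about what any particular compiler or CPU does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n signed bytes. Each int8_t is widened to int32_t before the
 * addition, so the running total is not limited to the int8 range. */
int32_t sum_int8(const int8_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    int32_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += (int32_t)data[i];   /* widen, then add */
    }
    return total;
}

/* Bytes of memory traffic needed to stream n elements of each type. */
size_t bytes_as_int8(size_t n)  { return n * sizeof(int8_t);  }
size_t bytes_as_int32(size_t n) { return n * sizeof(int32_t); }
```

The same element count stored as int8 moves a quarter of the bytes that int32 storage would, which is where any speedup in a bandwidth-bound loop would come from.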

McCalpinJohn
Honored Contributor III

SIMD operations on int8 (byte) variables are supported by MMX, SSE2, AVX, AVX2, and AVX512BW (not shipping yet).  

There is pretty good support for addition/subtraction on packed byte operands:

  • unsigned add/subtract with wraparound,
  • signed add/subtract with saturation, and
  • unsigned add/subtract with saturation.
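
The three add flavors listed above can be compared with a small sketch using the SSE2 intrinsics `_mm_add_epi8`, `_mm_adds_epi8`, and `_mm_adds_epu8`. The wrapper `add_bytes` and the `add_mode` enum are hypothetical names for illustration only, and the code assumes an x86 compiler with SSE2 enabled (the default on x86-64).

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>

typedef enum { ADD_WRAP, ADD_SAT_SIGNED, ADD_SAT_UNSIGNED } add_mode;

/* Broadcast a and b to all 16 byte lanes, add them with the requested
 * flavor, and return the raw bits of lane 0. */
uint8_t add_bytes(uint8_t a, uint8_t b, add_mode mode)
{
    __m128i va = _mm_set1_epi8((char)a);
    __m128i vb = _mm_set1_epi8((char)b);
    __m128i r;

    switch (mode) {
    case ADD_WRAP:
        r = _mm_add_epi8(va, vb);    /* PADDB: modulo 256 */
        break;
    case ADD_SAT_SIGNED:
        r = _mm_adds_epi8(va, vb);   /* PADDSB: clamp to [-128, 127] */
        break;
    default:
        assert(mode == ADD_SAT_UNSIGNED);
        r = _mm_adds_epu8(va, vb);   /* PADDUSB: clamp to [0, 255] */
        break;
    }
    return (uint8_t)(_mm_cvtsi128_si32(r) & 0xFF);
}
```

For example, 200 + 100 gives 44 with wraparound and 255 with unsigned saturation, while 100 + 100 read as signed bytes saturates to 127.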

Bitwise logical operations don't require special versions for byte variables -- you just need to pick a SIMD boolean operation with the right register size.  The same applies for loads and stores, of course.
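
As a rough illustration, the sketch below masks every byte of a buffer with the generic 128-bit AND (`_mm_and_si128`) and the generic unaligned load/store intrinsics; nothing in it is byte-specific except the constant. The function name `mask_low_nibbles` and the multiple-of-16 length requirement are assumptions made to keep the example short.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Keep only the low 4 bits of every byte in buf.
 * Assumption for brevity: n is a multiple of 16. */
void mask_low_nibbles(uint8_t *buf, size_t n)
{
    assert(n % 16 == 0);
    const __m128i mask = _mm_set1_epi8(0x0F);

    for (size_t i = 0; i < n; i += 16) {
        /* The load, AND, and store are the same ones used for any
         * element width; only the mask constant is per-byte. */
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        v = _mm_and_si128(v, mask);
        _mm_storeu_si128((__m128i *)(buf + i), v);
    }
}
```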

MIN/MAX operations are supported for vectors of byte variables by SSE, SSE4_1, AVX2, and AVX512BW, while the bytewise SIMD "compare" operations (e.g., compare for equal, compare for greater than) are supported by MMX, SSE2, AVX, AVX2, and AVX512BW.  There are additional AVX512BW instructions relating to converting the output of compare instructions between bit mask and SIMD register formats.
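
Here is a hedged sketch of the SSE2 flavor of these operations: a bytewise compare-for-equal (`_mm_cmpeq_epi8`) whose 0xFF/0x00 result is turned into a 16-bit mask with `_mm_movemask_epi8`, and an unsigned byte maximum (`_mm_max_epu8`). The helper names `count_equal_bytes` and `clamp_below` are invented for this example, and the length is assumed to be a multiple of 16. The AVX512BW mask-register instructions mentioned above are not shown.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Count the bytes in buf equal to key. Assumes n is a multiple of 16. */
size_t count_equal_bytes(const uint8_t *buf, size_t n, uint8_t key)
{
    assert(n % 16 == 0);
    const __m128i vkey = _mm_set1_epi8((char)key);
    size_t count = 0;

    for (size_t i = 0; i < n; i += 16) {
        __m128i v  = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i eq = _mm_cmpeq_epi8(v, vkey);             /* 0xFF where equal */
        unsigned mask = (unsigned)_mm_movemask_epi8(eq);  /* one bit per byte */
        while (mask) {            /* count the set bits */
            mask &= mask - 1;
            ++count;
        }
    }
    return count;
}

/* Raise every byte below floor_value up to floor_value (unsigned max).
 * Assumes n is a multiple of 16. */
void clamp_below(uint8_t *buf, size_t n, uint8_t floor_value)
{
    assert(n % 16 == 0);
    const __m128i vfloor = _mm_set1_epi8((char)floor_value);

    for (size_t i = 0; i < n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(buf + i));
        v = _mm_max_epu8(v, vfloor);
        _mm_storeu_si128((__m128i *)(buf + i), v);
    }
}
```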

Shuffle operations on byte variables are supported by SSSE3, AVX, AVX2, and AVX512BW.

Blend operations on byte variables are supported by SSE4_1, AVX, and AVX2.  The special cases of selecting the maximum or minimum byte values in each position of two SIMD values are supported by SSE (unsigned only), SSE4_1, AVX, AVX2, and AVX512BW.

Support for multiplication is trickier, since multiplication of two 1-byte variables produces a 2-byte result.  There is a general instruction (PMADDUBSW) that multiplies vectors of unsigned and signed bytes and adds adjacent pairs of products, saturating each sum to a signed 16-bit word.  This is supported in SSSE3, AVX, AVX2, and AVX512BW.  There is also a specialized instruction (PAVGB) to compute the (rounded) average of the corresponding unsigned bytes in two SIMD registers (SSE, SSE2, AVX, AVX2, AVX512BW).
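
To show why the multiply-add needs wider outputs, here is a plain scalar C model of what I understand the byte multiply-add (exposed as the SSSE3 intrinsic `_mm_maddubs_epi16`) and the rounded byte average (`_mm_avg_epu8`) to compute per element. The function names are hypothetical, and this is a reference model for reasoning about results, not the hardware implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
static int16_t saturate_i16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Scalar model of the byte multiply-add: each output word is the
 * saturated sum of two adjacent (unsigned byte * signed byte) products.
 * n_bytes input bytes produce n_bytes / 2 output words. */
void maddubs_model(const uint8_t *u, const int8_t *s,
                   int16_t *out, size_t n_bytes)
{
    assert(n_bytes % 2 == 0);
    for (size_t i = 0; i < n_bytes / 2; ++i) {
        int32_t p0 = (int32_t)u[2 * i]     * (int32_t)s[2 * i];
        int32_t p1 = (int32_t)u[2 * i + 1] * (int32_t)s[2 * i + 1];
        out[i] = saturate_i16(p0 + p1);
    }
}

/* Scalar model of the rounded unsigned byte average: (a + b + 1) >> 1. */
uint8_t avg_round_model(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}
```

Note that a single pair such as 255*127 + 255*127 already exceeds the 16-bit range, so the saturation can actually trigger on realistic inputs.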

There are a number of specialized operations available for SIMD vectors of byte variables as well.  Some examples include:

  • PSIGNB -- changes sign of destination byte if source byte is negative, zeros destination byte if source byte is zero.  (SSSE3, AVX, AVX2)
  • PABSB -- returns absolute value of each (signed) input byte in SIMD register. (SSSE3, AVX, AVX2, AVX512BW)
  • PSADBW -- computes differences of unsigned bytes in two SIMD registers, then horizontally adds the absolute values of those differences, returning one 16-bit result for each group of 8 bytes.  (SSE, SSE2, AVX, AVX2, AVX512BW)
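
As one example from this list, PSADBW is reachable from C through the SSE2 intrinsic `_mm_sad_epu8`. The sketch below (the wrapper name `sad16` is made up) computes the sum of absolute differences over 16 byte pairs. With 128-bit registers the instruction leaves one partial sum in each 64-bit half, so the two halves are added at the end.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Sum of |a[i] - b[i]| over 16 unsigned bytes. */
uint32_t sad16(const uint8_t *a, const uint8_t *b)
{
    assert(a != NULL && b != NULL);
    __m128i va  = _mm_loadu_si128((const __m128i *)a);
    __m128i vb  = _mm_loadu_si128((const __m128i *)b);
    __m128i sad = _mm_sad_epu8(va, vb);   /* two partial sums, one per 64-bit half */

    uint32_t lo = (uint32_t)_mm_cvtsi128_si32(sad);
    uint32_t hi = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(sad, 8));
    return lo + hi;
}
```

Passing an all-zero vector as `b` turns this into a horizontal sum of the 16 bytes in `a`, which is a common use of the instruction.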

My mind boggles at the number of transistors that are required to implement these infrequently-used instructions, but that is part of what makes this field continually challenging....
