SIMD operations on int8 (byte) variables are supported by MMX, SSE2, AVX, AVX2, and AVX512BW (not shipping yet).
There is pretty good support for addition/subtraction on packed byte operands, in wraparound, signed-saturating, and unsigned-saturating forms.
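As a rough illustration, here is a minimal sketch contrasting wraparound and unsigned-saturating byte addition with SSE2 intrinsics. The function names and the requirement that n be a multiple of 16 are my own simplifying assumptions, not anything mandated by the instruction set.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stddef.h>
#include <stdint.h>

/* dst[i] = (a[i] + b[i]) mod 256 (PADDB).
   Hypothetical helper; n must be a multiple of 16 to keep the sketch short. */
void add_wrap_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n % 16 == 0);
    for (size_t i = 0; i < n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_add_epi8(va, vb));
    }
}

/* dst[i] = min(a[i] + b[i], 255) (PADDUSB, unsigned saturation). */
void add_sat_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n % 16 == 0);
    for (size_t i = 0; i < n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
}
```

The saturating form is usually what you want for pixel data, where wrapping from 255 back to 0 would be a visible artifact.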
Bitwise logical operations don't require special versions for byte variables -- you just need to pick a SIMD boolean operation with the right register size. The same applies for loads and stores, of course.
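To make that concrete, a small sketch assuming SSE2 and a buffer length that is a multiple of 16 (the function name is hypothetical): the XOR intrinsic has no notion of element width, so the same instruction serves bytes, words, or anything else.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stddef.h>
#include <stdint.h>

/* dst[i] = a[i] ^ b[i]. PXOR operates on all 128 bits at once, so there is
   no byte-specific variant; only the register width matters.
   Hypothetical helper; n must be a multiple of 16 to keep the sketch short. */
void xor_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n % 16 == 0);
    for (size_t i = 0; i < n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(va, vb));
    }
}
```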
Element-wise MIN/MAX operations are supported for vectors of byte variables by SSE, SSE4_1, AVX2, and AVX512BW, while the bytewise SIMD "compare" operations (e.g., compare for equal, compare for greater than) are supported by MMX, SSE2, AVX, AVX2, and AVX512BW. There are additional AVX512BW instructions for converting the output of compare instructions between bit mask and SIMD register formats.
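A sketch of the compare-then-extract-mask pattern, using the SSE2 PCMPEQB and PMOVMSKB intrinsics (the pre-AVX512 way to turn a byte compare result into a bit mask). The function name and the fixed 16-byte block size are assumptions made for illustration.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Return the index (0..15) of the first byte equal to c in a 16-byte block,
   or -1 if there is none. Hypothetical helper for illustration only. */
int find_byte16(const uint8_t *p, uint8_t c)
{
    assert(p != 0);
    __m128i v    = _mm_loadu_si128((const __m128i *)p);
    __m128i eq   = _mm_cmpeq_epi8(v, _mm_set1_epi8((char)c)); /* 0xFF where equal */
    int     mask = _mm_movemask_epi8(eq);  /* one bit per byte, bit i = byte i */
    for (int i = 0; i < 16; i++) {
        if (mask & (1 << i)) {
            return i;
        }
    }
    return -1;
}
```

In real code the final loop would normally be replaced by a count-trailing-zeros instruction; the loop is used here only to keep the sketch portable.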
Shuffle operations on byte variables are supported by SSSE3, AVX, AVX2, and AVX512BW.
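For example, the SSSE3 byte shuffle (PSHUFB) can reverse the bytes of a 16-byte vector in one instruction. This is a minimal sketch; the function name is hypothetical, and the `target` attribute assumes GCC or Clang (with other compilers you would enable SSSE3 through a compiler flag instead).

```c
#include <assert.h>
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3 */

/* dst[i] = src[15 - i]. Each byte of idx names the source byte that should
   land in that position. Hypothetical helper for illustration only. */
__attribute__((target("ssse3")))
void reverse16(uint8_t *dst, const uint8_t *src)
{
    assert(dst != 0 && src != 0);
    const __m128i idx = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                      7, 6, 5, 4, 3, 2, 1, 0);
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, _mm_shuffle_epi8(v, idx));
}
```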
Blend operations on byte variables are supported by SSE4_1, AVX, and AVX2. The special cases of selecting the maximum or minimum byte values in each position of two SIMD values are supported by SSE (unsigned only), SSE4_1, AVX, AVX2, and AVX512BW.
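To show how min/max relates to the general blend, here is a sketch that computes a signed byte maximum two ways: with an explicit compare plus PBLENDVB, and with the dedicated PMAXSB instruction. Function names are hypothetical, and the `target` attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <smmintrin.h>  /* SSE4.1 */
#include <stdint.h>

/* Signed max built from compare + blend: take b where b > a, else a. */
__attribute__((target("sse4.1")))
void max_i8_blend(int8_t *dst, const int8_t *a, const int8_t *b)
{
    assert(dst != 0 && a != 0 && b != 0);
    __m128i va   = _mm_loadu_si128((const __m128i *)a);
    __m128i vb   = _mm_loadu_si128((const __m128i *)b);
    __m128i mask = _mm_cmpgt_epi8(vb, va);            /* 0xFF where b > a */
    _mm_storeu_si128((__m128i *)dst, _mm_blendv_epi8(va, vb, mask));
}

/* The same result from the dedicated instruction (PMAXSB). */
__attribute__((target("sse4.1")))
void max_i8_direct(int8_t *dst, const int8_t *a, const int8_t *b)
{
    assert(dst != 0 && a != 0 && b != 0);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_max_epi8(va, vb));
}
```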
Support for multiplication is trickier, since multiplication of two 1-byte variables produces a 2-byte result. There is a general instruction (PMADDUBSW) that multiplies a vector of unsigned bytes by a vector of signed bytes and adds adjacent pairs of products, saturating each sum to a signed 16-bit word. This is supported in SSSE3, AVX, AVX2, and AVX512BW. There is also a specialized instruction to compute the (rounded) average of the corresponding unsigned bytes in two SIMD registers (SSE, SSE2, AVX, AVX2, AVX512BW).
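A sketch of both instructions, under the assumption of 16-element inputs and GCC or Clang (for the `target` attribute); the function names are hypothetical. The dot product uses the byte multiply-add to get eight 16-bit pair sums, then widens them to 32 bits with the SSE2 word multiply-add against a vector of ones. Note that the 16-bit pair sums saturate, so inputs with large magnitudes can give a result that differs from exact arithmetic.

```c
#include <assert.h>
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3 (also pulls in SSE2) */

/* Dot product of 16 unsigned bytes with 16 signed bytes.
   Exact only while each adjacent pair sum fits in a signed 16-bit word. */
__attribute__((target("ssse3")))
int32_t dot_u8_s8_16(const uint8_t *a, const int8_t *b)
{
    assert(a != 0 && b != 0);
    __m128i va    = _mm_loadu_si128((const __m128i *)a);  /* unsigned operand */
    __m128i vb    = _mm_loadu_si128((const __m128i *)b);  /* signed operand */
    __m128i pairs = _mm_maddubs_epi16(va, vb);            /* 8 x int16 pair sums */
    __m128i sums  = _mm_madd_epi16(pairs, _mm_set1_epi16(1)); /* 4 x int32 */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, sums);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

/* dst[i] = (a[i] + b[i] + 1) >> 1, computed without overflow (PAVGB). */
void avg_u8_16(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    assert(dst != 0 && a != 0 && b != 0);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_avg_epu8(va, vb));
}
```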
There are a number of specialized operations available for SIMD vectors of byte variables as well. Some examples include:
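One instruction that fits this description is the SSE2 sum of absolute differences (PSADBW), which is mostly used in video motion estimation. The sketch below is my own illustration with a hypothetical function name, not something taken from the list above.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Sum of |a[i] - b[i]| over 16 unsigned bytes.
   PSADBW produces two partial sums, one per 8-byte half, each stored in the
   low 16 bits of its 64-bit lane. Hypothetical helper for illustration only. */
uint32_t sad16(const uint8_t *a, const uint8_t *b)
{
    assert(a != 0 && b != 0);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i s  = _mm_sad_epu8(va, vb);
    uint32_t lo = (uint32_t)_mm_cvtsi128_si32(s);     /* bytes 0..7  */
    uint32_t hi = (uint32_t)_mm_extract_epi16(s, 4);  /* bytes 8..15 */
    return lo + hi;
}
```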
My mind boggles at the number of transistors that are required to implement these infrequently-used instructions, but that is part of what makes this field continually challenging....