In the User and Reference Guide for the Intel C++ Compiler 15.0, the description of the AVX2 intrinsics _mm256_packus_epi16/32 states the following (the key word is "unsigned", describing the source operands):
The _mm256_packus_epi16 intrinsic converts 16 packed unsigned word integers from source operands a and b into 32 packed unsigned byte integers. The _mm256_packus_epi32 intrinsic converts eight packed unsigned doubleword integers from the source operands a and b into 16 packed unsigned word integers.
The signedness of the source words/doublewords in the description disagrees with the first sentence of the summary:
Pack signed word/doubleword integers to unsigned byte/word integers and saturates.
Converts 4, 8 or 16 signed word integers from the destination operand (first operand) and 4, 8 or 16 signed word integers from the source operand (second operand) into 8, 16 or 32 unsigned byte integers.
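To make the summary's semantics concrete, here is a minimal scalar reference model of _mm256_packus_epi16, assuming the documented VPACKUSWB behaviour: each signed 16-bit word is clamped to [0, 255], and packing happens independently in each 128-bit lane. The function names `packus_word` and `packus_epi16_ref` are hypothetical helpers for illustration, not part of any Intel API.

```c
#include <stdint.h>

/* Saturate one SIGNED 16-bit word to an unsigned byte:
   negative values clamp to 0, values above 255 clamp to 255. */
static uint8_t packus_word(int16_t w)
{
    if (w < 0)   return 0;
    if (w > 255) return 255;
    return (uint8_t)w;
}

/* Hypothetical scalar reference model of _mm256_packus_epi16.
   a and b each hold 16 signed words; out receives 32 unsigned bytes.
   Each 128-bit lane is packed separately:
     out[0..7]   <- a[0..7]     out[8..15]  <- b[0..7]
     out[16..23] <- a[8..15]    out[24..31] <- b[8..15]  */
static void packus_epi16_ref(const int16_t a[16], const int16_t b[16],
                             uint8_t out[32])
{
    for (int lane = 0; lane < 2; ++lane) {
        for (int i = 0; i < 8; ++i) {
            out[lane * 16 + i]     = packus_word(a[lane * 8 + i]);
            out[lane * 16 + 8 + i] = packus_word(b[lane * 8 + i]);
        }
    }
}
```

Because the inputs are typed as `int16_t`, a word such as -1 saturates to 0, not to 255; that is exactly the point of disagreement with the description quoted above.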
Thanks for letting us know about this conflict.
I've checked with our compiler intrinsics expert, and the user's guide is wrong. Here are the details: the inputs to the _mm256_packus_epi16 intrinsic are vectors of signed integers. This means that negative values, i.e. words whose bit patterns lie in the range [0x8000, 0xFFFF], are saturated to 0. (If the inputs were vectors of unsigned integers, these values would instead be saturated to 0xFF.) The same applies to _mm256_packus_epi32: its inputs are signed doublewords, so values in the range [0x80000000, 0xFFFFFFFF] are saturated to 0 rather than to 0xFFFF.
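The difference between the two readings can be checked with a small sketch that saturates the same raw bit pattern once as a signed value (the actual behaviour described above) and once as an unsigned value (what the user's guide wording would imply). All function names here are hypothetical, and the signed conversion is done arithmetically so it does not rely on implementation-defined casts.

```c
#include <stdint.h>

/* Actual behaviour: interpret the 16-bit pattern as signed, clamp to [0, 255]. */
static uint8_t sat_u8_signed(uint16_t bits)
{
    int32_t v = bits >= 0x8000u ? (int32_t)bits - 0x10000 : (int32_t)bits;
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Hypothetical reading from the user's guide: pattern treated as unsigned. */
static uint8_t sat_u8_unsigned(uint16_t bits)
{
    return bits > 255u ? 255u : (uint8_t)bits;
}

/* Actual behaviour for the epi32 variant: signed doubleword -> [0, 65535]. */
static uint16_t sat_u16_signed(uint32_t bits)
{
    int64_t v = bits >= 0x80000000u ? (int64_t)bits - 0x100000000LL
                                    : (int64_t)bits;
    if (v < 0)     return 0;
    if (v > 65535) return 65535;
    return (uint16_t)v;
}

/* Hypothetical unsigned reading for the epi32 variant. */
static uint16_t sat_u16_unsigned(uint32_t bits)
{
    return bits > 65535u ? 65535u : (uint16_t)bits;
}
```

For example, the pattern 0x8000 yields 0 under the signed reading but 0xFF under the unsigned one, and 0x80000000 yields 0 versus 0xFFFF; values that are non-negative in both readings (such as 0x0100) saturate identically.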
A ticket (DPD200365002) has been filed to correct the user's guide.