I wonder how the KUNPCKBW/KUNPCKWD/KUNPCKDQ instructions actually behave. In the SDM, the descriptions of these instructions imply that they interleave individual bits of the input registers. This is especially so for KUNPCKWD, whose description is missing the word "masks".
At the same time, the pseudocode for these operations indicates that the instructions place the lower half of SRC1 directly above the lower half of SRC2, without any interleaving. The Intel Intrinsics Guide contains the same pseudocode.
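For reference, here is a minimal scalar sketch of what the pseudocode describes, e.g. for KUNPCKBW `DEST[7:0] <- SRC2[7:0]; DEST[15:8] <- SRC1[7:0]`. The function names are hypothetical helpers written for this post, not Intel APIs; they emulate the pseudocode on plain integers so they run without AVX-512 hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical emulation of the KUNPCKBW pseudocode:
   low byte of SRC2 -> DEST[7:0], low byte of SRC1 -> DEST[15:8]. */
static uint16_t kunpckbw_emul(uint16_t src1, uint16_t src2) {
    return (uint16_t)(((src1 & 0xFFu) << 8) | (src2 & 0xFFu));
}

/* Same pattern for KUNPCKWD: two 16-bit halves concatenated into 32 bits. */
static uint32_t kunpckwd_emul(uint32_t src1, uint32_t src2) {
    return ((src1 & 0xFFFFu) << 16) | (src2 & 0xFFFFu);
}

/* Same pattern for KUNPCKDQ: two 32-bit halves concatenated into 64 bits. */
static uint64_t kunpckdq_emul(uint64_t src1, uint64_t src2) {
    return ((src1 & 0xFFFFFFFFull) << 32) | (src2 & 0xFFFFFFFFull);
}
```

Under this reading, `kunpckbw_emul(0xAB, 0xCD)` yields `0xABCD`: the bytes are simply concatenated, and the upper halves of both sources are ignored.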
Based on my previous experience with the unpack instructions in SSE and AVX, I would expect the KUNPCK* instructions to interleave individual bits, in which case the pseudocode would be incorrect. Is that so? If not, it would be better to update the instruction descriptions to make it clear that they do not interleave individual bits.
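To make the two readings concrete, the sketch below contrasts the concatenation from the pseudocode with a hypothetical bit-level interleave modeled on PUNPCKLBW (SRC1 elements in even positions, SRC2 elements in odd positions). The interleave function is an assumption illustrating the alternative reading of the description text, not documented KUNPCKBW behavior.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL: bit-level interleave by analogy with PUNPCKLBW.
   Bit i of src1 goes to bit 2i, bit i of src2 goes to bit 2i+1.
   This is NOT documented KUNPCKBW behavior. */
static uint16_t interleave_bits8(uint8_t src1, uint8_t src2) {
    uint16_t dest = 0;
    for (int i = 0; i < 8; ++i) {
        dest |= (uint16_t)(((src1 >> i) & 1u) << (2 * i));
        dest |= (uint16_t)(((src2 >> i) & 1u) << (2 * i + 1));
    }
    return dest;
}

/* Concatenation as in the SDM pseudocode: src1 in the upper byte. */
static uint16_t concat_bytes(uint8_t src1, uint8_t src2) {
    return (uint16_t)((src1 << 8) | src2);
}
```

With `src1 = 0xFF` and `src2 = 0x00`, concatenation gives `0xFF00` while bit interleaving gives `0x5555`, so running KUNPCKBW on such inputs (for example via the `_mm512_kunpackb` intrinsic) would show which behavior the hardware implements.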