_mm256_shuffle_epi8 documentation has incomplete pseudocode

Nathan_Weeks · 12-13-2014

The User and Reference Guide for the Intel C++ Compiler 15.0 has incomplete pseudocode for the AVX2 intrinsic _mm256_shuffle_epi8:

https://software.intel.com/en-us/node/524017

for (i = 0; i < 16; i++){
 if (b[i] & 0x80){
  r[i] = 0;
 }
 else
 {
  r[i] = a[b[i] & 0x0F];
 }
}

However, this sets only the lower half (bytes 0 to 15) of the 256-bit result. According to the description of the corresponding 256-bit VPSHUFB instruction in the Intel 64 and IA-32 Architectures Software Developer's Manual, the upper 128-bit lane is shuffled independently and selects only from the upper 16 bytes of a. One way of expressing pseudocode that also sets the upper half of the vector is:

for (i = 0; i < 16; i++){
 if (b[i] & 0x80){
  r[i] = 0;
 }
 else
 {
  r[i] = a[b[i] & 0x0F];
 }
 if (b[16+i] & 0x80){
  r[16+i] = 0;
 }
 else
 {
  r[16+i] = a[16+(b[16+i] & 0x0F)];
 }
}

or more succinctly:

for (i = 0; i < 16; i++){
  r[i] = (b[i] & 0x80) ? 0 : a[b[i] & 0x0F];
  r[16+i] = (b[16+i] & 0x80) ? 0 : a[16+(b[16+i] & 0x0F)];
}
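To make the lane-by-lane behavior concrete, here is a minimal C sketch of a scalar reference model that follows the succinct pseudocode above. It assumes plain 32-byte arrays stand in for the __m256i operands a and b and the result r; the function name shuffle_epi8_256_ref is hypothetical and not part of any Intel API.

```c
#include <stdint.h>

/* Hypothetical scalar reference model of _mm256_shuffle_epi8 (VPSHUFB ymm).
 * a, b: 32-byte inputs; r: 32-byte result.
 * Each 128-bit lane is shuffled independently: an index byte can only
 * select from the 16 bytes of its own lane of a. */
static void shuffle_epi8_256_ref(const uint8_t a[32], const uint8_t b[32],
                                 uint8_t r[32])
{
    for (int i = 0; i < 16; i++) {
        /* Lower lane: bit 7 set zeroes the byte, else the low 4 bits index a[0..15]. */
        r[i] = (b[i] & 0x80) ? 0 : a[b[i] & 0x0F];
        /* Upper lane: same rule, but the index selects from a[16..31]. */
        r[16 + i] = (b[16 + i] & 0x80) ? 0 : a[16 + (b[16 + i] & 0x0F)];
    }
}
```

Because only the low four bits of each index are used, an index such as 0x13 in the upper lane still reads a[16 + 3], never a byte of the lower lane. On AVX2 hardware the model can be cross-checked by loading the same arrays with _mm256_loadu_si256, calling _mm256_shuffle_epi8, and storing the result with _mm256_storeu_si256.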

KitturGanesh · 12-15-2014

Thanks, Nathan. I've filed this issue with the documentation team and will post an update when a release with the fix is out. Much appreciated.

_Kittur