The User and Reference Guide for the Intel C++ Compiler 15.0 has incomplete pseudocode for the AVX2 intrinsic _mm256_shuffle_epi8:
https://software.intel.com/en-us/node/524017
for (i = 0; i < 16; i++){ if (b[i] & 0x80){ r[i] = 0; } else { r[i] = a[b[i] & 0x0F]; } }
However, this sets only the lower half of the 256-bit vector. From the description of the corresponding 256-bit VPSHUFB instruction in the Intel 64 and IA-32 Architectures Software Developer's Manual, it appears that one way of expressing pseudocode that sets the upper half of the vector is:
for (i = 0; i < 16; i++){ if (b[i] & 0x80){ r[i] = 0; } else { r[i] = a[b[i] & 0x0F]; } if (b[16+i] & 0x80){ r[16+i] = 0; } else { r[16+i] = a[16+(b[16+i] & 0x0F)]; } }
or more succinctly:
for (i = 0; i < 16; i++){ r[i] = (b[i] & 0x80) ? 0 : a[b[i] & 0x0F]; r[16+i] = (b[16+i] & 0x80) ? 0 : a[16+(b[16+i] & 0x0F)]; }
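For anyone who wants to check the pseudocode above against real data, here is a minimal scalar sketch in C. It assumes the per-lane behaviour as I read it from the VPSHUFB description: each 16-byte half is shuffled independently, and only the low 4 bits of each control byte select a source byte. The function name shuffle_epi8_256_ref and the plain 32-byte array representation are my own choices for illustration and are not part of any Intel header.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference model of the 256-bit byte shuffle (hypothetical helper,
 * not an Intel API). a = source bytes, b = control bytes, r = result.
 * r must not alias a, because a is read after r has been partly written. */
void shuffle_epi8_256_ref(const uint8_t a[32], const uint8_t b[32],
                          uint8_t r[32])
{
    /* Process the lower 128-bit lane (offset 0), then the upper (offset 16). */
    for (size_t lane = 0; lane < 32; lane += 16) {
        for (size_t i = 0; i < 16; i++) {
            uint8_t ctrl = b[lane + i];
            /* Bit 7 set: zero the result byte. Otherwise bits 3..0 index
             * into the same lane of a; bits 6..4 are ignored. */
            r[lane + i] = (ctrl & 0x80) ? 0 : a[lane + (ctrl & 0x0F)];
        }
    }
}
```

On a machine with AVX2, the output of this sketch could be compared byte by byte against the result of _mm256_shuffle_epi8 stored to memory; I have written it without intrinsics so that it builds anywhere.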
Thanks, Nathan. I've filed this issue with the documentation group and will update this thread when the release containing the fix is out. Much appreciated.
_Kittur