The documentation for _mm256_blend_epi16 doesn't indicate that it operates on individual 128-bit channels, but this is the behavior I am seeing. Is this the correct behavior? The reproducer below shows the behavior of _mm256_blend_epi16 and _mm256_blend_epi32 when I attempt to insert a value into the first position of a vector using the blend instruction.
#include <stdint.h>
#include <stdio.h>
#include <immintrin.h>

typedef union {
    __m256i m;
    int32_t v[8];
} __m256i_32_t;

typedef union {
    __m256i m;
    int16_t v[16];
} __m256i_16_t;

void print_m256i_32(__m256i a)
{
    __m256i_32_t t;
    t.m = a;
    printf("{%d,%d,%d,%d,%d,%d,%d,%d}",
           t.v[0], t.v[1], t.v[2], t.v[3],
           t.v[4], t.v[5], t.v[6], t.v[7]);
}

void print_m256i_16(__m256i a)
{
    __m256i_16_t t;
    t.m = a;
    printf("{%d,%d,%d,%d,%d,%d,%d,%d,"
           "%d,%d,%d,%d,%d,%d,%d,%d}",
           t.v[ 0], t.v[ 1], t.v[ 2], t.v[ 3],
           t.v[ 4], t.v[ 5], t.v[ 6], t.v[ 7],
           t.v[ 8], t.v[ 9], t.v[10], t.v[11],
           t.v[12], t.v[13], t.v[14], t.v[15]);
}

int main(int argc, char **argv)
{
    __m256i a32 = _mm256_set_epi32(1,2,3,4,5,6,7,8);
    __m256i a16 = _mm256_set_epi16(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16);
    __m256i z32 = _mm256_set1_epi32(99);
    __m256i z16 = _mm256_set1_epi16(99);

    __m256i insert32 = _mm256_blend_epi32(a32, z32, 1);
    printf("insert32 = _mm256_blend_epi32(a32, z32, 1)\n");
    print_m256i_32(insert32);
    printf("\n");

    __m256i insert16 = _mm256_blend_epi16(a16, z16, 1);
    printf("insert16 = _mm256_blend_epi16(a16, z16, 1)\n");
    print_m256i_16(insert16);
    printf("\n");

    return 0;
}
The output on my system is the following:
insert32 = _mm256_blend_epi32(a32, z32, 1)
{99,7,6,5,4,3,2,1}
insert16 = _mm256_blend_epi16(a16, z16, 1)
{99,15,14,13,12,11,10,9,99,7,6,5,4,3,2,1}
If this is indeed the case then I must use _mm256_blendv_epi8 to accomplish what I am trying to do using _mm256_blend_epi16, but the latency and throughput are not as good.
Is the documentation incorrect, then, and the intrinsic behaving as intended?
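One possible workaround that stays with immediate blends, offered here as a sketch and not something proposed in the thread: run _mm256_blend_epi16 first, which replaces element 0 of both 128-bit halves, then use _mm256_blend_epi32 with mask 1 to keep only the lowest 32-bit element of that intermediate result and take everything else from the original vector. The helper name insert_word0 is hypothetical, and the target attribute is a GCC/Clang extension assumed here so the snippet compiles without -mavx2.

```c
#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper: replace only 16-bit element 0 of a 16-element vector.
   The target attribute (GCC/Clang extension) enables AVX2 for this function
   only; the CPU must still support AVX2 at run time. */
__attribute__((target("avx2")))
void insert_word0(const int16_t src[16], int16_t value, int16_t dst[16])
{
    __m256i a = _mm256_loadu_si256((const __m256i *)src);
    __m256i z = _mm256_set1_epi16(value);

    /* Step 1: the 8-bit mask is applied to each 128-bit half,
       so 16-bit elements 0 and 8 are both taken from z. */
    __m256i both = _mm256_blend_epi16(a, z, 1);

    /* Step 2: take only 32-bit element 0 (16-bit elements 0 and 1) from
       'both'; everything else, including element 8, comes from a. */
    __m256i r = _mm256_blend_epi32(a, both, 1);

    _mm256_storeu_si256((__m256i *)dst, r);
}
```

Element 1 of the intermediate vector is still the original value, so pulling in the whole low 32-bit element changes only element 0. Whether two immediate blends actually beat a single _mm256_blendv_epi8 depends on the target microarchitecture and should be measured.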
Jeff D. wrote:
The documentation for _mm256_blend_epi16 doesn't indicate that it operates on individual 128-bit channels, but this is the behavior I am seeing. Is this the correct behavior?
Immediate constant parameters of *all* Intel intrinsics are 8 bits wide, so _mm256_blend_epi16() can't blend 16 elements individually: the same 8-bit mask is applied to each 128-bit half of the vector.
Your doc is incomplete.
I'd recommend using the "instruction set reference, A-Z" at
https://www-ssl.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
This is the *only* *complete* source of info on Intel instructions/intrinsics.
If you need info on an intrinsic, just find it with Ctrl+F and read the section above it.
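The per-lane behavior described in this reply can be written out as a small scalar model. This is a sketch based on the behavior reported in the thread, not Intel's reference pseudocode, and the function name blend_epi16_model is made up for illustration.

```c
#include <stdint.h>

/* Scalar model of _mm256_blend_epi16(a, b, imm8): one 8-bit mask,
   reused for the low and the high 128-bit half of the vector. */
void blend_epi16_model(const int16_t a[16], const int16_t b[16],
                       uint8_t imm8, int16_t dst[16])
{
    for (int i = 0; i < 16; i++) {
        /* Bit (i % 8) of the mask controls element i, so elements i and
           i + 8 always share the same selector bit. */
        int take_b = (imm8 >> (i % 8)) & 1;
        dst[i] = take_b ? b[i] : a[i];
    }
}
```

With a holding {16,15,...,1} in memory order, b holding 99 everywhere, and imm8 = 1, the model puts 99 in positions 0 and 8, which matches the output posted in the question.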
Jeff D. wrote:
The documentation for _mm256_blend_epi16 doesn't indicate that it operates on individual 128-bit channels, but this is the behavior I am seeing.
I'm not sure which documentation you are referring to, but I see that the Intrinsics Guide is indeed wrong.
At least this source is correct for this intrinsic: https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-5369B2B5-B1E1-4D96-85AB-2019982667B4.htm
bronxzv wrote:
I'm not sure which documentation you are referring to, but I see that the Intrinsics Guide is indeed wrong.
Perhaps this should be reported in the dedicated thread.
andysem wrote:
Quote:
bronxzv wrote:
I'm not sure which documentation you are referring to, but I see that the Intrinsics Guide is indeed wrong.
Perhaps this should be reported in the dedicated thread.
Done!