The documentation for _mm256_blend_epi16 doesn't indicate that it operates on individual 128-bit channels, but that is the behavior I am seeing. Is this the correct behavior? Below is a reproducer showing the behavior of _mm256_blend_epi16 and _mm256_blend_epi32, where I attempt to insert a value into the first position of a vector using a blend instruction.
/* Requires AVX2; build with e.g. gcc -mavx2 */
#include <stdint.h>
#include <stdio.h>
#include <immintrin.h>

/* View a __m256i as eight 32-bit elements */
typedef union {
    __m256i m;
    int32_t v[8];
} __m256i_32_t;

/* View a __m256i as sixteen 16-bit elements */
typedef union {
    __m256i m;
    int16_t v[16];
} __m256i_16_t;

/* Print the 32-bit elements, element 0 first */
void print_m256i_32(__m256i a) {
    __m256i_32_t t;
    t.m = a;
    printf("{%d,%d,%d,%d,%d,%d,%d,%d}",
           t.v[0], t.v[1], t.v[2], t.v[3],
           t.v[4], t.v[5], t.v[6], t.v[7]);
}

/* Print the 16-bit elements, element 0 first */
void print_m256i_16(__m256i a) {
    __m256i_16_t t;
    t.m = a;
    printf("{%d,%d,%d,%d,%d,%d,%d,%d,"
           "%d,%d,%d,%d,%d,%d,%d,%d}",
           t.v[ 0], t.v[ 1], t.v[ 2], t.v[ 3],
           t.v[ 4], t.v[ 5], t.v[ 6], t.v[ 7],
           t.v[ 8], t.v[ 9], t.v[10], t.v[11],
           t.v[12], t.v[13], t.v[14], t.v[15]);
}

int main(int argc, char **argv)
{
    /* _mm256_set_* takes the highest element first, so element 0 holds the last argument */
    __m256i a32 = _mm256_set_epi32(1,2,3,4,5,6,7,8);
    __m256i a16 = _mm256_set_epi16(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16);
    __m256i z32 = _mm256_set1_epi32(99);
    __m256i z16 = _mm256_set1_epi16(99);

    /* Attempt to insert 99 at element 0: imm8 bit 0 takes element 0 from z32 */
    __m256i insert32 = _mm256_blend_epi32(a32, z32, 1);
    printf("insert32 = _mm256_blend_epi32(a32, z32, 1)\n");
    print_m256i_32(insert32);
    printf("\n");

    /* Same attempt with 16-bit elements */
    __m256i insert16 = _mm256_blend_epi16(a16, z16, 1);
    printf("insert16 = _mm256_blend_epi16(a16, z16, 1)\n");
    print_m256i_16(insert16);
    printf("\n");
    return 0;
}
The output on my system is the following:
insert32 = _mm256_blend_epi32(a32, z32, 1)
{99,7,6,5,4,3,2,1}
insert16 = _mm256_blend_epi16(a16, z16, 1)
{99,15,14,13,12,11,10,9,99,7,6,5,4,3,2,1}
If this is indeed the case, then I must use _mm256_blendv_epi8 instead of _mm256_blend_epi16 to accomplish this, but its latency and throughput are not as good.
Is the documentation incorrect, then, and this is the intended behavior?
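For reference, here is a minimal sketch of two ways to replace only element 0 of a 16-bit vector. It assumes x86-64 with AVX2 at run time and GCC or Clang, since it uses the per-function `target("avx2")` attribute (a compiler extension; MSVC would need a different setup). The function names `insert0_blendv` and `insert0_two_blends` are hypothetical. The first uses _mm256_blendv_epi8, as suggested above; the second keeps immediate blends by following the lane-repeating _mm256_blend_epi16 with _mm256_blend_epi32 to restore the upper lane.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helpers: copy in[] to out[] with element 0 replaced by value.
   Arrays are used at the interface so no __m256i crosses a function boundary
   compiled without AVX. The CPU must support AVX2 at run time. */

/* Variant 1: variable blend with a mask that is all ones only in word 0. */
__attribute__((target("avx2")))
void insert0_blendv(const int16_t in[16], int16_t value, int16_t out[16])
{
    __m256i a = _mm256_loadu_si256((const __m256i *)in);
    __m256i z = _mm256_set1_epi16(value);
    __m256i mask = _mm256_setr_epi16(-1, 0, 0, 0, 0, 0, 0, 0,
                                     0, 0, 0, 0, 0, 0, 0, 0);
    /* Takes a byte from z wherever the top bit of the mask byte is set */
    __m256i r = _mm256_blendv_epi8(a, z, mask);
    _mm256_storeu_si256((__m256i *)out, r);
}

/* Variant 2: two immediate blends. blend_epi16 with imm8 = 1 writes value
   into word 0 AND word 8 (the same mask is reused for the upper lane).
   blend_epi32 with imm8 = 1 then takes only dword 0 (words 0 and 1) from
   that result and everything else from the original a. */
__attribute__((target("avx2")))
void insert0_two_blends(const int16_t in[16], int16_t value, int16_t out[16])
{
    __m256i a = _mm256_loadu_si256((const __m256i *)in);
    __m256i z = _mm256_set1_epi16(value);
    __m256i both_lanes = _mm256_blend_epi16(a, z, 1);
    __m256i r = _mm256_blend_epi32(a, both_lanes, 1);
    _mm256_storeu_si256((__m256i *)out, r);
}
```

Both variants should leave elements 1 through 15 untouched. Whether two immediate blends are actually cheaper than one variable blend depends on the microarchitecture, so it is worth measuring on the target CPU.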
Jeff D. wrote:
The documentation for _mm256_blend_epi16 doesn't indicate that it operates on individual 128-bit channels, but this is the behavior I am seeing. Is this the correct behavior?
Immediate constant parameters of *all* Intel intrinsics are 8 bits long, so _mm256_blend_epi16() can't blend 16 elements individually; the same 8-bit mask is applied to both 128-bit lanes.
Your doc is incomplete.
I'd recommend using the "Instruction Set Reference, A-Z" at
https://www-ssl.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
This is the *only* *complete* source of info on Intel instructions/intrinsics.
If you need info on an intrinsic, just find it with Ctrl+F: it is listed near the end of its instruction's entry, so read the section above it.
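To illustrate the 8-bit-immediate point, here is a small portable scalar model of the two blends (the function names `model_blend_epi32` and `model_blend_epi16` are hypothetical, and this is a sketch of the documented selection rule, not Intel's implementation). With 8 dword elements, each immediate bit maps to one element; with 16 word elements, bit `i % 8` selects element `i`, so the mask repeats in the upper 128-bit lane.

```c
#include <stdint.h>

/* Model of _mm256_blend_epi32 (VPBLENDD): one imm8 bit per dword element,
   so the 8 bits cover all 8 elements. A set bit takes the element from b. */
void model_blend_epi32(const int32_t a[8], const int32_t b[8],
                       uint8_t imm8, int32_t dst[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = ((imm8 >> i) & 1) ? b[i] : a[i];
}

/* Model of _mm256_blend_epi16 (VPBLENDW): 16 word elements but still only
   8 imm8 bits, so bit (i % 8) selects element i and the same mask is
   reused for the upper 128-bit lane (elements 8..15). */
void model_blend_epi16(const int16_t a[16], const int16_t b[16],
                       uint8_t imm8, int16_t dst[16])
{
    for (int i = 0; i < 16; i++)
        dst[i] = ((imm8 >> (i % 8)) & 1) ? b[i] : a[i];
}
```

Feeding the model the reproducer's inputs (element i of a16 is 16 - i, every element of z16 is 99, imm8 = 1) gives {99,15,14,13,12,11,10,9,99,7,6,5,4,3,2,1}, matching the output reported in the question.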
Jeff D. wrote:
The documentation for _mm256_blend_epi16 doesn't indicate that it operates on individual 128-bit channels, but this is the behavior I am seeing.
I'm not sure which documentation you are referring to, but I see that the Intrinsics Guide is indeed wrong.
At least this source is correct for this intrinsic: https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-5369B2B5-B1E1-4D96-85AB-2019982667B4.htm
bronxzv wrote:
I'm not sure which documentation you are referring to, but I see that the Intrinsics Guide is indeed wrong.
Perhaps this should be reported in the dedicated thread.
andysem wrote:
Quote:
bronxzv wrote:
I'm not sure which documentation you are referring to, but I see that the Intrinsics Guide is indeed wrong.
Perhaps this should be reported in the dedicated thread.
Done!