
_mm256_blend_epi16 doesn't work as documented

Jeff_D_2
Beginner

The documentation for _mm256_blend_epi16 doesn't indicate that it operates on individual 128-bit channels, but this is the behavior I am seeing. Is this the correct behavior? The reproducer below shows the behavior of _mm256_blend_epi16 and _mm256_blend_epi32 when I attempt to insert a value into the first position of a vector using the blend instruction.

#include <stdint.h>
#include <stdio.h>

#include <immintrin.h>

typedef union {
    __m256i m;
    int32_t v[8];
} __m256i_32_t;

typedef union {
    __m256i m;
    int16_t v[16];
} __m256i_16_t;

void print_m256i_32(__m256i a) {
    __m256i_32_t t;
    t.m = a;
    printf("{%d,%d,%d,%d,%d,%d,%d,%d}",
            t.v[0], t.v[1], t.v[2], t.v[3],
            t.v[4], t.v[5], t.v[6], t.v[7]);
}

void print_m256i_16(__m256i a) {
    __m256i_16_t t;
    t.m = a;
    printf("{%d,%d,%d,%d,%d,%d,%d,%d,"
            "%d,%d,%d,%d,%d,%d,%d,%d}",
            t.v[ 0], t.v[ 1], t.v[ 2], t.v[ 3],
            t.v[ 4], t.v[ 5], t.v[ 6], t.v[ 7],
            t.v[ 8], t.v[ 9], t.v[10], t.v[11],
            t.v[12], t.v[13], t.v[14], t.v[15]);
}

int main(int argc, char **argv)
{
    __m256i a32 = _mm256_set_epi32(1,2,3,4,5,6,7,8);
    __m256i a16 = _mm256_set_epi16(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16);
    __m256i z32 = _mm256_set1_epi32(99);
    __m256i z16 = _mm256_set1_epi16(99);
    __m256i insert32 = _mm256_blend_epi32(a32, z32, 1);
    printf("insert32 = _mm256_blend_epi32(a32, z32, 1)\n");
    print_m256i_32(insert32);
    printf("\n");
    __m256i insert16 = _mm256_blend_epi16(a16, z16, 1);
    printf("insert16 = _mm256_blend_epi16(a16, z16, 1)\n");
    print_m256i_16(insert16);
    printf("\n");
    return 0;
}

The output on my system is the following:

insert32 = _mm256_blend_epi32(a32, z32, 1)
{99,7,6,5,4,3,2,1}
insert16 = _mm256_blend_epi16(a16, z16, 1)
{99,15,14,13,12,11,10,9,99,7,6,5,4,3,2,1}


If this is indeed the case then I must use _mm256_blendv_epi8 to accomplish what I am trying to do using _mm256_blend_epi16, but the latency and throughput are not as good.

Is the documentation incorrect, then, and the intrinsic behaving as intended?
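
A possible alternative to _mm256_blendv_epi8 for the single-element insert described above is to chain two immediate blends: first the per-lane 16-bit blend, then a 32-bit blend that keeps only the low 128 bits of that result. The sketch below is an assumption-laden illustration, not something posted in this thread: insert_low_epi16 is a hypothetical helper name, the target attribute assumes GCC or Clang (with other compilers, drop it and enable AVX2 through compiler options), and it has not been benchmarked against _mm256_blendv_epi8.

```c
#include <stdint.h>
#include <immintrin.h>

/* Hypothetical helper: copy a[] to dst[], replacing only 16-bit element 0
   with b[0]. Arrays are used instead of __m256i parameters so the function
   can be called from code that is not itself compiled for AVX2. */
static __attribute__((target("avx2")))
void insert_low_epi16(const int16_t a[16], const int16_t b[16], int16_t dst[16])
{
    __m256i va = _mm256_loadu_si256((const __m256i *)a);
    __m256i vb = _mm256_loadu_si256((const __m256i *)b);

    /* Step 1: per-lane 16-bit blend. Mask bit 0 replaces element 0 in the
       low 128-bit lane and element 8 in the high 128-bit lane. */
    __m256i both = _mm256_blend_epi16(va, vb, 1);

    /* Step 2: 32-bit blend, whose 8 mask bits cover all 8 elements.
       0x0F takes the low 128 bits from `both` and the high 128 bits
       from the original vector, undoing the change to element 8. */
    __m256i result = _mm256_blend_epi32(va, both, 0x0F);

    _mm256_storeu_si256((__m256i *)dst, result);
}
```

Called with the a16 and z16 values from the reproducer, this is meant to change element 0 only and leave element 8 as it was.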

4 Replies
Vladimir_Sedach
New Contributor I

Jeff D. wrote:

The documentation for _mm256_blend_epi16 doesn't indicate that it operates on individual 128-bit channels, but this is the behavior I am seeing.  Is this the correct behavior?  



Immediate constant parameters of *all* Intel intrinsics are 8 bits long, so _mm256_blend_epi16() can't blend 16 elements individually; the same 8-bit mask is applied to each 128-bit lane.
Your doc is incomplete.
I'd recommend using the "Instruction Set Reference, A-Z" at
https://www-ssl.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
This is the *only* *complete* source of info on Intel instructions/intrinsics.

If you need info on an intrinsic, just find it with Ctrl+F and read the section above it.
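
The effect of the 8-bit immediate can be modeled in scalar C. This is a minimal sketch of the behavior seen in the reproducer's output, not Intel's reference pseudocode; blend_epi16_model is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Scalar model of the 256-bit 16-bit blend: bit (j % 8) of imm8 decides
   element j, so the same 8 mask bits are reused for the low lane
   (elements 0..7) and the high lane (elements 8..15). */
static void blend_epi16_model(const int16_t a[16], const int16_t b[16],
                              uint8_t imm8, int16_t dst[16])
{
    for (int j = 0; j < 16; j++) {
        dst[j] = ((imm8 >> (j % 8)) & 1) ? b[j] : a[j];
    }
}
```

With imm8 = 1, both element 0 and element 8 are taken from b, which matches the two 99 values in the output above.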

 

bronxzv
New Contributor II

Jeff D. wrote:

The documentation for _mm256_blend_epi16 doesn't indicate that it operates on individual 128-bit channels, but this is the behavior I am seeing. 

I'm not sure which documentation you are referring to, but I see that the Intrinsics Guide is indeed wrong.

At least this source is correct for this intrinsic: https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-5369B2B5-B1E1-4D96-85AB-2019982667B4.htm

 

andysem
New Contributor III

bronxzv wrote:

I'm not sure which documentation you are referring to, but I see that the Intrinsics Guide is indeed wrong.

Perhaps this should be reported in the dedicated thread.

 

bronxzv
New Contributor II

andysem wrote:

bronxzv wrote:

I'm not sure which documentation you are referring to, but I see that the Intrinsics Guide is indeed wrong.

Perhaps this should be reported in the dedicated thread.

Done!
