Beginner

mm256_shuffle_epi8

Hi,

I am going through the documentation for _mm256_shuffle_epi8:
https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/index.htm#G...

The pseudocode there covers only 16 bytes:

    for (i = 0; i < 16; i++) {
        if (b[i] & 0x80) {
            r[i] = 0;
        } else {
            r[i] = a[b[i] & 0x0F];
        }
    }

Is there an updated document that explains the behavior for all 32 bytes?

Thanks.
3 Replies
New Contributor I

Ravi,

Download:
https://www-ssl.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-soft...

and find the intrinsic (Ctrl-F3).

_mm256_shuffle_epi8() shuffles the high-order 128 bits using only the high-order 128 bits of both operands.
The method is the same as for the low 128 bits, so bytes never cross between the two 128-bit halves.
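
To make the 32-byte behavior concrete, the 16-byte pseudocode from the documentation can be extended to both halves. The following is a scalar sketch of that idea, not Intel's official pseudocode; shuffle_epi8_256_model is a name invented for this example, and the vectors are modeled as plain 32-byte arrays.

```c
#include <stdint.h>

/* Scalar model of _mm256_shuffle_epi8 (illustrative name, not an intrinsic).
   r, a and b are 32-byte arrays. Each 16-byte lane is shuffled independently,
   so an index in b can only select a byte from the same lane of a. */
static void shuffle_epi8_256_model(uint8_t r[32], const uint8_t a[32],
                                   const uint8_t b[32])
{
    for (int lane = 0; lane < 2; lane++) {
        const int base = lane * 16; /* 0 for the low lane, 16 for the high lane */
        for (int i = 0; i < 16; i++) {
            const uint8_t ctl = b[base + i];
            if (ctl & 0x80) {
                r[base + i] = 0;                      /* high bit set: zero the byte */
            } else {
                r[base + i] = a[base + (ctl & 0x0F)]; /* low 4 bits index within the lane */
            }
        }
    }
}
```

In this model a control byte of 0 in the high lane selects byte 16 of a, not byte 0, which is why a single _mm256_shuffle_epi8 cannot move bytes across the 128-bit boundary.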

 

Beginner

Vladimir,

Thanks for the reference. What I want to achieve with _mm256_shuffle_epi8 is to reverse the byte order of the whole 32-byte register, i.e. swap

0 - 31
1 - 30
2 - 29
...
31 - 0

I tried _mm256_shuffle_epi8 but could not get it to work. From your explanation, I think I am using it for the wrong purpose. Any suggestions on which intrinsics I should look at?

Thanks,
Ravi
New Contributor I

Ravi,

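I'm using code like this:

#include <immintrin.h>

// "sign" lets us compare unsigned numbers with _mm256_cmpgt_epi32.
// After _mm256_shuffle_epi8 we have bytes 15, 14, ... 0,   31, 30, ... 16.
// After _mm256_permute2f128_si256 we have bytes 31, 30, ... 16,   15, 14, ... 0.
// The function name and signature are assumed; the original post showed only the body.
// Result: > 0 if a sorts after b in byte order, < 0 if before (requires AVX2).
static int compare32(const void *a, const void *b)
{
    __m256i    ff = _mm256_set1_epi32(-1);
    __m256i    idx = _mm256_setr_epi8(
        15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
        15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
    __m256i    sign = _mm256_set1_epi32((int)0x80000000u);
    __m256i    v0, v1;
    __m256i    eq, gt0, gt1;

    v0 = _mm256_loadu_si256((const __m256i *)a);
    v1 = _mm256_loadu_si256((const __m256i *)b);

    eq = _mm256_cmpeq_epi32(v0, v1);
    if (!_mm256_testc_si256(eq, ff))    // not equal
    {
        // reverse the bytes within each 128-bit lane
        v0 = _mm256_shuffle_epi8(v0, idx);
        v1 = _mm256_shuffle_epi8(v1, idx);

        // flip the top bit of each 32-bit element for the unsigned compare
        v0 = _mm256_xor_si256(v0, sign);
        v1 = _mm256_xor_si256(v1, sign);

        // swap the two 128-bit lanes to complete the 32-byte reversal
        v0 = _mm256_permute2f128_si256(v0, v0, 0x01);
        v1 = _mm256_permute2f128_si256(v1, v1, 0x01);

        gt0 = _mm256_cmpgt_epi32(v0, v1);
        gt1 = _mm256_cmpgt_epi32(v1, v0);

        return _mm256_movemask_ps(_mm256_castsi256_ps(gt0)) - _mm256_movemask_ps(_mm256_castsi256_ps(gt1));
    }
    return 0;    // equal (assumed; the original fragment ends at the if block)
}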
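
Stripped of the comparison logic, the byte reversal asked about earlier in the thread comes down to the two steps named in the comments: an in-lane reversal with _mm256_shuffle_epi8, then a swap of the two 128-bit lanes. Below is a minimal sketch of just that part, assuming a GCC or Clang build. The names reverse32, reverse32_avx2 and reverse32_scalar are made up for this example, and _mm256_permute2x128_si256 is used as the AVX2 integer counterpart of the _mm256_permute2f128_si256 call above. The target attribute and __builtin_cpu_supports are compiler extensions, included only so the sketch builds without -mavx2 and falls back to a scalar loop on machines without AVX2.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX2 path: reverse 32 bytes with one in-lane shuffle and one lane swap. */
static __attribute__((target("avx2")))
void reverse32_avx2(uint8_t dst[32], const uint8_t src[32])
{
    /* Per-lane control: output byte i takes input byte 15 - i of the same lane. */
    const __m256i idx = _mm256_setr_epi8(
        15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
        15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    v = _mm256_shuffle_epi8(v, idx);           /* now 15..0, 31..16 */
    v = _mm256_permute2x128_si256(v, v, 0x01); /* swap lanes: 31..16, 15..0 */
    _mm256_storeu_si256((__m256i *)dst, v);
}
#endif

/* Scalar fallback; the temporary keeps it correct even if dst == src. */
static void reverse32_scalar(uint8_t dst[32], const uint8_t src[32])
{
    uint8_t tmp[32];
    for (int i = 0; i < 32; i++) tmp[i] = src[31 - i];
    for (int i = 0; i < 32; i++) dst[i] = tmp[i];
}

/* Picks the AVX2 path when the CPU supports it, the scalar loop otherwise. */
static void reverse32(uint8_t dst[32], const uint8_t src[32])
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        reverse32_avx2(dst, src);
        return;
    }
#endif
    reverse32_scalar(dst, src);
}
```

If the code is always built with -mavx2 for AVX2-only targets, the dispatch and the scalar fallback can be dropped and the three intrinsic calls used directly.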

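
The "sign" comment in the code above refers to a common workaround: AVX2 offers only a signed 32-bit greater-than compare, and flipping the top bit of both operands maps unsigned order onto signed order. A scalar sketch of the same idea for a single 32-bit element follows; unsigned_gt_via_signed is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Returns 1 if a > b as unsigned values, using only a signed comparison.
   This mirrors, for one element, what _mm256_xor_si256 with 0x80000000
   followed by _mm256_cmpgt_epi32 does for eight elements at once. */
static int unsigned_gt_via_signed(uint32_t a, uint32_t b)
{
    /* XOR with 0x80000000 maps 0..0xFFFFFFFF onto INT32_MIN..INT32_MAX
       while preserving order. The cast to int32_t relies on the usual
       two's-complement conversion of mainstream compilers. */
    int32_t sa = (int32_t)(a ^ 0x80000000u);
    int32_t sb = (int32_t)(b ^ 0x80000000u);
    return sa > sb;
}
```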