Intel® ISA Extensions

mm256_shuffle_epi8

Ravi_K_
Beginner
293 Views
Hi, I am going through the documentation for _mm256_shuffle_epi8:
https://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/index.htm#G...

The pseudocode only goes up to 16 bytes:

    ...
    for (i = 0; i < 16; i++) {
        if (b[i] & 0x80) {
            r[i] = 0;
        } else {
            r[i] = a[b[i] & 0x0F];
        }
    }
    ...

Is there an updated document that explains the 32-byte case? Thanks.
3 Replies
Vladimir_Sedach
New Contributor I

Ravi,

Download:
https://www-ssl.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-soft...

and find the intrinsic (Ctrl-F3).

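_mm256_shuffle_epi8() shuffles each 128-bit lane independently: the high-order 128 bits of the result are built from the high-order 128 bits of both parameters.
The method is the same as for the low 128 bits, so a byte never moves from one lane to the other.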
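
To make the lane behaviour concrete, here is a scalar sketch of what the 256-bit shuffle computes, written in plain C so it runs without AVX2. The name shuffle_epi8_256_model is hypothetical, and 32-byte arrays stand in for __m256i values. It is only a model of the documented semantics, not Intel's pseudocode.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm256_shuffle_epi8.
   a, b and r are 32-byte arrays standing in for __m256i values. */
static void shuffle_epi8_256_model(const uint8_t a[32],
                                   const uint8_t b[32],
                                   uint8_t r[32])
{
    /* The two 128-bit lanes are processed independently. */
    for (int lane = 0; lane < 2; lane++) {
        int base = lane * 16;              /* 0 for the low lane, 16 for the high lane */
        for (int i = 0; i < 16; i++) {
            uint8_t ctl = b[base + i];
            if (ctl & 0x80)
                r[base + i] = 0;           /* high bit set: result byte is zeroed */
            else
                r[base + i] = a[base + (ctl & 0x0F)];  /* index stays inside the lane */
        }
    }
}
```

Only the low 4 bits of each control byte are used as an index, so a control value of 31 in the high lane selects byte 15 of the high lane, not byte 31 of the whole register.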

 

Ravi_K_
Beginner
Vladimir,

Thanks for the reference. What I want to achieve with _mm256_shuffle_epi8 is to reverse the byte order, swapping

0 - 31
1 - 30
2 - 29
...
31 - 0

I tried _mm256_shuffle_epi8 but could not get it working. From your explanation, I think I am using it for the wrong purpose. Any suggestions on which intrinsics I should look at?

Thanks,
Ravi
Vladimir_Sedach
New Contributor I

Ravi,

I use code like this:
// XOR with "sign" lets the signed _mm256_cmpgt_epi32 compare unsigned numbers
// after _mm256_shuffle_epi8 we have 15, 14,...0,   31, 30,...16
// after _mm256_permute2f128_si256 we have 31, 30,...16,   15, 14,...0

    __m256i    ff = _mm256_set1_epi32(-1);
    __m256i    idx = _mm256_setr_epi8(
        15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
        15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
    __m256i    sign = _mm256_set1_epi32(0x80000000);
    __m256i    v0, v1;
    __m256i    eq, gt0, gt1;

    v0 = _mm256_loadu_si256((__m256i *)a);
    v1 = _mm256_loadu_si256((__m256i *)b);

    eq = _mm256_cmpeq_epi32(v0, v1);
    if (!_mm256_testc_si256(eq, ff))    //not equal
    {
        v0 = _mm256_shuffle_epi8(v0, idx);
        v1 = _mm256_shuffle_epi8(v1, idx);

        v0 = _mm256_xor_si256(v0, sign);
        v1 = _mm256_xor_si256(v1, sign);

        v0 = _mm256_permute2f128_si256(v0, v0, 0x01);
        v1 = _mm256_permute2f128_si256(v1, v1, 0x01);

        gt0 = _mm256_cmpgt_epi32(v0, v1);
        gt1 = _mm256_cmpgt_epi32(v1, v0);

        return _mm256_movemask_ps(_mm256_castsi256_ps(gt0)) - _mm256_movemask_ps(_mm256_castsi256_ps(gt1));
    }
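
If only the 32-byte reversal is needed, the two steps above (reverse inside each lane, then swap the lanes) can be isolated as in the sketch below. The helper name reverse32 is hypothetical, and _mm256_permute2x128_si256 is used here as the AVX2 integer counterpart of the permute in the code above. The scalar branch is an assumption added so the snippet still builds when AVX2 is not enabled at compile time.

```c
#include <stdint.h>
#include <string.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: dst[i] = src[31 - i] for i = 0..31. */
static void reverse32(const uint8_t src[32], uint8_t dst[32])
{
#if defined(__AVX2__)
    /* Same index pattern in both lanes: reverse the 16 bytes of each lane. */
    const __m256i idx = _mm256_setr_epi8(
        15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
        15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    v = _mm256_shuffle_epi8(v, idx);            /* 15..0 | 31..16 */
    v = _mm256_permute2x128_si256(v, v, 0x01);  /* swap lanes: 31..16 | 15..0 */
    _mm256_storeu_si256((__m256i *)dst, v);
#else
    /* Portable fallback with the same result. */
    uint8_t tmp[32];
    for (int i = 0; i < 32; i++)
        tmp[i] = src[31 - i];
    memcpy(dst, tmp, 32);
#endif
}
```

With GCC or Clang, compiling with -mavx2 selects the intrinsic path.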
