
AVX512 missing intrinsics (sign)

zhou__jieying
Beginner
1,527 Views

Hello,

The following intrinsic does not seem to be available for AVX-512:

__m512i _mm512_sign_epi16 (__m512i a, __m512i b)

Will it be available soon? Or is there an alternative?

Thanks a lot!

2 Replies
McCalpinJohn
Honored Contributor III
1,527 Views

It looks like there is no 512-bit instruction of the same form as the AVX2 VPSIGNW instruction.  

Depending on exactly how you are using the _mm256_sign_epi16() intrinsic, it will take either two or three steps to reproduce the computation with AVX-512BW instructions. The intrinsic _mm512_cmp_epi16_mask() can be used once for a simple positive/negative compare (against a register of zeros), or twice if you need a second mask to handle zero values separately. The intrinsics _mm512_mask_mullo_epi16() and _mm512_maskz_mullo_epi16() can then be used to perform the negation and/or zeroing as required.
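
The steps above can be sketched roughly as follows for 16-bit lanes. This is only an illustration under stated assumptions: the names sign_epi16_scalar, sign_epi16_mullo and sign_epi16_mullo_array are hypothetical, the target attribute assumes GCC or Clang, and the final zeroing uses _mm512_maskz_mov_epi16() rather than a second masked multiply, which is a substitution made here for brevity.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for one lane, following _mm256_sign_epi16 semantics. */
static int16_t sign_epi16_scalar(int16_t a, int16_t b)
{
    if (b == 0) return 0;
    if (b > 0) return a;
    /* Wrapping negation, so INT16_MIN stays INT16_MIN as with VPSIGNW. */
    return (int16_t)(uint16_t)(0u - (unsigned)a);
}

/* Hypothetical helper: compare against zero twice, negate with a masked
 * multiply by -1, then zero the lanes where b == 0.
 * Assumption: GCC/Clang target attribute; otherwise build with -mavx512bw. */
__attribute__((target("avx512f,avx512bw")))
static inline __m512i sign_epi16_mullo(__m512i a, __m512i b)
{
    const __m512i zero = _mm512_setzero_si512();
    const __m512i minus_one = _mm512_set1_epi16(-1);

    __mmask32 b_neg = _mm512_cmp_epi16_mask(b, zero, _MM_CMPINT_LT);
    __mmask32 b_nonzero = _mm512_cmp_epi16_mask(b, zero, _MM_CMPINT_NE);

    /* Lanes with b < 0 become a * -1; all other lanes pass a through. */
    __m512i r = _mm512_mask_mullo_epi16(a, b_neg, a, minus_one);

    /* Lanes with b == 0 are cleared; the rest keep r. */
    return _mm512_maskz_mov_epi16(b_nonzero, r);
}

/* Hypothetical array wrapper; n must be a multiple of 32 elements. */
__attribute__((target("avx512f,avx512bw")))
void sign_epi16_mullo_array(const int16_t *a, const int16_t *b,
                            int16_t *out, size_t n)
{
    assert(n % 32 == 0);
    for (size_t i = 0; i < n; i += 32) {
        __m512i va = _mm512_loadu_si512(a + i);
        __m512i vb = _mm512_loadu_si512(b + i);
        _mm512_storeu_si512(out + i, sign_epi16_mullo(va, vb));
    }
}
```

If only the sign of b matters and b is known to be non-zero, the second compare and the final masked move can be dropped, which gives the two-step variant.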

Victoria_Z_Intel
Employee
1,489 Views

Hi there. While @McCalpinJohn's reply is great, I know an even better approach: multiply instructions have high latency, so it is better to avoid them.

Below is the code for a 32-bit sign function, implementing the non-existent _mm512_sign_epi32 (__m512i a, __m512i b). You could easily change it to 16 bits.

const __m512i zero = _mm512_setzero_si512();

__mmask16 cmask = _mm512_cmpgt_epi32_mask(b, zero);   // lanes where b > 0
__mmask16 cmask0 = _mm512_cmpeq_epi32_mask(b, zero);  // lanes where b == 0
__m512i a_ = _mm512_sub_epi32(zero, a);               // a negated
a = _mm512_mask_blend_epi32(cmask, a_, a);            // keep a where b > 0, otherwise take -a
__m512i res = _mm512_mask_blend_epi32(cmask0, a, zero); // zero the lanes where b == 0
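
For reference, a 16-bit adaptation of the code above might look like the sketch below. It assumes AVX-512BW (the 16-bit compares and blends need it, and the masks become __mmask32). The names sign_epi16_blend and sign_epi16_blend_array are hypothetical, and the target attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit counterpart of the 32-bit snippet: subtract and
 * blend, no multiply.
 * Assumption: GCC/Clang target attribute; otherwise build with -mavx512bw. */
__attribute__((target("avx512f,avx512bw")))
static inline __m512i sign_epi16_blend(__m512i a, __m512i b)
{
    const __m512i zero = _mm512_setzero_si512();

    __mmask32 b_pos = _mm512_cmpgt_epi16_mask(b, zero);   /* b > 0  */
    __mmask32 b_zero = _mm512_cmpeq_epi16_mask(b, zero);  /* b == 0 */

    __m512i a_neg = _mm512_sub_epi16(zero, a);            /* -a (wrapping) */

    /* Keep a where b > 0, otherwise take -a. */
    __m512i r = _mm512_mask_blend_epi16(b_pos, a_neg, a);

    /* Zero the lanes where b == 0. */
    return _mm512_mask_blend_epi16(b_zero, r, zero);
}

/* Hypothetical array wrapper; n must be a multiple of 32 elements. */
__attribute__((target("avx512f,avx512bw")))
void sign_epi16_blend_array(const int16_t *a, const int16_t *b,
                            int16_t *out, size_t n)
{
    assert(n % 32 == 0);
    for (size_t i = 0; i < n; i += 32) {
        __m512i va = _mm512_loadu_si512(a + i);
        __m512i vb = _mm512_loadu_si512(b + i);
        _mm512_storeu_si512(out + i, sign_epi16_blend(va, vb));
    }
}
```

Callers should confirm AVX-512BW support at run time (for example with __builtin_cpu_supports("avx512bw") on GCC or Clang) before calling the wrapper.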
