AVX-512 missing intrinsics (sign)

zhou__jieying · ‎04-18-2019

Hello,

The following intrinsic does not seem to be available for AVX-512:

__m512i _mm512_sign_epi16 (__m512i a, __m512i b)

Will it be available soon? Or is there an alternative?

Thanks a lot!

McCalpinJohn · ‎05-20-2019

It looks like there is no 512-bit instruction of the same form as the AVX2 VPSIGNW instruction.

Depending on exactly how you are using the _mm256_sign_epi16() intrinsic, it will take either two or three steps to reproduce the computation with AVX-512BW instructions. The intrinsic _mm512_cmp_epi16_mask() can be used once for a simple positive/negative compare (against a register of zeros), or twice if you need a second mask to handle zero values separately. The intrinsics _mm512_mask_mullo_epi16() and _mm512_maskz_mullo_epi16() can be used to perform the negation and/or zeroing as required.
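The compare-then-multiply steps described above could be sketched roughly as follows for the full VPSIGNW behaviour (negate where b < 0, zero where b == 0, pass through where b > 0). This is only an illustration under assumptions: the function names sign_epi16_mullo and sign_epi16_mullo_32 are hypothetical, the final zeroing uses _mm512_maskz_mov_epi16() rather than a second multiply, AVX-512BW hardware is required, and the target attribute is a GCC/Clang extension.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper (not an Intel intrinsic): emulates a 512-bit
   "sign_epi16" with two compares, one masked multiply and one masked move. */
__attribute__((target("avx512bw")))
static inline __m512i sign_epi16_mullo(__m512i a, __m512i b)
{
    const __m512i zero      = _mm512_setzero_si512();
    const __m512i minus_one = _mm512_set1_epi16(-1);

    /* One mask for lanes where b < 0, a second for lanes where b != 0. */
    __mmask32 b_neg     = _mm512_cmp_epi16_mask(b, zero, _MM_CMPINT_LT);
    __mmask32 b_nonzero = _mm512_cmp_epi16_mask(b, zero, _MM_CMPINT_NE);

    /* a * -1 in lanes where b < 0; a is passed through elsewhere. */
    __m512i r = _mm512_mask_mullo_epi16(a, b_neg, a, minus_one);

    /* Zero the lanes where b == 0. */
    return _mm512_maskz_mov_epi16(b_nonzero, r);
}

/* Hypothetical array wrapper: processes exactly 32 int16_t lanes. */
__attribute__((target("avx512bw")))
void sign_epi16_mullo_32(const int16_t *a, const int16_t *b, int16_t *out)
{
    __m512i va = _mm512_loadu_si512(a);
    __m512i vb = _mm512_loadu_si512(b);
    _mm512_storeu_si512(out, sign_epi16_mullo(va, vb));
}
```

If the inputs are known never to contain zeros in b, the second compare and the final masked move can presumably be dropped, which is the two-step case mentioned above.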

Victoria_Z_Intel · ‎08-05-2020

Hi there. While @McCalpinJohn's reply is great, I know an even better one: multiply instructions have relatively high latency, so it is better to avoid them.

Below is code for a 32-bit sign function, implementing the non-existent _mm512_sign_epi32 (__m512i a, __m512i b). You could easily change it to 16 bits.

const __m512i zero = _mm512_setzero_si512();

__mmask16 cmask = _mm512_cmpgt_epi32_mask(b, zero); // lanes where b > 0
__mmask16 cmask0 = _mm512_cmpeq_epi32_mask(b, zero); // lanes where b == 0
__m512i a_ = _mm512_sub_epi32(zero, a); // a negated
a = _mm512_mask_blend_epi32(cmask, a_, a); // keep a where b > 0, take -a elsewhere
__m512i res = _mm512_mask_blend_epi32(cmask0, a, zero); // zero the result where b is zero
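
For the 16-bit case asked about in the original question, the same subtract-and-blend sequence might be adapted along these lines. Treat it as a sketch rather than a definitive implementation: the names sign_epi16_blend and sign_epi16_blend_32 are hypothetical, the 16-bit compare, subtract and blend intrinsics need AVX-512BW (so the masks become __mmask32), and the target attribute is a GCC/Clang extension.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper (not an Intel intrinsic): 16-bit analogue of the
   32-bit sequence above, using only compares, a subtract and two blends. */
__attribute__((target("avx512bw")))
static inline __m512i sign_epi16_blend(__m512i a, __m512i b)
{
    const __m512i zero = _mm512_setzero_si512();

    __mmask32 b_pos  = _mm512_cmpgt_epi16_mask(b, zero);  /* lanes where b > 0  */
    __mmask32 b_zero = _mm512_cmpeq_epi16_mask(b, zero);  /* lanes where b == 0 */

    __m512i neg_a = _mm512_sub_epi16(zero, a);            /* 0 - a */

    /* Take a where b > 0, -a elsewhere (b == 0 lanes are fixed next). */
    __m512i r = _mm512_mask_blend_epi16(b_pos, neg_a, a);

    /* Replace lanes where b == 0 with zero. */
    return _mm512_mask_blend_epi16(b_zero, r, zero);
}

/* Hypothetical array wrapper: processes exactly 32 int16_t lanes. */
__attribute__((target("avx512bw")))
void sign_epi16_blend_32(const int16_t *a, const int16_t *b, int16_t *out)
{
    __m512i va = _mm512_loadu_si512(a);
    __m512i vb = _mm512_loadu_si512(b);
    _mm512_storeu_si512(out, sign_epi16_blend(va, vb));
}
```

As with the 256-bit instruction, negating the most negative 16-bit value wraps around to itself, since the subtraction is plain two's-complement arithmetic.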