AVX512 missing intrinsics

Cloyz · 11-25-2017

Hello,

The following functions do not seem to be available for AVX-512:

__m512 _mm512_blendv_ps(__m512 a, __m512 b, __m512 mask)

__m512 _mm512_cmp_ps(__m512 a, __m512 b, int comp)

int _mm512_movemask_ps(__m512 a)

Will they be available soon? Or is there an alternative to them?

Thanks a lot!

bronxzv · 11-27-2017

__m512 _mm512_mask_blend_ps (__mmask16 k, __m512 a, __m512 b)

__mmask16 _mm512_cmp_ps_mask (__m512 a, __m512 b, const int imm8)

"_mm512_movemask_ps" has no direct equivalent, since AVX-512 compare results are stored in special-purpose k (mask) registers, not in zmm registers. Use a simple cast instead, such as:

int IntMask(const __mmask16 &mask) {return (int)mask;}

Such casts typically generate no extra instructions, since the compiler keeps the values in k registers and then uses instructions such as KORTEST on them. A KMOV is only needed when the value really has to end up in a general-purpose register.
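To make the mapping concrete, here is a minimal sketch of the compare / blend / "movemask" pattern written with the mask-register intrinsics named above. The function name max_lanes_avx512 is made up for illustration, and the target attribute assumes GCC or Clang (with other compilers, drop it and enable AVX-512F in the build flags instead).

```c
#include <immintrin.h>

/* Hypothetical helper, not from the thread.
   Per lane: out[i] = (a[i] < b[i]) ? b[i] : a[i].
   Returns the 16-bit compare mask as a plain int (the "movemask" part).
   Reads and writes 16 floats; needs a CPU with AVX-512F. */
__attribute__((target("avx512f")))
int max_lanes_avx512(const float *a, const float *b, float *out)
{
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);

    /* replaces _mm512_cmp_ps: the result is a k-register mask, not a vector */
    __mmask16 k = _mm512_cmp_ps_mask(va, vb, _CMP_LT_OQ);

    /* replaces _mm512_blendv_ps: takes vb where the mask bit is set, va elsewhere */
    __m512 r = _mm512_mask_blend_ps(k, va, vb);
    _mm512_storeu_ps(out, r);

    /* replaces _mm512_movemask_ps: the mask already is an integer type */
    return (int)k;
}
```

Note the argument order: the mask comes first in _mm512_mask_blend_ps, whereas it came last in the older blendv intrinsics.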

don't miss the Intrinsics Guide om/sites/landingpage/IntrinsicsGuide/ for more details, it is very handy

Cloyz · 12-04-2017

Just perfect. Thanks a lot :)

Peter_Cordes · 12-09-2017

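For extracting the high bit of each element of an integer vector, use __mmask64 _mm512_movepi8_mask( __m512i ) (this one requires AVX512BW).

As well as VPMOVB2M, there are also W/D/Q versions (VPMOVW2M, VPMOVD2M, VPMOVQ2M) with the usual epi16/32/64 intrinsics: _mm512_movepi16_mask, _mm512_movepi32_mask and _mm512_movepi64_mask.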
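A small sketch of the integer versions, in case it helps. The two wrapper names (byte_sign_mask, dword_sign_mask) are hypothetical, and the target attributes again assume GCC or Clang. The byte and word forms belong to AVX512BW, and the dword and qword forms to AVX512DQ.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical wrapper: bit i of the result is the top bit of byte i.
   Reads 64 bytes; needs AVX-512BW (VPMOVB2M). */
__attribute__((target("avx512f,avx512bw")))
uint64_t byte_sign_mask(const int8_t *p)
{
    __m512i v = _mm512_loadu_si512(p);
    return (uint64_t)_mm512_movepi8_mask(v);
}

/* Hypothetical wrapper: bit i of the result is the top bit of 32-bit element i.
   Reads 16 dwords; needs AVX-512DQ (VPMOVD2M). */
__attribute__((target("avx512f,avx512dq")))
unsigned dword_sign_mask(const int32_t *p)
{
    __m512i v = _mm512_loadu_si512(p);
    return (unsigned)_mm512_movepi32_mask(v);
}
```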

For FP, I have used FPCLASSPS, i.e. __mmask16 _mm512_fpclass_ps_mask (__m512 a, int imm8), which requires AVX512DQ. For example:

__mmask16 msk = _mm512_fpclass_ps_mask(dist_minus_one, 0x54); 
  // 0x54 = Negative_Finite | Negative_Infinity | Negative_Zero

This can be slightly more efficient than comparing against zero into a mask (with vcmpps), because you (or the compiler) don't need a zeroed register. Also, you get the sign bit from -0.0, which does not compare less than 0.0. FPCLASS can't differentiate -NaN from +NaN, so if you really just want the high bit of each 32-bit element, use the integer instruction. (IDK if it has any bypass delay for inputs coming from FP math instructions.)
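To see where the three approaches differ, here is a rough sketch that computes all three masks for the same 16 floats. The function name negative_masks and its output parameters are invented for this example, and the target attribute assumes GCC or Clang.

```c
#include <immintrin.h>

/* Hypothetical helper: three ways to get a "negative" mask for 16 floats.
   Needs AVX-512F plus AVX-512DQ (for FPCLASSPS and VPMOVD2M). */
__attribute__((target("avx512f,avx512dq")))
void negative_masks(const float *p,
                    unsigned *by_fpclass,
                    unsigned *by_compare,
                    unsigned *by_signbit)
{
    __m512 v = _mm512_loadu_ps(p);

    /* 0x54 = Negative_Finite | Negative_Infinity | Negative_Zero */
    *by_fpclass = _mm512_fpclass_ps_mask(v, 0x54);

    /* ordered compare against a zeroed register: misses -0.0 and any NaN */
    *by_compare = _mm512_cmp_ps_mask(v, _mm512_setzero_ps(), _CMP_LT_OQ);

    /* raw sign bit of each 32-bit element, whatever the value encodes */
    *by_signbit = _mm512_movepi32_mask(_mm512_castps_si512(v));
}
```

With lanes 0..2 set to -1.0f, -0.0f and a NaN with the sign bit set, and positive values everywhere else, this should give 0x3 for FPCLASS, 0x1 for the compare and 0x7 for the integer sign-bit extraction.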