Intel® ISA Extensions

AVX512 missing intrinsics

Cloyz
Beginner

Hello,

The following functions do not seem to be available in AVX-512:

__m512 _mm512_blendv_ps(__m512 a, __m512 b, __m512 mask)

__m512 _mm512_cmp_ps(__m512 a, __m512 b, int comp)

int _mm512_movemask_ps(__m512 a)

Will they be available soon? Or is there an alternative to them?

Thanks a lot!

 

bronxzv
New Contributor II

__m512 _mm512_mask_blend_ps (__mmask16 k, __m512 a, __m512 b)

__mmask16 _mm512_cmp_ps_mask (__m512 a, __m512 b, const int imm8)

"_mm512_movemask" has no direct equivalent, since AVX-512 masks are stored in special-purpose k registers, not in zmm registers. Use a simple cast instead, such as:

int IntMask(const __mmask16 &mask) {return (int)mask;}

Such casts typically generate no code, since the compiler keeps the values in k registers and then uses instructions such as KORTEST on them. Only if the integer value is really needed in a general-purpose register does the compiler have to emit a KMOV.
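To put the three replacements side by side, here is a minimal sketch rather than a definitive implementation. The function names select_smaller16 and select_smaller16_avx512 are made up for illustration. It assumes GCC or Clang on x86-64 (it relies on the target attribute and __builtin_cpu_supports) and falls back to scalar code when AVX-512F is not available at run time.

```cpp
#include <immintrin.h>
#include <cassert>

// AVX-512F path. The target attribute (GCC/Clang, an assumption of this
// sketch) enables AVX-512F for this one function only, so the file does not
// have to be built with -mavx512f.
__attribute__((target("avx512f")))
static int select_smaller16_avx512(const float* a, const float* b, float* out) {
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    // Instead of _mm512_cmp_ps: the result is a 16-bit mask, one bit per lane.
    __mmask16 k = _mm512_cmp_ps_mask(va, vb, _CMP_LT_OQ);
    // Instead of _mm512_blendv_ps: the mask comes first; lanes whose bit is
    // set take the third argument, lanes whose bit is clear take the second.
    _mm512_storeu_ps(out, _mm512_mask_blend_ps(k, vb, va));
    // Instead of _mm512_movemask_ps: the mask already is the bit pattern.
    return (int)k;
}

// Hypothetical wrapper: for 16 floats, writes a[i] where a[i] < b[i] and
// b[i] otherwise, and returns the "a < b" mask as an int.
int select_smaller16(const float* a, const float* b, float* out) {
    assert(a && b && out);
    if (__builtin_cpu_supports("avx512f"))
        return select_smaller16_avx512(a, b, out);
    int mask = 0;  // scalar fallback with the same semantics
    for (int i = 0; i < 16; ++i) {
        bool lt = a[i] < b[i];
        out[i] = lt ? a[i] : b[i];
        if (lt) mask |= 1 << i;
    }
    return mask;
}
```

Note the argument order: compared with the AVX2-style blendv(a, b, mask), the mask moves to the front, but a set bit still selects from the last vector argument.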

Don't miss the Intrinsics Guide (om/sites/landingpage/IntrinsicsGuide/) for more details; it is very handy.

Cloyz
Beginner

Just perfect. Thanks a lot :)

Peter_Cordes
Beginner

For extracting the high bit of each element of an integer vector, use __mmask64 _mm512_movepi8_mask( __m512i )

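As well as VPMOVB2M, there are also W/D/Q versions with the usual epi16/32/64 intrinsic names (_mm512_movepi16_mask, _mm512_movepi32_mask, _mm512_movepi64_mask). The B and W forms require AVX512BW; the D and Q forms require AVX512DQ.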
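As an illustration of the integer route, the sketch below pulls the sign bit of each of 16 int32 elements into a mask with _mm512_movepi32_mask (VPMOVD2M). The names sign_mask16 and sign_mask16_avx512 are hypothetical, and the runtime dispatch assumes GCC or Clang on x86-64; without AVX512DQ it falls back to a scalar loop.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>

// AVX-512 path; VPMOVD2M needs AVX512DQ. The target attribute is a
// GCC/Clang feature (an assumption of this sketch).
__attribute__((target("avx512f,avx512dq")))
static unsigned sign_mask16_avx512(const std::int32_t* v) {
    __m512i x = _mm512_loadu_si512(v);
    // Bit i of the result is the top bit of 32-bit element i.
    return _mm512_movepi32_mask(x);
}

// Hypothetical wrapper: returns a 16-bit mask of the sign bits of v[0..15].
unsigned sign_mask16(const std::int32_t* v) {
    assert(v);
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512dq"))
        return sign_mask16_avx512(v);
    unsigned mask = 0;  // scalar fallback with the same semantics
    for (int i = 0; i < 16; ++i)
        if (v[i] < 0) mask |= 1u << i;
    return mask;
}
```

For float data, the same instruction can be applied after reinterpreting the vector with _mm512_castps_si512, which is a type change only and does not generate an instruction.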

For FP, I have used FPCLASSPS, i.e. __mmask16 _mm512_fpclass_ps_mask (__m512 a, int imm8). For example:

__mmask16 msk = _mm512_fpclass_ps_mask(dist_minus_one, 0x54); 
  // 0x54 = Negative_Finite | Negative_Infinity | Negative_Zero

 

This can be slightly more efficient than comparing against zero into a mask (with vcmpps), because you (or the compiler) don't need a zeroed register. Also, you get the sign bit from -0.0, which does not compare less than 0.0. FPCLASS can't differentiate -NaN from +NaN, so if you really just want the high bit of each 32-bit element, use the integer instruction (VPMOVD2M). (I don't know whether it has any bypass delay for inputs coming from FP math instructions.)
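To make the -0.0 difference concrete, here is a small sketch that computes both masks for the same 16 floats. The struct NegMasks and the function names are invented for this example, and the dispatch again assumes GCC or Clang on x86-64; the scalar fallback mirrors the same semantics for CPUs without AVX512DQ.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cmath>

// Hypothetical result type: one 16-bit mask per method.
struct NegMasks {
    unsigned fpclass;         // VFPCLASSPS with imm8 = 0x54
    unsigned less_than_zero;  // VCMPPS against 0.0 with _CMP_LT_OQ
};

// AVX-512 path; FPCLASSPS needs AVX512DQ.
__attribute__((target("avx512f,avx512dq")))
static NegMasks neg_masks16_avx512(const float* v) {
    __m512 x = _mm512_loadu_ps(v);
    NegMasks m;
    // 0x54 = Negative_Finite | Negative_Infinity | Negative_Zero
    m.fpclass = _mm512_fpclass_ps_mask(x, 0x54);
    // The compare needs a zeroed register as its second operand.
    m.less_than_zero = _mm512_cmp_ps_mask(x, _mm512_setzero_ps(), _CMP_LT_OQ);
    return m;
}

NegMasks neg_masks16(const float* v) {
    assert(v);
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512dq"))
        return neg_masks16_avx512(v);
    NegMasks m = {0, 0};  // scalar fallback with the same semantics
    for (int i = 0; i < 16; ++i) {
        if (std::signbit(v[i]) && !std::isnan(v[i])) m.fpclass |= 1u << i;
        if (v[i] < 0.0f) m.less_than_zero |= 1u << i;
    }
    return m;
}
```

With the input {1.0f, -1.0f, 0.0f, -0.0f, -INFINITY} padded with zeros, the FPCLASS mask is 0x1A and the compare mask is 0x12; the two differ only in the lane that holds -0.0.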
