Beginner
140 Views

AVX512 missing intrinsics

Hello,

The following functions do not seem to be available in AVX-512:

__m512 _mm512_blendv_ps(__m512 a, __m512 b, __m512 mask)

 __m512 _mm512_cmp_ps(__m512 a, __m512 b, int comp)

int _mm512_movemask_ps(__m512 a)

Will they be available soon? Or is there an alternative to them?

Thanks a lot!

 

3 Replies
New Contributor II

__m512 _mm512_mask_blend_ps (__mmask16 k, __m512 a, __m512 b)

__mmask16 _mm512_cmp_ps_mask (__m512 a, __m512 b, const int imm8)

"_mm512_movemask_ps" has no direct equivalent, since AVX-512 masks are stored in special-purpose k registers, not in zmm registers. Use a simple cast instead, such as:

int IntMask(const __mmask16 &mask) {return (int)mask;}

Such casts typically generate no extra code, since the compiler keeps the values in k registers and then uses instructions such as KORTEST on them.
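As a rough sketch of how the three replacements fit together, the snippet below computes an element-wise maximum of 16 floats. The names `select_max16`, `max16_avx512` and `max16_scalar` are made up for illustration, and the code assumes GCC or Clang on x86-64 (it relies on `__attribute__((target(...)))` and `__builtin_cpu_supports`). The scalar fallback is only there so the example also runs on CPUs without AVX-512.

```c
#include <immintrin.h>

/* AVX-512 path: compare into a k mask, then blend under that mask. */
__attribute__((target("avx512f")))
static int max16_avx512(const float *a, const float *b, float *out)
{
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    /* Stands in for _mm512_cmp_ps: the result is a 16-bit mask, not a vector. */
    __mmask16 lt = _mm512_cmp_ps_mask(va, vb, _CMP_LT_OQ);
    /* Stands in for _mm512_blendv_ps: takes vb where the mask bit is set, else va. */
    __m512 r = _mm512_mask_blend_ps(lt, va, vb);
    _mm512_storeu_ps(out, r);
    /* Stands in for _mm512_movemask_ps: the mask converts directly to an integer. */
    return (int)lt;
}

/* Scalar fallback (an assumption of this sketch, not part of the advice above). */
static int max16_scalar(const float *a, const float *b, float *out)
{
    int mask = 0;
    for (int i = 0; i < 16; ++i) {
        int lt = a[i] < b[i];
        out[i] = lt ? b[i] : a[i];
        mask |= lt << i;
    }
    return mask;
}

/* Hypothetical entry point: writes max(a[i], b[i]) to out[i] and returns
   a bitmask with bit i set where a[i] < b[i]. */
int select_max16(const float *a, const float *b, float *out)
{
    if (__builtin_cpu_supports("avx512f"))
        return max16_avx512(a, b, out);
    return max16_scalar(a, b, out);
}
```

Note the operand order: `_mm512_mask_blend_ps(k, a, b)` picks from `b` where the mask bit is set, and the mask comes first rather than last as in `_mm256_blendv_ps`.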

Don't miss the Intrinsics Guide om/sites/landingpage/IntrinsicsGuide/ for more details; it is very handy.

Beginner

Just perfect. Thanks a lot :)

Beginner

For extracting the high bit of each element of an integer vector, use __mmask64 _mm512_movepi8_mask( __m512i )

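Besides VPMOVB2M, there are also W/D/Q variants (VPMOVW2M, VPMOVD2M, VPMOVQ2M) with the usual epi16/32/64 intrinsic names (_mm512_movepi16_mask, _mm512_movepi32_mask, _mm512_movepi64_mask).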
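A minimal sketch of the 32-bit variant, assuming GCC or Clang on x86-64. The function names (`sign_bits16` and its two helpers) are hypothetical, and the scalar loop is only a fallback so the snippet runs on machines without AVX-512DQ:

```c
#include <immintrin.h>
#include <stdint.h>

/* AVX-512DQ path: VPMOVD2M copies the top bit of each 32-bit element
   into the corresponding bit of a k mask. */
__attribute__((target("avx512f,avx512dq")))
static unsigned sign_bits_avx512(const int32_t *v)
{
    __m512i x = _mm512_loadu_si512((const void *)v);
    return (unsigned)_mm512_movepi32_mask(x);
}

/* Scalar fallback (an assumption of this sketch): same result, one bit at a time. */
static unsigned sign_bits_scalar(const int32_t *v)
{
    unsigned mask = 0;
    for (int i = 0; i < 16; ++i)
        mask |= ((uint32_t)v[i] >> 31) << i;
    return mask;
}

/* Hypothetical entry point: bit i of the result is the sign bit of v[i]. */
unsigned sign_bits16(const int32_t *v)
{
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512dq"))
        return sign_bits_avx512(v);
    return sign_bits_scalar(v);
}
```

The same pattern applies to the other widths, except that the byte and word forms need AVX-512BW rather than AVX-512DQ.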

For FP, I have used FPCLASSPS, i.e. __mmask16 _mm512_fpclass_ps_mask (__m512 a, int imm8). For example:

__mmask16 msk = _mm512_fpclass_ps_mask(dist_minus_one, 0x54); 
  // 0x54 = Negative_Finite | Negative_Infinity | Negative_Zero

 

This can be slightly more efficient than comparing against zero into a mask (with vcmpps), because you (or the compiler) don't need a zeroed register. Also, you get the sign bit from -0.0, which does not compare less than 0.0. FPCLASS can't differentiate -NaN from +NaN, so if you really just want the high bit of each 32-bit element, use the integer instruction. (IDK if it has any bypass delay for inputs coming from FP math instructions.)
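To make the -0.0 difference concrete, here is a small sketch that computes both masks for the same 16 floats. It assumes GCC or Clang on x86-64. The struct `neg_masks` and the function names are invented for illustration, and the scalar fallback is only meant to mimic the two instructions for ordinary values, zeros and NaN so the snippet runs without AVX-512DQ:

```c
#include <immintrin.h>
#include <math.h>

/* Hypothetical result type: the two masks side by side. */
typedef struct {
    unsigned fpclass_neg;  /* FPCLASS with imm8 = 0x54 */
    unsigned cmp_lt_zero;  /* compare x < 0.0 into a mask */
} neg_masks;

__attribute__((target("avx512f,avx512dq")))
static neg_masks neg_masks_avx512(const float *x)
{
    __m512 v = _mm512_loadu_ps(x);
    neg_masks m;
    /* 0x54 = Negative_Finite | Negative_Infinity | Negative_Zero */
    m.fpclass_neg = (unsigned)_mm512_fpclass_ps_mask(v, 0x54);
    /* The compare needs a zeroed register as its second operand. */
    m.cmp_lt_zero = (unsigned)_mm512_cmp_ps_mask(v, _mm512_setzero_ps(), _CMP_LT_OQ);
    return m;
}

/* Scalar fallback (an assumption of this sketch, not an exact model). */
static neg_masks neg_masks_scalar(const float *x)
{
    neg_masks m = {0, 0};
    for (int i = 0; i < 16; ++i) {
        if (signbit(x[i]) && !isnan(x[i])) m.fpclass_neg |= 1u << i;
        if (x[i] < 0.0f)                   m.cmp_lt_zero |= 1u << i;
    }
    return m;
}

neg_masks negative_masks16(const float *x)
{
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512dq"))
        return neg_masks_avx512(x);
    return neg_masks_scalar(x);
}
```

With an input whose first elements are -0.0f, -1.0f and 2.0f, the FPCLASS mask has bits 0 and 1 set, while the compare mask has only bit 1 set.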
