Hello,
The following functions seem to be unavailable in AVX-512:
__m512 _mm512_blendv_ps(__m512 a, __m512 b, __m512 mask)
__m512 _mm512_cmp_ps(__m512 a, __m512 b, int comp)
int _mm512_movemask_ps(__m512 a)
Will they be available soon? Or is there an alternative?
Thanks a lot!
__m512 _mm512_mask_blend_ps (__mmask16 k, __m512 a, __m512 b)
__mmask16 _mm512_cmp_ps_mask (__m512 a, __m512 b, const int imm8)
"_mm512_movemask_ps" has no direct equivalent, because AVX-512 compare results are stored in dedicated mask (k) registers rather than in zmm registers. Use a simple cast instead, such as:
int IntMask(const __mmask16 &mask) {return (int)mask;}
Such casts typically generate no code, since the compiler keeps the values in k registers and operates on them directly with instructions such as KORTEST.
Don't miss the Intrinsics Guide om/sites/landingpage/IntrinsicsGuide/ for more details; it is very handy.
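Putting the three replacements together, here is a minimal sketch of an AVX-style "compare, blend, movemask" sequence rewritten for AVX-512. It assumes GCC or Clang on x86-64 (it uses `__attribute__((target(...)))` and `__builtin_cpu_supports` so it compiles without `-mavx512f` and falls back to scalar code on CPUs without AVX-512). The helper names `select_greater*` are hypothetical, chosen only for illustration.

```cpp
// Sketch: AVX-style compare / blendv / movemask rewritten for AVX-512.
// Assumes GCC or Clang on x86-64 (target attributes, __builtin_cpu_supports).
#include <immintrin.h>

// Hypothetical helper: out[i] = (a[i] < b[i]) ? b[i] : a[i] for 16 floats.
// Returns the comparison result as an int with bit i set for lane i.
__attribute__((target("avx512f")))
static int select_greater_avx512(const float *a, const float *b, float *out) {
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    // Instead of a vector-valued compare, the result lands in a k-register mask.
    __mmask16 lt = _mm512_cmp_ps_mask(va, vb, _CMP_LT_OQ);
    // Instead of blendv: lanes whose mask bit is set take vb, the others take va.
    __m512 r = _mm512_mask_blend_ps(lt, va, vb);
    _mm512_storeu_ps(out, r);
    // Instead of movemask: the mask already holds one bit per lane, so just cast.
    return (int)lt;
}

// Portable scalar reference with the same semantics (NaN compares false).
static int select_greater_scalar(const float *a, const float *b, float *out) {
    int mask = 0;
    for (int i = 0; i < 16; ++i) {
        const bool lt = a[i] < b[i];
        out[i] = lt ? b[i] : a[i];
        if (lt) mask |= 1 << i;
    }
    return mask;
}

// Uses the AVX-512 path only when the running CPU supports it.
int select_greater(const float *a, const float *b, float *out) {
    if (__builtin_cpu_supports("avx512f"))
        return select_greater_avx512(a, b, out);
    return select_greater_scalar(a, b, out);
}
```

For example, with a[i] = i and b[i] = 15 - i, lanes 0..7 satisfy a < b, so the returned mask is 0x00FF and out holds the elementwise maximum.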
Just perfect. Thanks a lot :)
To extract the high bit of each element of an integer vector, use __mmask64 _mm512_movepi8_mask(__m512i).
Besides VPMOVB2M, there are also W/D/Q forms (VPMOVW2M, VPMOVD2M, VPMOVQ2M) with the usual epi16/32/64 intrinsics. The byte and word forms require AVX512BW; the dword and qword forms require AVX512DQ.
For FP, I have used FPCLASSPS, i.e. __mmask16 _mm512_fpclass_ps_mask(__m512 a, int imm8). For example:
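As a sketch of the integer route applied to floats, the following reinterprets a `__m512` as 32-bit integers and uses VPMOVD2M (`_mm512_movepi32_mask`) to collect the sign bit of every element, which is the closest AVX-512 analogue of `movemask_ps`. It assumes GCC or Clang on x86-64 and a CPU with AVX512DQ, with a scalar fallback otherwise; the `sign_bits*` names are hypothetical.

```cpp
// Sketch: movemask_ps-style sign-bit extraction via VPMOVD2M (AVX512DQ).
// Assumes GCC or Clang on x86-64 (target attributes, __builtin_cpu_supports).
#include <immintrin.h>
#include <cstdint>
#include <cstring>

__attribute__((target("avx512f,avx512dq")))
static int sign_bits_avx512(const float *v) {
    __m512 x = _mm512_loadu_ps(v);
    // Reinterpret the float bits as 32-bit integers (a cast, no conversion).
    __m512i xi = _mm512_castps_si512(x);
    // VPMOVD2M: bit i of the mask = high (sign) bit of 32-bit element i.
    return (int)_mm512_movepi32_mask(xi);
}

// Scalar reference: read each float's raw bits and test bit 31.
static int sign_bits_scalar(const float *v) {
    int mask = 0;
    for (int i = 0; i < 16; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &v[i], sizeof bits);
        if (bits >> 31) mask |= 1 << i;
    }
    return mask;
}

int sign_bits(const float *v) {
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512dq"))
        return sign_bits_avx512(v);
    return sign_bits_scalar(v);
}
```

Because this works on raw bits, -0.0f and negative NaNs set their bits too, unlike a compare against zero.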
__mmask16 msk = _mm512_fpclass_ps_mask(dist_minus_one, 0x54); // 0x54 = Negative_Finite | Negative_Infinity | Negative_Zero
This can be slightly more efficient than comparing against zero into a mask (with VCMPPS), because neither you nor the compiler needs a zeroed register. It also reports the sign bit of -0.0, which does not compare less than 0.0. FPCLASS can't distinguish -NaN from +NaN, so if you really just want the high bit of each 32-bit element, use the integer instruction (VPMOVD2M). (I don't know whether it has any bypass delay for inputs coming from FP math instructions.)
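To make the -0.0 and NaN points concrete, here is a minimal sketch that computes both the FPCLASS mask (imm8 = 0x54) and the compare-against-zero mask for the same 16 floats. It assumes GCC or Clang on x86-64 and AVX512DQ for FPCLASSPS, with a scalar fallback; the `NegMasks` struct and `negative_masks*` functions are hypothetical names for illustration.

```cpp
// Sketch: FPCLASSPS "is negative" mask vs. a compare against 0.0.
// Assumes GCC or Clang on x86-64; FPCLASSPS requires AVX512DQ.
#include <immintrin.h>
#include <cmath>

// Hypothetical result type holding both masks side by side.
struct NegMasks {
    int fpclass;      // from _mm512_fpclass_ps_mask(x, 0x54)
    int cmp_lt_zero;  // from _mm512_cmp_ps_mask(x, 0.0, _CMP_LT_OQ)
};

__attribute__((target("avx512f,avx512dq")))
static NegMasks negative_masks_avx512(const float *v) {
    __m512 x = _mm512_loadu_ps(v);
    NegMasks m;
    // 0x54 = negative finite (0x40) | negative infinity (0x10) | negative zero (0x04).
    // The categories come from the immediate, so no zeroed register is needed.
    m.fpclass = (int)_mm512_fpclass_ps_mask(x, 0x54);
    // Compare-based alternative: needs a zero vector, and -0.0 is not < 0.0.
    m.cmp_lt_zero = (int)_mm512_cmp_ps_mask(x, _mm512_setzero_ps(), _CMP_LT_OQ);
    return m;
}

// Scalar reference: together, the three FPCLASS categories mean
// "sign bit set and not NaN" (the NaN categories ignore the sign).
static NegMasks negative_masks_scalar(const float *v) {
    NegMasks m{0, 0};
    for (int i = 0; i < 16; ++i) {
        if (std::signbit(v[i]) && !std::isnan(v[i])) m.fpclass |= 1 << i;
        if (v[i] < 0.0f) m.cmp_lt_zero |= 1 << i;
    }
    return m;
}

NegMasks negative_masks(const float *v) {
    if (__builtin_cpu_supports("avx512f") && __builtin_cpu_supports("avx512dq"))
        return negative_masks_avx512(v);
    return negative_masks_scalar(v);
}
```

With all lanes 1.0f except v[1] = -2.0f, v[3] = -0.0f and v[4] = -NaN, the FPCLASS mask is 0x000A (lanes 1 and 3) while the compare mask is only 0x0002; the -NaN lane appears in neither.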