compress intrinsics on MIC

King_Crimson · 09-06-2014

I'm working on a project to vectorize our code on MIC using intrinsics. One critical operation we need is compress:

input={10.0 20.0 30.0 40.0 50.0}

mask={1 0 1 0 1}

output={10.0 30.0 50.0}
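
To pin down the semantics, here is a minimal scalar sketch of the operation in C. The function name compress_scalar and its signature are made up for this illustration; it is only the sequential baseline, not a vectorized solution.

```c
#include <stddef.h>

/* Scalar reference for compress: copy in[i] to out whenever mask[i] is
 * nonzero, preserving the original order. Returns the number of elements
 * written. compress_scalar is a hypothetical name used for illustration. */
size_t compress_scalar(const float *in, const int *mask, size_t n, float *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; ++i) {
        if (mask[i]) {
            out[k++] = in[i];
        }
    }
    return k;
}
```

The caller must provide an out buffer with room for up to n elements, since every mask entry may be set.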

But I've found that the AVX-512 intrinsic _mm512_maskz_compress_ps does not compile for MIC. I guess that is because it will only be supported on Knights Landing. So I'd like to know whether there is an efficient workaround, or whether any existing library can perform this compress operation quickly in parallel (the STL erase-remove idiom is sequential). Thanks for any suggestions.

King_Crimson · 09-06-2014

Update: the intrinsic _mm512_mask_packstorelo_ps seems to do the same job.
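
A rough sketch of how that could look for one block of 16 floats. Assumptions: the name compress16 is made up, src is 64-byte aligned on the MIC path, and the packstorelo/packstorehi pair is used so that dst may be unaligned. The MIC branch has not been tested here; the #else branch is a plain scalar fallback so the snippet still compiles on other targets.

```c
#include <stddef.h>
#if defined(__MIC__)
#include <immintrin.h>
#endif

/* Write the lanes of src[0..15] whose bit is set in mask16 contiguously
 * to dst and return how many were written. compress16 is a hypothetical
 * helper name, not a library function. */
size_t compress16(const float *src, unsigned mask16, float *dst)
{
#if defined(__MIC__)
    /* KNC path (untested sketch): src is assumed 64-byte aligned. */
    __m512 v = _mm512_load_ps(src);
    __mmask16 k = _mm512_int2mask((int)(mask16 & 0xFFFFu));
    /* lo stores the selected elements up to the next 64-byte boundary,
     * hi stores whatever is left past that boundary. */
    _mm512_mask_packstorelo_ps(dst, k, v);
    _mm512_mask_packstorehi_ps(dst + 16, k, v);
    return (size_t)_mm_countbits_32(mask16 & 0xFFFFu);
#else
    /* Portable fallback with the same result, one lane at a time. */
    size_t n = 0;
    for (int lane = 0; lane < 16; ++lane) {
        if (mask16 & (1u << lane)) {
            dst[n++] = src[lane];
        }
    }
    return n;
#endif
}
```

For a longer array, one would call this per 16-element block and advance dst by the returned count each time.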