I'm working on a project to vectorize our code on MIC using the intrinsics. One critical operation needed is the compress operation:
input={10.0 20.0 30.0 40.0 50.0}
mask={1 0 1 0 1}
output={10 30 50}
But I've found that the AVX-512 intrinsic _mm512_maskz_compress_ps does not compile. I guess that is because it will only be supported on Knights Landing. So I'd like to know whether there is an effective workaround, or whether any existing library can perform this compress operation quickly in parallel (the STL erase-remove idiom is sequential). Thanks for any suggestions.
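To pin down the semantics being asked for, here is a minimal scalar sketch of the compress operation in portable C++. The name `compress_reference` is hypothetical (not a library function), and the mask is assumed to be one int per element. It is a correctness baseline to check a vectorized version against, not a fast implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: scalar reference for "compress".
// Keeps input[i] wherever mask[i] is nonzero, preserving the original order.
std::vector<float> compress_reference(const std::vector<float>& input,
                                      const std::vector<int>& mask) {
    std::vector<float> output;
    output.reserve(input.size());
    // Only visit positions that exist in both arrays.
    const std::size_t n =
        input.size() < mask.size() ? input.size() : mask.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (mask[i] != 0) {
            output.push_back(input[i]);
        }
    }
    return output;
}
```

With the input and mask from the example above, this returns the three values 10, 30 and 50.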
1 Reply
Update: the intrinsic _mm512_mask_packstorelo_ps seems to do the same job.
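For anyone trying this route, below is a rough portable sketch of how a masked pack-store is typically used in a loop: process 16 lanes at a time, write the selected lanes contiguously, and advance the output by the number of lanes written. `pack_store16` and `compress_blocks` are hypothetical names that emulate the behaviour in scalar code; they are not the intrinsic itself. As far as I understand the KNC documentation, `_mm512_mask_packstorelo_ps` only writes up to the next 64-byte boundary, so an unaligned destination also needs the matching `_mm512_mask_packstorehi_ps` call; treat that as an assumption to verify against the manual.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for one masked pack-store of 16 float lanes:
// lanes whose mask bit is set are written contiguously to dst.
// Returns how many elements were written.
std::size_t pack_store16(float* dst, std::uint16_t mask, const float* lanes) {
    std::size_t written = 0;
    for (int lane = 0; lane < 16; ++lane) {
        if (mask & (1u << lane)) {
            dst[written++] = lanes[lane];
        }
    }
    return written;
}

// Hypothetical driver: compress n floats in blocks of 16.
// keep[i] != 0 means in[i] is kept. out must have room for n floats.
// Returns the number of elements written to out.
std::size_t compress_blocks(const float* in, const unsigned char* keep,
                            std::size_t n, float* out) {
    std::size_t out_count = 0;
    for (std::size_t base = 0; base < n; base += 16) {
        // The last block may be partial; unused lanes stay masked off.
        const std::size_t lanes_here = (n - base < 16) ? (n - base) : 16;
        float lanes[16] = {0.0f};
        std::uint16_t mask = 0;
        for (std::size_t lane = 0; lane < lanes_here; ++lane) {
            lanes[lane] = in[base + lane];
            if (keep[base + lane]) {
                mask = static_cast<std::uint16_t>(mask | (1u << lane));
            }
        }
        // Advance the output position by the count of kept lanes,
        // which is the population count of the mask.
        out_count += pack_store16(out + out_count, mask, lanes);
    }
    return out_count;
}
```

In a real vectorized version the inner lane loop would be replaced by a vector load plus a compare that produces the mask, and the output advance would use a popcount of that mask.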
