I'm thinking of using _mm_cmpeq_epi8 and _mm_movemask_epi8, then dispatching through a table of function pointers to a series of _mm_extract_epi8 calls. But I'm not sure this approach would be any faster than a regular std::find_if over memory.
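To make the first two steps concrete, here is a rough sketch of the compare-and-movemask part only, next to a std::find_if baseline. It assumes the goal is to find the nonzero bytes in a 16-byte block, which the post does not state. The helper names (zero_byte_mask, first_nonzero_simd, first_nonzero_scalar) are made up for illustration, and the table dispatch and _mm_extract_epi8 steps are left out.

```cpp
#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_cmpeq_epi8, _mm_movemask_epi8
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: 16-bit mask with bit i set when block[i] == 0.
// Assumes block points to at least 16 readable bytes.
inline unsigned zero_byte_mask(const std::uint8_t* block) {
    __m128i v  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(block));
    __m128i eq = _mm_cmpeq_epi8(v, _mm_setzero_si128());  // 0xFF where byte == 0
    return static_cast<unsigned>(_mm_movemask_epi8(eq));  // one bit per byte
}

// Hypothetical helper: index of the first nonzero byte, or 16 if all are zero.
inline std::size_t first_nonzero_simd(const std::uint8_t* block) {
    unsigned nonzero = ~zero_byte_mask(block) & 0xFFFFu;
    std::size_t i = 0;
    while (i < 16 && ((nonzero >> i) & 1u) == 0) {
        ++i;  // portable bit scan; a compiler intrinsic could replace this loop
    }
    return i;
}

// Scalar baseline using std::find_if over the same 16 bytes.
inline std::size_t first_nonzero_scalar(const std::uint8_t* block) {
    const std::uint8_t* it = std::find_if(
        block, block + 16, [](std::uint8_t b) { return b != 0; });
    return static_cast<std::size_t>(it - block);
}
```

Both functions return the same index, so they can be timed against each other on representative data. Which one wins likely depends on how the zeros are distributed and on how much work the dispatch step adds afterwards.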
Ravi,
Can you describe in more detail (widen the scope of your description) what you are trying to do?
IOW are you intending to squish out the 0's of 4 x _mm_epi8 registers (to left, to right, across registers), or is this a squish of a larger number of bytes in memory?
Jim Dempsey