The following two code samples achieve the same thing: they search a string of 8-bit chars and return the index of the first match.
The first way uses a PCMPxSTRx instruction.
int Index = _mm_cmpistri(a, b, _SIDD_CMP_EQUAL_EACH);
The second way uses a PCMPEQ instruction.
__m128i VectorMask = _mm_cmpeq_epi8(a, b);
int Mask = _mm_movemask_epi8(VectorMask);
unsigned long Index {};
_BitScanForward(&Index, Mask);
Which way is better to use, supposing I know that the two vectors contain only 'valid characters' (no zeros) and that I am not interested in any manipulation of the result mask?
The first way looks more compact in the high-level code, but based on its description in the Intel Intrinsics Guide, I suspect it will produce more micro-ops when decoded by the CPU. So perhaps the second way is more efficient.
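For anyone who wants to try both variants side by side, here is a minimal sketch that wraps each one in a function. The helper names (`load16`, `first_equal_pcmpistri`, `first_equal_pcmpeq`) and the `TARGET_SSE42` macro are made up for illustration and are not part of any Intel API. It assumes 16 readable bytes with no zero bytes in either vector. One difference worth noting: when no byte matches, `_mm_cmpistri` returns 16, whereas `_BitScanForward` returns 0 and leaves the index undefined, so the second helper checks for a zero mask explicitly. The sketch says nothing about which variant is faster; that would have to be measured on the target CPU.

```cpp
#include <cassert>      // for simple checks when trying the helpers out
#include <emmintrin.h>  // SSE2: _mm_cmpeq_epi8, _mm_movemask_epi8, _mm_loadu_si128
#include <nmmintrin.h>  // SSE4.2: _mm_cmpistri

#if defined(_MSC_VER)
#include <intrin.h>     // _BitScanForward
#define TARGET_SSE42    // MSVC needs no per-function target attribute
#else
// Assumption: GCC/Clang, enabling SSE4.2 only for the function that needs it.
#define TARGET_SSE42 __attribute__((target("sse4.2")))
#endif

// Hypothetical helper: loads 16 bytes from an unaligned pointer.
static __m128i load16(const char* p) {
    return _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
}

// First way: PCMPISTRI. Returns the index of the first equal byte,
// or 16 if no byte position matches.
TARGET_SSE42 static int first_equal_pcmpistri(__m128i a, __m128i b) {
    return _mm_cmpistri(a, b, _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH);
}

// Second way: PCMPEQB + PMOVMSKB + bit scan.
// Returns 16 on no match so that both helpers agree.
static int first_equal_pcmpeq(__m128i a, __m128i b) {
    int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(a, b));
    if (mask == 0) {
        return 16;  // a bit scan on zero gives no usable index
    }
#if defined(_MSC_VER)
    unsigned long index = 0;
    _BitScanForward(&index, static_cast<unsigned long>(mask));
    return static_cast<int>(index);
#else
    return __builtin_ctz(static_cast<unsigned>(mask));
#endif
}
```

With `a` loaded from "abcdefghijklmnop" and `b` from "xxxdxxxxxxxxxxxx", both helpers should return 3, since only the byte at index 3 is equal.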
Hi,
We are forwarding this issue to the subject matter expert (SME).
Please share your environment details, such as compiler version, cpuinfo output, and OS, so that the SME has more insight and can dig into this use case.
Warm Regards,
Abhishek
Can I cancel the question? I think I know the answer now.
Please delete the question if possible.
Sorry,
No need to delete the question. It would be great if you could post your findings here. Thanks!
FYI, the Intel Intrinsics Guide webpage, https://software.intel.com/sites/landingpage/IntrinsicsGuide/, is helpful for finding the details of each intrinsic.
Jennifer