Intel® ISA Extensions

strlen with SSE4.2 instructions

wmula
Beginner
Hi all, this is my first post

The paper "Inside Intel Next Generation Nehalem Architecture" by Ronak Singhal (SP08_NGMS001_100r_eng.pdf) contains a comparison of strlen written with a PCMPxSTRx instruction and with ordinary x86 code. The SSE4.2 code looks very nice, but what is the approximate speedup?
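For readers who have not seen the technique, here is a minimal sketch of a strlen built on PCMPISTRI through the `_mm_cmpistri` intrinsic. It is not the code from the paper. The function name `strlen_sse42` is hypothetical, and the `target` attribute assumes GCC or Clang (alternatively, compile with `-msse4.2`). It relies on the "equal each" mode with an all-zero first operand: positions at or after the first NUL byte of the chunk compare as true, so the returned index is the position of the terminator, or 16 if the chunk contains none.

```c
#include <nmmintrin.h>  /* SSE4.2 intrinsics (_mm_cmpistri) */
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch only; the name is hypothetical. */
__attribute__((target("sse4.2")))
size_t strlen_sse42(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    const char *p = s;

    /* Scan byte by byte until p is 16-byte aligned, so that the
       aligned loads below never cross into an unmapped page. */
    while (((uintptr_t)p & 15) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        ++p;
    }

    for (;;) {
        __m128i chunk = _mm_load_si128((const __m128i *)p);
        /* Index of the first NUL byte in chunk, or 16 if there is none. */
        int idx = _mm_cmpistri(zero, chunk,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH |
                               _SIDD_LEAST_SIGNIFICANT);
        if (idx < 16)
            return (size_t)(p - s) + (size_t)idx;
        p += 16;
    }
}
```

The aligned load may read bytes past the terminator inside the same 16-byte block. That is common practice in hand-written string routines, but memory checkers may report it.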

And why was scalar x86 code used for the comparison? strlen can also be written with SSE2 instructions; here is my implementation: http://wmula.republika.pl/proj/sse2string/src/strlen.S. I'm wondering how much faster the SSE4.2 code is.
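To make the SSE2 baseline concrete, here is a rough C sketch of the usual approach: compare 16 bytes at a time against zero with PCMPEQB, then extract a bit mask with PMOVMSKB. It is not a transcription of the linked strlen.S. The name `strlen_sse2` is hypothetical, and `__builtin_ctz` assumes GCC or Clang.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch only; the name is hypothetical. */
size_t strlen_sse2(const char *s)
{
    const __m128i zero = _mm_setzero_si128();

    /* Round the pointer down to a 16-byte boundary so every load is
       aligned and cannot cross into an unmapped page. */
    const char *p = (const char *)((uintptr_t)s & ~(uintptr_t)15);
    unsigned misalign = (unsigned)((uintptr_t)s & 15);

    /* First block: one mask bit per byte that equals zero. */
    __m128i chunk = _mm_load_si128((const __m128i *)p);
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
    mask >>= misalign;  /* drop the bytes that lie before s */
    if (mask != 0)
        return (size_t)__builtin_ctz(mask);

    for (;;) {
        p += 16;
        chunk = _mm_load_si128((const __m128i *)p);
        mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, zero));
        if (mask != 0)
            return (size_t)(p - s) + (size_t)__builtin_ctz(mask);
    }
}
```

Because the first load starts at the aligned address below `s`, it can read a few bytes before the string. This is harmless at the hardware level but is outside what the C standard guarantees, so treat the sketch as a demonstration of the idea rather than portable code.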

BTW, what are the latency and throughput of the PCMPxSTRx instructions? Does the latency depend on the input data, or is it constant? I didn't find answers in the recent manuals.

w.
2 Replies
SHIH_K_Intel
Employee
PCMPxSTRy offers a rich set of capabilities. There is ongoing work on developing more tutorial material. You can expect more information to roll out in the Fall IDF time frame.
SHIH_K_Intel
Employee

For software developers who might be interested in attending Fall IDF (8/19-8/21): there will be sessions on Intel AVX on Wednesday (8/20), and on Thursday afternoon there is an in-depth session on SSE4.2. Additionally, SSE4.2 will be demoed in the advanced technology zone on all three days.
