strlen with SSE4.2 instructions

wmula · ‎06-07-2008

Hi all, this is my first post

Paper "Inside Intel Next Generation Nehalem Architecture" by Ronak Singhal (SP08_NGMS001_100r_eng.pdf) contains comparison of strlen uses PCMPSTRx instruction and ordinal x86-code. SSE4.2 code looks very nice, but what is approximate speedup?

And why scalar x86 code was used? With SSE2 instructions strlen could also be coded; here is my implementation: http://wmula.republika.pl/proj/sse2string/src/strlen.S. I'm wondering how faster SSE4.2 code is.

BTW what is latency/throughput of PCMPSTRx instructions? Does latency depend on input data or is constant? I didn't find answers in recent manuals.

w.

SHIH_K_Intel · ‎06-19-2008

PCMPxSTRy offers a rich set of capabilities. There are on-going work in developing more tutorial materials. You can expect more information to roll out in the Fall IDF time frame.

SHIH_K_Intel · ‎07-21-2008

For software developers who might be interested in attending Fall IDF (8/19-8/21). There will be sessions on Intel AVX on Wed. (8/20). On Thursday afternoon, there is an in-depthsession on SSE4.2. Additionally, SSE4.2 will be demo'ed in the advanced technology zone on all three days.