Intel® C++ Compiler

SSE2 strlen faster than SSE4.2 strlen

styc
Test program:
[cpp]#include <stdio.h>
#include <string.h>
#include <sys/time.h>

#define N 2000000000

char s[N + 1];

long read_time() {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000000L + tv.tv_usec;
}

int main() {
    memset(s, 'a', N);
    long t0 = read_time();
    size_t l = strlen(s);
    long t1 = read_time();
    printf("len=%zu time=%ldus\n", l, t1 - t0);
    return 0;
}
[/cpp]
"-O2 -xSSE2" outputs "len=2000000000 time=268183us"
"-O2 -xSSE4.2" outputs "len=2000000000 time=286701us"
(icc 11.1.056; Linux amd64; Xeon E5530; DDR3 1067MHz)

The SSE2 version takes roughly 6.5% less time. Should strlen perhaps default to the SSE2 code even under -xSSE4.2?
jimdempseyatthecove

For a fair test, create a series of strings of varying length:

0 bytes
1 byte
2 bytes
3 bytes
4 bytes
5 bytes
8 bytes
9 bytes

i.e. powers of 2 and powers of 2 plus 1.

Use nested loops, varying the outer loop's iteration count so that each string length gets a reasonable run time.

Then compare run-time performance as a function of string length.

Jim Dempsey
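
The suggestion above could be sketched roughly as follows. This is only an illustration, not code from either poster: the helper names (`make_lengths`, `time_strlen`, `bench_all`) and the byte-budget rule for choosing the repeat count are assumptions made for the example. The string pointer is read through a `volatile` variable so the compiler cannot hoist the `strlen` call out of the timing loop.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/* Wall-clock time in microseconds, as in the original test program. */
static long read_time(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000000L + tv.tv_usec;
}

/* Fill out[] with 0, then each power of 2 and power of 2 plus 1 up to
 * max_pow2, skipping duplicates (1 + 1 == 2). Returns the entry count.
 * out[] must have room for 129 entries. */
static size_t make_lengths(size_t *out, size_t max_pow2) {
    size_t n = 0;
    out[n++] = 0;
    for (size_t p = 1; p != 0 && p <= max_pow2; p *= 2) {
        if (out[n - 1] != p)
            out[n++] = p;
        out[n++] = p + 1;
    }
    return n;
}

/* Time `reps` strlen calls on a string of `len` bytes.
 * Returns elapsed microseconds, or -1 if allocation fails.
 * The summed lengths are stored in *total_out. */
static long time_strlen(size_t len, long reps, size_t *total_out) {
    char *s = (char *)malloc(len + 1);
    if (s == NULL)
        return -1;
    memset(s, 'a', len);
    s[len] = '\0';

    char *volatile vs = s;  /* reloaded each iteration: blocks hoisting */
    size_t total = 0;
    long t0 = read_time();
    for (long r = 0; r < reps; r++)
        total += strlen(vs);
    long t1 = read_time();

    free(s);
    *total_out = total;
    return t1 - t0;
}

/* Outer loop over lengths; the repeat count shrinks as the string grows so
 * each length scans about `byte_budget` bytes (an assumed heuristic). */
static void bench_all(size_t max_pow2, size_t byte_budget) {
    size_t lens[129];
    size_t n = make_lengths(lens, max_pow2);
    for (size_t i = 0; i < n; i++) {
        long reps = (long)(byte_budget / (lens[i] + 1));
        if (reps < 1)
            reps = 1;
        size_t total = 0;
        long us = time_strlen(lens[i], reps, &total);
        if (us < 0) {
            printf("len=%zu allocation failed\n", lens[i]);
            continue;
        }
        assert(total == lens[i] * (size_t)reps);  /* strlen sanity check */
        printf("len=%zu reps=%ld time=%ldus ns/call=%.2f\n",
               lens[i], reps, us, 1000.0 * (double)us / (double)reps);
    }
}
```

Calling something like `bench_all(1 << 20, 1 << 26)` from `main` and building once with `-O2 -xSSE2` and once with `-O2 -xSSE4.2` would give one line per string length to compare. For very short strings the loop overhead and timer resolution are a large share of the measurement, so those rows are best read as rough indications.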
