missing__zlw
Beginner
148 Views

Any faster memcpy/memset?

I wonder whether I can write my own implementation of memset/memcpy that beats the built-in version. I am using the Intel compiler on a Linux platform.
I am thinking of using SSE, but I am not sure whether the Intel compiler already applies it. Also, I am linking with the TCMalloc library.

Thanks.
2 Replies
SHIH_K_Intel
Employee

You might want to take a look at the implementations in the latest glibc (2.13). Look under sysdeps/x86_64/multiarch.
Your mileage will vary depending on the metrics you choose and the test data sets you measure with.
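
Since results depend on the metric and the data sizes, a small timing harness helps compare candidates on your own workload. The following is only a sketch: the names copy_fn and copy_throughput are hypothetical, and it uses clock() (CPU time) for simplicity, which is a reasonable stand-in for wall time only in a single-threaded test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by memcpy and any candidate replacement. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Copies `size` bytes `reps` times with `fn` and returns bytes per second.
   Returns -1.0 on allocation failure or if the copy produced wrong data,
   and 0.0 if the run was too short for clock() to measure. */
double copy_throughput(copy_fn fn, size_t size, size_t reps)
{
    assert(fn != NULL && size > 0 && reps > 0);

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    for (size_t i = 0; i < size; i++) {
        src[i] = (unsigned char)(i * 31u + 7u);
    }

    /* Reading from dst through a volatile keeps the copies observable. */
    volatile unsigned char sink = 0;
    clock_t start = clock();
    for (size_t r = 0; r < reps; r++) {
        fn(dst, src, size);
        sink = (unsigned char)(sink ^ dst[r % size]);
    }
    clock_t stop = clock();

    int ok = (memcmp(dst, src, size) == 0);
    free(src);
    free(dst);
    if (!ok) {
        return -1.0;
    }

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0) {
        return 0.0;
    }
    return (double)size * (double)reps / seconds;
}
```

Calling copy_throughput(memcpy, size, reps) for sizes ranging from a few dozen bytes up to well beyond the last-level cache, and again with a candidate function, gives a first picture of where (if anywhere) the candidate wins.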
TimP
Black Belt

You could use nm to determine which references to memset and memcpy have been replaced by the __intel_fast_ versions from the icc library. There should be no built-in version with icc, unless you mean those __intel_fast_ versions. As the other response indicated, current glibc versions should be good for most purposes as well. I can't see what your choice of malloc would imply; maybe you are asking which of these functions your non-standard malloc uses. Again, nm should be a useful tool for that.
Apparently, you're not asking about AVX optimizations; those don't matter much on the Sandy Bridge implementation, since the hardware splits 256-bit moves into 128-bit pieces. The main issue for big aligned memset/memmove operations is the cutover point to nontemporal stores, which is application dependent.
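
To make the cutover idea concrete, here is a minimal sketch of a copy that switches to SSE2 nontemporal (streaming) stores above a size threshold. The function name copy_with_nt_cutover and the 1 MiB value of NT_CUTOVER_BYTES are assumptions for illustration only; this is not how glibc or the __intel_fast_ routines are implemented, and the right threshold has to be measured for your application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics: _mm_stream_si128 and friends */
#endif

/* Hypothetical cutover point; tune by measurement on the target workload. */
#define NT_CUTOVER_BYTES ((size_t)1 << 20)

/* Copies n bytes from src to dst (regions must not overlap).
   Large copies to a 16-byte-aligned destination use nontemporal stores,
   which avoid filling the cache with data that will not be reused soon.
   Everything else is delegated to the library memcpy. */
void *copy_with_nt_cutover(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    if (n >= NT_CUTOVER_BYTES && ((uintptr_t)dst & 15u) == 0) {
        __m128i *d = (__m128i *)dst;
        const __m128i *s = (const __m128i *)src;
        size_t blocks = n / 16;

        for (size_t i = 0; i < blocks; i++) {
            /* Unaligned load tolerates any src alignment;
               the streaming store requires the aligned dst checked above. */
            __m128i v = _mm_loadu_si128(s + i);
            _mm_stream_si128(d + i, v);
        }
        /* Order the streaming stores before any later stores. */
        _mm_sfence();

        /* Copy the remaining tail of fewer than 16 bytes. */
        size_t done = blocks * 16;
        memcpy((char *)dst + done, (const char *)src + done, n - done);
        return dst;
    }
#endif
    return memcpy(dst, src, n);
}
```

Whether the streaming path helps depends on whether the destination is read again soon after the copy; if it is, bypassing the cache can make things slower, which is why the threshold is application dependent.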