Intel® ISA Extensions

Any faster memcpy/memset?

missing__zlw
Beginner
I wonder whether my own implementation of memset/memcpy could beat the built-in version. I am using the Intel compiler on a Linux platform.
I am thinking of using SSE, but I am not sure whether the Intel compiler already applies it. Also, I am linking with the TCMalloc library.

Thanks.
SHIH_K_Intel
Employee
You might want to take a look at the implementations in the latest glibc (2.13). Look under sysdeps/x86_64/multiarch.
Your mileage will vary depending on the metrics you choose and the test data sets you measure with.
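Since results depend on the metric and the data, here is a minimal timing sketch for comparing copy routines at different buffer sizes. The helper name `bench_copy` and the `copy_fn` typedef are hypothetical, and it assumes a POSIX system with `clock_gettime`. It reports only average time per call, which is one metric among many.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std modes */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Signature shared by memcpy and any candidate replacement. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical helper: average nanoseconds per copy of `size` bytes over
 * `reps` calls. Returns -1.0 on bad arguments or allocation failure. */
double bench_copy(copy_fn fn, size_t size, int reps)
{
    if (fn == NULL || size == 0 || reps <= 0)
        return -1.0;

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not part of the timing. */
    memset(src, 0xA5, size);
    memset(dst, 0, size);

    volatile unsigned char sink = 0;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++) {
        fn(dst, src, size);
        /* Read one byte back so the compiler cannot drop the copy. */
        sink ^= dst[(size_t)i % size];
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;

    free(src);
    free(dst);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)reps;
}
```

Calling `bench_copy(memcpy, size, reps)` for a range of sizes (for example from 64 bytes up to many megabytes), and again with your own routine, shows where each one wins. Sizes and alignments taken from the real application matter more than any synthetic sweep.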
TimP
Honored Contributor III
You could use nm to determine which references to memset and memcpy have been replaced by the __intel_fast_ versions from the icc library. There should be no built-in version with icc, unless you mean those __intel_fast_ versions. As the other response indicated, current glibc versions should be good for most purposes as well. I can't see what your choice of malloc would imply; maybe you are asking which of these functions your non-standard malloc uses. Again, nm should be a useful tool for that.
Apparently, you're not asking about AVX optimizations; those don't matter much on the Sandy Bridge implementation, since the hardware splits 256-bit moves into 128-bit pieces. The main issue for large aligned memset/memmove operations is the cutover point to nontemporal stores, which is application dependent.
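To illustrate the cutover idea, here is a rough sketch of a copy that falls back to memcpy for small or unaligned destinations and uses SSE2 nontemporal stores above a threshold. The function name `copy_with_nt_cutover` and the 1 MiB value of `NT_CUTOVER_BYTES` are assumptions made for illustration, not what icc or glibc actually do. The right threshold has to be measured for the application. It assumes an x86 target with SSE2.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutover point: copies at or above this size bypass the cache.
 * The value is a placeholder and should be tuned by measurement. */
#define NT_CUTOVER_BYTES ((size_t)1 << 20)

void *copy_with_nt_cutover(void *dst, const void *src, size_t n)
{
    /* Small copies, or destinations not 16-byte aligned, use plain memcpy.
     * _mm_stream_si128 requires a 16-byte-aligned destination. */
    if (n < NT_CUTOVER_BYTES || ((uintptr_t)dst & 15u) != 0)
        return memcpy(dst, src, n);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t blocks = n / 16;

    for (size_t i = 0; i < blocks; i++) {
        /* Unaligned load is allowed; the store goes around the cache. */
        __m128i v = _mm_loadu_si128(s + i);
        _mm_stream_si128(d + i, v);
    }

    /* Make the nontemporal stores globally visible before returning. */
    _mm_sfence();

    /* Copy the tail of fewer than 16 bytes, if any. */
    size_t done = blocks * 16;
    memcpy((unsigned char *)dst + done, (const unsigned char *)src + done,
           n - done);
    return dst;
}
```

Nontemporal stores help when the destination will not be read again soon, because they avoid evicting useful data from the cache. If the copied data is consumed immediately, the regular cached path is often faster, which is why the cutover depends on the application.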