Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Fast-string operation and Non-temporal access


Dear Experts,

We have fast-string operation (REP MOVSB/STOSB) and non-temporal access (NTA) in modern hardware (CPU). Which one do you prefer for memory copy/fill (without considering other DMA resources in the system)?


Best Regards,


3 Replies
Black Belt

There certainly have been CPU models where the built-in string moves didn't qualify as "fast," so Intel compiler developers devoted significant effort to make their compilers choose well. More recent CPUs were designed to overcome performance deficits associated with legacy choices.  It's reasonable to hope that clearly written portable source will be optimized adequately until performance profiling shows otherwise.

Intel also devoted effort to fixing obvious deficiencies in the memmove/memcpy/memset provided by the OS, so there are fewer problems there than in the past. When using compilers other than Intel's, you may need to call such functions explicitly if you wish to engage automatic run-time selection of streaming/nontemporal stores.
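
For comparison with the library path, here is a minimal sketch of what a hand-written non-temporal copy might look like, assuming an x86 target with SSE2 and a compiler that provides `<emmintrin.h>`. The name `copy_streaming` is hypothetical and is not a function from any library; on other targets the sketch simply falls back to `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_loadu_si128, _mm_stream_si128 */
#endif

/* Hypothetical helper (not a library function): copy n bytes using
 * non-temporal stores for the 16-byte-aligned body of the destination. */
static void *copy_streaming(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
#if defined(__SSE2__)
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    /* Head: ordinary copy until the destination is 16-byte aligned,
     * because _mm_stream_si128 requires an aligned store address. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > n) {
        head = n;
    }
    memcpy(d, s, head);
    d += head;
    s += head;
    n -= head;

    /* Body: unaligned load, streaming (cache-bypassing) store. */
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
        d += 16;
        s += 16;
        n -= 16;
    }

    /* Make the weakly ordered streaming stores visible before returning. */
    _mm_sfence();

    /* Tail: fewer than 16 bytes remain. */
    memcpy(d, s, n);
    return dst;
#else
    /* Assumption: without SSE2 there is nothing to stream with here. */
    return memcpy(dst, src, n);
#endif
}
```

Streaming stores usually only pay off for buffers that are large relative to the cache and that will not be read again soon; for small or soon-reused data, a plain `memcpy(dst, src, n)` call is the safer default.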

Valued Contributor II
>>...Which one do you prefer for memory copy/fill (without considering other DMA resources in the system)?

Let me answer as generically as possible:

1. If in Use Case A function A is faster than function B, then use function A.
2. If in Use Case B function B is faster than function A, then use function B.
3. If some function is always faster than the other, then use that function.

And so on.
Black Belt

If the strings are indeed byte strings at arbitrary byte offsets in both source and destination, and if the strings are relatively short (a rough guess: less than 256 bytes), then rep movsb/stosb may (questionably) be a good choice. You will have to run some tests. Because the tests are for you to make a decision, be sure they are set up to provide representative results for the situations you actually encounter (IOW, not a contrived "prove your point" test).
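
A minimal sketch of such a test, assuming GCC or Clang on x86 (elsewhere the wrapper falls back to `memcpy`). The names `copy_rep_movsb` and `ns_per_copy` are hypothetical helpers made up for illustration, and `clock()` is a coarse timer, so treat the numbers as rough guidance only.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical wrapper: copy n bytes with the REP MOVSB string instruction.
 * Assumes the direction flag is clear, as the common x86 ABIs guarantee
 * on function entry. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    void *d = dst;
    /* RDI/EDI = destination, RSI/ESI = source, RCX/ECX = byte count;
     * all three are modified by the instruction, hence "+" constraints. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical timing helper: average nanoseconds of CPU time per call
 * of fn(dst, src, n), measured over reps repetitions. */
static double ns_per_copy(copy_fn fn, void *dst, const void *src,
                          size_t n, long reps)
{
    clock_t t0 = clock();
    for (long i = 0; i < reps; i++) {
        fn(dst, src, n);
    }
    clock_t t1 = clock();
    return (double)(t1 - t0) * 1e9 / (double)CLOCKS_PER_SEC / (double)reps;
}
```

To compare, call `ns_per_copy(copy_rep_movsb, dst, src, n, reps)` and `ns_per_copy(memcpy, dst, src, n, reps)` with the sizes, alignments, and offsets your application really uses, and check that the optimizer has not removed the copies being timed.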

FWIW I do agree that some optimization effort should be made to favor rep movsb/stosb over using "gobs" of registers (and avoid save/restore or discarding values).

Jim Dempsey
