Intel® Moderncode for Parallel Architectures

Fast-string operation and Non-temporal access

JWong19 — Sat, 13 Aug 2016 06:16:40 GMT

Dear Experts,

Modern CPUs offer both fast-string operations (REP MOVSB/STOSB) and non-temporal accesses (NTA). Which one do you prefer for memory copy/fill (without considering other DMA resources in the system)?

Best Regards,

Jeremy

TimP — Mon, 15 Aug 2016 15:11:10 GMT

There certainly have been CPU models where the built-in string moves didn't qualify as "fast," so Intel compiler developers devoted significant effort to making their compilers choose well. More recent CPUs were designed to overcome the performance deficits associated with legacy choices. It's reasonable to hope that clearly written portable source will be optimized adequately until performance profiling shows otherwise.

Intel also devoted effort to fixing obvious deficiencies in the memmove/memcpy/memset provided by the OS, so there aren't as many problems there as in the past. When using compilers other than Intel's, you may need to call such functions explicitly if you wish to engage automatic run-time selection of streaming/non-temporal stores.
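As a rough illustration of what a hand-written streaming fill looks like next to a plain library call, here is a minimal sketch. The helper name `fill_nt` is hypothetical; the code assumes an x86 target with SSE2 and falls back to `memset` elsewhere. It is not how any particular library implements its run-time selection.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2: _mm_set1_epi8, _mm_stream_si128, _mm_sfence */
#define HAVE_SSE2_STREAM 1
#endif

/* Hypothetical helper: fill n bytes at dst with value c, using non-temporal
 * (streaming) stores for the 16-byte-aligned middle of the buffer. */
static void *fill_nt(void *dst, int c, size_t n)
{
#ifdef HAVE_SSE2_STREAM
    unsigned char *p = (unsigned char *)dst;

    /* Head: plain byte stores until p is 16-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 15u) != 0) {
        *p++ = (unsigned char)c;
        n--;
    }

    /* Body: one 16-byte streaming store per iteration; the non-temporal
     * hint asks the CPU to avoid polluting the caches with this data. */
    const __m128i v = _mm_set1_epi8((char)c);
    while (n >= 16) {
        _mm_stream_si128((__m128i *)p, v);
        p += 16;
        n -= 16;
    }
    /* Order the streaming stores before any later stores. */
    _mm_sfence();

    /* Tail: remaining bytes with plain stores. */
    while (n > 0) {
        *p++ = (unsigned char)c;
        n--;
    }
    return dst;
#else
    /* Assumption: no SSE2 available, so defer to the library routine. */
    return memset(dst, c, n);
#endif
}
```

Whether `fill_nt(buf, 0, len)` beats `memset(buf, 0, len)` depends on the buffer size and on whether the data is read again soon, so it should be measured rather than assumed.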

SergeyKostrov — Wed, 14 Sep 2016 23:55:49 GMT

>>...Which one do you prefer for memory copy/fill (without considering other DMA resources in the system)?

Let me answer in as generic a way as possible...

1. If in Use Case A function A is faster than function B, then use function A.
2. If in Use Case B function B is faster than function A, then use function B.
3. If some function is always faster than the other, then use that function.

and so on.

jimdempseyatthecove — Thu, 15 Sep 2016 12:38:56 GMT

If the strings are indeed byte strings at arbitrary byte offsets in both source and destination, and if the strings are relatively short (rough guess: less than 256 bytes), then rep movsb/stosb may (questionably) be a good choice. You will have to run some tests. And because you will use the tests to make a decision, be sure they are set up to give results representative of the situations you actually encounter (IOW, not a contrived prove-your-point test).
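A minimal sketch of such a test, assuming GCC or Clang (for the inline assembly) and an x86 target; on other targets the copy falls back to `memcpy`. The names `copy_rep_movsb`, `copy_fn` and `time_copy` are hypothetical, and `clock()` is only a coarse timer, so treat the numbers as relative indications.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy n bytes with REP MOVSB (GCC/Clang inline asm).
 * Relies on the ABI guarantee that the direction flag is clear on entry. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    void *ret = dst;
    /* D = destination (RDI/EDI), S = source (RSI/ESI), c = count (RCX/ECX);
     * all three registers are updated by the instruction. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return ret;
#else
    /* Assumption: non-x86 target, so defer to the library routine. */
    return memcpy(dst, src, n);
#endif
}

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Time `reps` copies of n bytes with fn; returns elapsed CPU seconds. */
static double time_copy(copy_fn fn, void *dst, const void *src,
                        size_t n, long reps)
{
    clock_t t0 = clock();
    for (long i = 0; i < reps; i++) {
        fn(dst, src, n);
        /* Compiler barrier so the copy is not optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

To compare, call `time_copy(copy_rep_movsb, dst + d, src + s, n, reps)` and `time_copy(memcpy, dst + d, src + s, n, reps)` with the sizes `n` and the offsets `d` and `s` taken from your real workload, not only aligned power-of-two cases.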

FWIW I do agree that some optimization effort should be made to favor rep movsb/stosb over using "gobs" of registers (and avoid save/restore or discarding values).

Jim Dempsey