main memory memcpy performance on Haswells

I'm using my NetPIPE communication benchmark to measure the copy rate from main memory (no cache effects) for various array sizes. On Sandy Bridge and Ivy Bridge I see nice curves for both gcc and icc (_intel_fast_memcpy), but on Haswell the performance of both is significantly lower in the mid-range, from around 8 kB to 4 MB, and only reaches decent levels for very large array sizes. It seems to me that the code is just not tuned for Haswell, but it is odd that I see the same deficiency with both the gcc and icc routines. I've attached a graph showing both curves (these show copy rates; bandwidths would be 2x this). The measurements avoid cache effects by moving the source and destination pointers through a very large memory buffer.
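For anyone who wants to try reproducing this outside NetPIPE, here is a minimal sketch of the measurement idea described above. It is not NetPIPE's actual code: the function name `copy_rate_mbps`, its parameters, and the choice of splitting one pool into a source half and a destination half are all assumptions made for illustration. It also assumes a POSIX system for `clock_gettime`, and that the pool passed in is much larger than the last-level cache.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Monotonic wall-clock time in seconds (POSIX clock_gettime). */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/*
 * Hypothetical helper, not part of NetPIPE.
 * Copies msg_bytes at a time, reps times, advancing the source and
 * destination pointers through separate halves of a pool_bytes buffer
 * so that successive copies touch different memory.
 * Returns the copy rate in MB/s (bytes copied / seconds / 1e6); memory
 * traffic is read plus write, hence the "2x" for bandwidth.
 * Returns -1.0 on bad arguments or allocation failure.
 */
double copy_rate_mbps(size_t msg_bytes, size_t pool_bytes, int reps)
{
    size_t half = pool_bytes / 2;
    if (msg_bytes == 0 || reps <= 0 || msg_bytes > half)
        return -1.0;

    char *pool = malloc(pool_bytes);
    if (pool == NULL)
        return -1.0;
    memset(pool, 1, pool_bytes);      /* touch every page before timing */

    /* Calling through a volatile pointer keeps the compiler from
       eliding or inlining the copies. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;

    char *src = pool;
    char *dst = pool + half;
    size_t off = 0;

    double t0 = now_sec();
    for (int i = 0; i < reps; i++) {
        copy(dst + off, src + off, msg_bytes);
        off += msg_bytes;
        if (off + msg_bytes > half)   /* wrap before leaving the half */
            off = 0;
    }
    double elapsed = now_sec() - t0;

    assert(dst[0] == 1);              /* sanity check: data was copied */
    free(pool);

    if (elapsed <= 0.0)
        return -1.0;
    return (double)msg_bytes * (double)reps / elapsed / 1e6;
}
```

Calling this for message sizes from 8 kB up to 4 MB with something like a 1 GB pool, and choosing `reps` so that each run lasts long enough for the timer resolution not to matter, should give a curve comparable in spirit to the attached graph. Building the same file with gcc and with icc would show whether both library routines dip in the same range.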

