Thanks for your interest in Intel products!
First of all, I cannot comment on the usefulness of this specific benchmark for providing an accurate measure of memory bandwidth. Also, each Intel Haswell chip/chipset has different performance characteristics, so you need to consider this when measuring performance.
Secondly, since you originally posted your question in the Intel GPA support forum, I'll comment on your question with respect to the Intel GPA product. While Intel GPA displays various CPU and GPU metrics, the tool provides a snapshot of various metrics at a particular point in time. Therefore it's most useful for identifying performance bottlenecks in a specific application (typically a game or graphics-based application), with the end goal being to improve the performance of that application. In other words, Intel GPA is not the tool to use if you are looking for a single number to characterize a particular platform's performance.
Therefore, I'm going to move your question to another forum that I believe may provide more discussion about your original question.
There are lots of ways to measure memory bandwidth, but the question needs to be more specific.
Are you trying to measure the maximum memory bandwidth that a Haswell processor can sustain under the best circumstances, or are you trying to measure the actual memory bandwidth utilized while running a workload of interest?
There are a variety of memory bandwidth benchmark tests that attempt to deliver the best possible sustained bandwidth. These vary in both methodology and in the transparency of those methodologies. In the Windows world, you are most likely to download a binary executable file with little idea of how "bandwidth" is defined or how the code is implemented. In the Linux world, the STREAM benchmark is widely used. It has the advantage of being available in source code, and the disadvantage of requiring a good compiler to get best results (i.e., gcc does not deliver great results).
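To make the point about how "bandwidth" is defined concrete, here is a minimal sketch of the byte counting that STREAM uses for its four kernels, assuming 8-byte (double precision) array elements. STREAM counts only the explicit reads and writes in each loop, not any extra write-allocate traffic. The function name and the timing in the example are hypothetical, chosen for illustration only.

```python
# Bytes that STREAM counts per loop iteration, for 8-byte (double) elements.
# Copy and Scale touch two arrays; Add and Triad touch three.
STREAM_BYTES_PER_ITERATION = {
    "copy": 16,   # c[i] = a[i]
    "scale": 16,  # b[i] = s * c[i]
    "add": 24,    # c[i] = a[i] + b[i]
    "triad": 24,  # a[i] = b[i] + s * c[i]
}


def stream_bandwidth_gb_per_s(kernel, array_length, seconds):
    """Bandwidth as STREAM defines it: counted bytes divided by elapsed time."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    bytes_moved = STREAM_BYTES_PER_ITERATION[kernel] * array_length
    return bytes_moved / seconds / 1e9  # GB/s, with 1 GB = 1e9 bytes


if __name__ == "__main__":
    # Hypothetical timing: one Triad pass over 80 million elements in 0.11 s.
    print(round(stream_bandwidth_gb_per_s("triad", 80_000_000, 0.11), 2))
```

A benchmark that counts bytes differently (for example, by including write-allocate traffic) will report a different number for the same run, which is why knowing the definition matters.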
From what I can see on the Intel web site, the desktop and mobile Haswell processors support two channels of DDR3 DRAM at transfer rates of either 1333 or 1600 MT/s. With 8 bytes per transfer on each channel, this provides a peak memory bandwidth of either 21.3 or 25.6 GB/s. The 17.54 GB/s quoted above is 82% of the peak for DDR3/1333, which is entirely typical. If the system is actually configured with DDR3/1600, then 17.54 GB/s is only about 68.5% of peak, which is lower than is typically seen with Intel processors.
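The arithmetic behind these peak figures can be sketched as follows. This assumes the standard 64-bit (8-byte) DDR3 channel width; the function names are illustrative, and the 17.54 GB/s value is the measurement quoted earlier in the thread.

```python
def peak_bandwidth_gb_per_s(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Peak DRAM bandwidth: channels x transfers per second x bytes per transfer."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


def fraction_of_peak(measured_gb_per_s, peak_gb_per_s):
    """Share of the theoretical peak that a measurement achieved."""
    return measured_gb_per_s / peak_gb_per_s


if __name__ == "__main__":
    measured = 17.54  # GB/s, the figure quoted in the thread
    for rate in (1333, 1600):
        peak = peak_bandwidth_gb_per_s(2, rate)  # two DDR3 channels
        share = fraction_of_peak(measured, peak)
        print(f"DDR3/{rate}: peak {peak:.1f} GB/s, measured is {share:.1%} of peak")
```

Knowing which DRAM speed is actually installed is therefore the first thing to check before deciding whether a measured result is typical or low.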
Dr. McCalpin is a recognized expert on this subject... just in case you had any doubt. And the STREAM benchmark is probably the most widely used memory bw benchmark. There are many memory bw benchmarks and I/we don't have time to run each benchmark on each platform/chip/memory_speed/memory_settings configuration combination.
Dr. McCalpin has answered your first question about determining the maximum memory bandwidth.
For measuring the actual memory bandwidth utilized while running your application, you might want to have a look at Intel Performance Counter Monitor. It provides routines for reading the processor counters that can report the memory bandwidth in your system.
In my opinion, the main thing is to understand the goal of the study.
An application consists (at a minimum) of:
work with memory
work with flops
Therefore, to correctly determine the flop/byte value (as Dr. McCalpin does),
I think it is more correct to proceed in two ways:
1.1 Determine the upper limits of the server using the STREAM and LINPACK benchmarks.
1.2 Take the ratio of the measured performance to the measured bandwidth to get the server's maximum flop/byte.
2 Determine from the application's source code the ratio between the amount of memory work (loads/stores, and maybe snooping) and the number of floating-point operations. This gives the theoretical peak flop/byte for your application.
This will let you estimate how much of the server's capabilities your application can use.
This method is quite simple and can be made more elaborate (PCU, profiling, etc.), but in my opinion it is the first step.
Correct me if I am wrong.
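The two steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the 100 GFLOP/s LINPACK figure is hypothetical, the 17.54 GB/s bandwidth is the value quoted earlier in the thread, and the example loop is a Triad-style `a[i] = b[i] + s * c[i]` on 8-byte elements (2 flops and 24 bytes per iteration). The bound in `attainable_gflops` is the usual roofline-style estimate, which is one way to read the method, not necessarily the exact one intended.

```python
def machine_balance(peak_gflops, stream_gb_per_s):
    """Step 1: flop/byte the server offers (LINPACK rate over STREAM rate)."""
    return peak_gflops / stream_gb_per_s


def arithmetic_intensity(flops_per_iteration, bytes_per_iteration):
    """Step 2: flop/byte the application needs, counted from its source code."""
    return flops_per_iteration / bytes_per_iteration


def attainable_gflops(peak_gflops, stream_gb_per_s, intensity):
    """Upper bound on performance: limited by compute or by bandwidth."""
    return min(peak_gflops, stream_gb_per_s * intensity)


if __name__ == "__main__":
    peak = 100.0       # GFLOP/s, hypothetical LINPACK result
    bandwidth = 17.54  # GB/s, figure quoted in the thread
    intensity = arithmetic_intensity(2, 24)  # Triad-style loop
    print("machine balance (flop/byte):", round(machine_balance(peak, bandwidth), 2))
    print("application intensity (flop/byte):", round(intensity, 3))
    print("attainable GFLOP/s:", round(attainable_gflops(peak, bandwidth, intensity), 2))
```

When the application's flop/byte is far below the server's, as in this example, the application is limited by memory bandwidth and can use only a small part of the server's floating-point capability.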
Since we are talking about memory bandwidth, I have a question about the "Max Memory Bandwidth" parameter in the product specifications on the Intel ARK page.
According to the specs, Intel Xeon 2600v4 dual-socket servers have a Max Memory Bandwidth of 76.8 GB/s.
Is this a per processor bandwidth or total system bandwidth?