Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Memory bandwidth measurements

xiaoyuan_z_
Beginner

Hi all, how do I measure the memory bandwidth of an Intel Haswell CPU? Is the result reported by Memtest86+ (17544 MB/s) correct? Thanks! Roger

Neal_Pierman
Valued Contributor I

Hello,

Thanks for your interest in Intel products!

First of all, I cannot comment on the usefulness of this specific benchmark for providing an accurate measure of memory bandwidth. Also, each Intel Haswell chip/chipset has different performance characteristics, so you need to consider this when measuring performance.

Secondly, since you originally posted your question in the Intel GPA support forum, I'll comment on your question with respect to the Intel GPA product. While Intel GPA displays various CPU and GPU metrics, the tool provides a snapshot of various metrics at a particular point in time. Therefore it's most useful for identifying performance bottlenecks in a specific application (typically a game or graphics-based application), with the end goal being to improve the performance of that application. In other words, Intel GPA is not the tool to use if you are looking for a single number to characterize a particular platform's performance.

Therefore, I'm going to move your question to another forum that I believe may provide more discussion about your original question.

Regards,

Neal

McCalpinJohn
Honored Contributor III

There are lots of ways to measure memory bandwidth, but the question needs to be more specific.

Are you trying to measure the maximum memory bandwidth that a Haswell processor can sustain under the best circumstances, or are you trying to measure the actual memory bandwidth utilized while running a workload of interest?

There are a variety of memory bandwidth benchmark tests that attempt to deliver the best possible sustained bandwidth. These vary in both methodology and in the transparency of those methodologies. In the Windows world, you are most likely to download a binary executable file with little idea of how "bandwidth" is defined or how the code is implemented. In the Linux world, the STREAM benchmark is widely used. It has the advantage of being available in source code, and the disadvantage of requiring a good compiler to get the best results (e.g., gcc does not deliver great results).
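Part of STREAM's transparency is that its definition of "bandwidth" is simple: it counts only the bytes each kernel explicitly reads and writes (8-byte doubles), and it does not count write-allocate traffic. The sketch below reproduces that byte-counting convention. The helper `stream_bandwidth_gbs` and the timing used in the test are hypothetical illustrations, not part of the STREAM source.

```python
# STREAM-style byte counting, assuming 8-byte double-precision elements.
# Only explicit loads and stores are counted; write-allocate traffic
# (reading the destination cache line before overwriting it) is NOT counted.
BYTES_PER_ELEMENT = 8

# Number of arrays touched per loop iteration for each STREAM kernel.
ARRAYS_TOUCHED = {
    "Copy": 2,   # c[i] = a[i]
    "Scale": 2,  # b[i] = q * c[i]
    "Add": 3,    # c[i] = a[i] + b[i]
    "Triad": 3,  # a[i] = b[i] + q * c[i]
}


def stream_bandwidth_gbs(kernel: str, n_elements: int, seconds: float) -> float:
    """Hypothetical helper: bandwidth in GB/s (1 GB = 1e9 bytes), STREAM-style."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * n_elements
    return bytes_moved / seconds / 1e9


for name in ARRAYS_TOUCHED:
    print(f"{name}: {ARRAYS_TOUCHED[name] * BYTES_PER_ELEMENT} bytes counted per iteration")
```

Because write-allocate traffic is excluded, the DRAM traffic actually generated by Copy, Scale, Add and Triad can be higher than what STREAM reports, unless the compiler uses streaming (non-temporal) stores.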

From what I can see at the Intel web site, the desktop and mobile Haswell processors support two channels of DDR3 DRAM at either 1333 or 1600 MT/s (million transfers per second). With 8 bytes per transfer on each channel, this provides a peak memory bandwidth of either 21.3 or 25.6 GB/s. The 17.54 GB/s quoted above is 82% of the peak for DDR3/1333, which is entirely typical. If the system is actually configured with DDR3/1600, then 17.54 GB/s is only about 68.5% of peak, which is lower than is typically seen with Intel processors.
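The peak-bandwidth arithmetic above can be checked with a few lines. This is a minimal sketch assuming 8 bytes per transfer per channel and 1 GB = 1e9 bytes; the function name `peak_dram_bandwidth_gbs` is hypothetical, and 1333.33 MT/s is the nominal DDR3-1333 rate.

```python
def peak_dram_bandwidth_gbs(mega_transfers_per_s: float, channels: int,
                            bytes_per_transfer: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s: transfer rate x channels x bytes per transfer."""
    return mega_transfers_per_s * 1e6 * channels * bytes_per_transfer / 1e9


measured_gbs = 17.544  # Memtest86+ result from the original post (17544 MB/s)

# Dual-channel DDR3-1333 (nominally 1333.33 MT/s) and DDR3-1600.
for rate in (1333.33, 1600):
    peak = peak_dram_bandwidth_gbs(rate, channels=2)
    print(f"DDR3-{round(rate)}: peak {peak:.1f} GB/s, "
          f"measured = {measured_gbs / peak:.1%} of peak")
```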

Patrick_F_Intel1
Employee

Hello Roger,

Dr. McCalpin is a recognized expert on this subject... just in case you had any doubt. And the STREAM benchmark is probably the most widely used memory bandwidth benchmark. There are many memory bandwidth benchmarks, and I/we don't have time to run each benchmark on each platform/chip/memory_speed/memory_settings configuration combination.

Pat

xiaoyuan_z_
Beginner
Hello Dr. McCalpin & Patrick Fay,

Thanks for your reply.

"Are you trying to measure the maximum memory bandwidth that a Haswell processor can sustain under the best circumstances, or are you trying to measure the actual memory bandwidth utilized while running a workload of interest?"

My motherboard is configured with DDR3/1600. I'm trying to measure both the maximum memory bandwidth and the actual memory bandwidth utilized while running an application, to see whether my application reaches the maximum memory bandwidth.

Thanks!
Roger
Thomas_W_Intel
Employee

Roger,

Dr. McCalpin has answered your first question about determining the maximum memory bandwidth.

For measuring the actual memory bandwidth utilized while running your application, you might want to have a look at Intel Performance Counter Monitor (PCM). It provides routines that read the processor's performance counters, which can report the memory bandwidth in your system.
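Counter-based tools typically turn memory-controller event counts into bandwidth by sampling the counters twice and multiplying the difference by the cache-line size. The sketch below shows only that arithmetic, assuming each counted read or write event corresponds to one 64-byte cache-line transfer. It does not use the PCM API; the function `dram_bandwidth_gbs` and the counter values are hypothetical, and counter wraparound is ignored.

```python
CACHE_LINE_BYTES = 64  # assumption: one counted DRAM event = one 64-byte line


def dram_bandwidth_gbs(reads_before: int, reads_after: int,
                       writes_before: int, writes_after: int,
                       interval_s: float) -> tuple[float, float]:
    """Return (read GB/s, write GB/s) from two counter snapshots.

    Hypothetical helper; ignores counter wraparound between snapshots.
    """
    read_bytes = (reads_after - reads_before) * CACHE_LINE_BYTES
    write_bytes = (writes_after - writes_before) * CACHE_LINE_BYTES
    return read_bytes / interval_s / 1e9, write_bytes / interval_s / 1e9


# Hypothetical snapshots taken 1.0 s apart (illustrative numbers, not measurements).
read_gbs, write_gbs = dram_bandwidth_gbs(0, 200_000_000, 0, 50_000_000, 1.0)
total_gbs = read_gbs + write_gbs
peak_gbs = 25.6  # dual-channel DDR3-1600 peak
print(f"read {read_gbs:.1f} GB/s, write {write_gbs:.1f} GB/s, total {total_gbs:.1f} GB/s")
print(f"utilization: {total_gbs / peak_gbs:.1%} of peak")
```

Comparing the total against a measured STREAM result, rather than against the theoretical peak, gives a more realistic picture of how close the application is to the achievable limit.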

 

Kind regards

Thomas

SB17
Beginner

Hi all,
In my opinion, the main thing is to understand the goal of the study.
An application consists (at a minimum) of:
         work with memory
         work with floating-point operations (flops)
Therefore, the aim is to correctly determine the flop/byte ratio (as Dr. McCalpin does).

In my opinion, it is more correct to proceed along two paths:
         1.1 Determine the maximum envelope of the server's capabilities using the STREAM and LINPACK benchmarks.
                 1.2 Determine the ratio between the actual performance and the actual bandwidth achieved, i.e., the maximum flop/byte.
         2 Determine from the application's source code the relation between the amount of memory work (loads/stores, and maybe snooping) and the floating-point operations. This gives the theoretical peak flop/byte for your application.
        This allows you to estimate how much of the server's capability your application can use.
        This method is simple and can be refined further (PCU, profiling, etc.), but in my opinion it is the first step.
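The two steps above can be combined with the roofline model (named here as a swapped-in framing, not necessarily what the poster had in mind): attainable performance is the lower of the compute ceiling and arithmetic intensity times bandwidth. This is a minimal sketch; the machine figures are hypothetical placeholders for your own STREAM and LINPACK results, and `attainable_gflops` is a hypothetical helper.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Roofline bound: the lower of the compute ceiling and intensity x bandwidth."""
    return min(peak_gflops, flops_per_byte * bandwidth_gbs)


# Step 1: hypothetical machine figures (substitute measured STREAM/LINPACK results).
stream_gbs = 20.0       # sustained memory bandwidth, GB/s
linpack_gflops = 150.0  # sustained compute rate, GFLOP/s
machine_balance = linpack_gflops / stream_gbs  # flop/byte at which the bound switches

# Step 2: intensity counted from source code. Example kernel: STREAM Triad,
# a[i] = b[i] + q * c[i] -> 2 flops per 24 bytes (two 8-byte loads, one 8-byte store).
triad_flops_per_byte = 2 / 24

bound = attainable_gflops(linpack_gflops, stream_gbs, triad_flops_per_byte)
print(f"machine balance {machine_balance:.1f} flop/byte, "
      f"Triad intensity {triad_flops_per_byte:.3f} flop/byte")
print(f"Triad bound: {bound:.2f} GFLOP/s ({bound / linpack_gflops:.1%} of the LINPACK rate)")
```

A kernel whose flop/byte is well below the machine balance is memory-bound, so its useful target is the STREAM bandwidth rather than the LINPACK flop rate.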
Correct me if I'm wrong.

Serg

Abhishek_N_Intel1

Hello,

Since we are talking about memory bandwidth, I have a question about the "Max Memory Bandwidth" parameter on the Intel ARK product specification page.

According to the specs, Intel Xeon E5-2600 v4 dual-socket servers have a Max Memory Bandwidth of 76.8 GB/s.

Is this a per processor bandwidth or total system bandwidth?

 

 

McCalpinJohn
Honored Contributor III
The values on the Intel "ark" web pages are the peak DRAM bandwidth per socket:

2.4 GTransfers/s * 4 memory channels * 8 B/Transfer = 76.8 GB/s

The "2600" indicates a maximum of 2 sockets, so a fully configured 2-socket system can have a peak local DRAM bandwidth of 153.6 GB/s.

Most of the processors in this series can sustain 80% of peak bandwidth if you use enough cores. On the Xeon E5-2690 v4 (14-core), I get 81%-83% of peak DRAM bandwidth with the STREAM benchmark using 7 to 14 cores per socket. Some of the low-power, low-core-count, low-DRAM-frequency processors are likely to deliver a lower fraction of peak DRAM bandwidth, but I have only tested this on some of the Xeon E5-2600 v3 processors, not the v4 versions.
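The per-socket and per-system figures above, and what an 80%-83% sustained fraction would mean in GB/s, can be reproduced with this short sketch. It assumes DDR4-2400 (2.4 GT/s), 4 channels per socket and 8 bytes per transfer as stated in the reply; `peak_dram_gbs` is a hypothetical helper, and the sustained values are simple fractions of peak, not measurements.

```python
def peak_dram_gbs(gigatransfers_per_s: float, channels: int,
                  bytes_per_transfer: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s for one socket."""
    return gigatransfers_per_s * channels * bytes_per_transfer


per_socket = peak_dram_gbs(2.4, channels=4)  # DDR4-2400, 4 channels per socket
system = 2 * per_socket                      # "2600" series: up to 2 sockets

print(f"peak per socket: {per_socket:.1f} GB/s, 2-socket system: {system:.1f} GB/s")
# Sustained fractions of peak mentioned in the reply (80%, and 81%-83% with STREAM).
for fraction in (0.80, 0.81, 0.83):
    print(f"{fraction:.0%} of peak -> {fraction * per_socket:.1f} GB/s per socket, "
          f"{fraction * system:.1f} GB/s per 2-socket system")
```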