Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Measure memory bandwidth on Broadwell EX

Harry_L_
Beginner

Hello,

I had been using pcm-memory.x to measure memory bandwidth on Haswell servers. I recently switched to Broadwell servers, and the memory bandwidth reported by pcm-memory.x is incorrect. I tried both Broadwell EX and EP systems, and neither reports correct results. The same tests on Haswell EX and EP give good results. Is this a pcm-memory.x issue? How can I measure memory bandwidth on Broadwell servers?

 

Thomas_G_4
New Contributor II

As far as I can see in the latest PCM source code (2.11.1), it uses the events RD_CAS_RANK[x,y] and WR_CAS_RANK[x,y] instead of the more general CAS_COUNT events. But PCM already used these events in version 2.9, and for Haswell systems as well as Broadwell. Since you see good results on Haswell, the event selection cannot be the reason. The PCM code covers systems with dual-rank memory, so if you have quad-rank memory you might miss some counts. For dual-rank memory the counts should be accurate (unless a new problem with the event was introduced on Broadwell).

You can either patch the PCM source to use the CAS_COUNT events, which are not rank-related, or use another tool, e.g. LIKWID (https://github.com/RRZE-HPC/likwid). Its memory bandwidth measurements are validated with assembly benchmarks: https://github.com/RRZE-HPC/likwid/wiki/AccuracyBroadwellEP#verification-of-group-mem
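
For reference, the conversion from CAS_COUNT-style counts to bandwidth can be sketched roughly as below. This is only an illustration, not PCM's or LIKWID's actual code: it assumes each CAS command transfers one 64-byte cache line, and the function name, parameters, and sample counter values are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumption: one CAS command moves one 64-byte cache line


def cas_bandwidth_mb_per_s(rd_cas, wr_cas, interval_s):
    """Return (read, write, total) bandwidth in MB/s from CAS counter deltas.

    rd_cas and wr_cas are counter deltas, summed over all memory channels,
    for one sampling interval of interval_s seconds.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    read = rd_cas * CACHE_LINE_BYTES / interval_s / 1e6
    write = wr_cas * CACHE_LINE_BYTES / interval_s / 1e6
    return read, write, read + write


# Made-up counter deltas for a 1-second sampling interval.
read, write, total = cas_bandwidth_mb_per_s(125_000_000, 46_875_000, 1.0)
print(f"read {read:.0f} MB/s, write {write:.0f} MB/s, total {total:.0f} MB/s")
```

Because the counts are not tied to a rank, the result should not depend on how many ranks the DIMMs have.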

Roman_D_Intel
Employee

Hi,

PCM uses the same generic CAS_COUNT events if the rank monitoring options are not specified (https://github.com/opcm/pcm/blob/master/cpucounters.cpp#L3768). This is the default mode for pcm-memory and pcm.x utilities.

Harry, are you using the DIMM rank options of pcm-memory? Since there are only 4 hardware counters, read and write traffic can be monitored from only two ranks at a time. If your DIMMs have more ranks, you might miss some traffic, as Thomas stated (but only if you use the rank options).
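
The counter budget described above can be sketched as follows. This is a hypothetical helper, not part of PCM; it only assumes what is stated here, namely 4 counters and one read plus one write event per rank.

```python
AVAILABLE_COUNTERS = 4  # from the explanation above
EVENTS_PER_RANK = 2     # one read event and one write event per rank


def rank_batches(ranks, counters=AVAILABLE_COUNTERS, events_per_rank=EVENTS_PER_RANK):
    """Split a list of ranks into groups that fit into the counters at once."""
    per_run = counters // events_per_rank
    if per_run < 1:
        raise ValueError("not enough counters to monitor even one rank")
    return [ranks[i:i + per_run] for i in range(0, len(ranks), per_run)]


# A quad-rank DIMM would need two separate measurement runs.
for run, batch in enumerate(rank_batches([0, 1, 2, 3]), start=1):
    print(f"run {run}: ranks {batch}")
```

With these assumptions, covering all ranks of a quad-rank DIMM takes two runs, each with a different pair of ranks.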

Thanks,

Roman

Harry_L_
Beginner

Hi Roman,

I am using dual-rank DIMMs, and when I use the -r parameter, I see activity only on ranks 0 and 1.

I did more tests on this and found that it could be DMA related. I tried 2 benchmarks, iperf and STREAM. Here are my findings:

STREAM: the memory BW reported by pcm-memory.x matches the STREAM results, ~11GB.

iperf: I have a 50Gb NIC, and running iperf generated nearly 50Gb of network traffic between 2 systems (both Xeon v4 based). In pcm-pcie.x, I see heavy activity on the iperf server side, ~10GB read and ~6GB write. But pcm-memory.x measures little activity for either read or write; the total of read and write is ~150MB.

I also tried fio, which generated ~3GB of DMA traffic. As with iperf, I can measure the traffic with pcm-pcie.x, but not with pcm-memory.x.
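
To put the iperf numbers side by side, here is a rough sketch of the unit conversion. It assumes that "50Gb" means 50 Gbit/s and that "~150MB" means MB/s, which the figures above do not state explicitly; the function name is hypothetical.

```python
def gbit_to_mbyte_per_s(gbit_per_s):
    """Convert a rate in Gbit/s to MB/s (decimal units, 8 bits per byte)."""
    return gbit_per_s * 1e9 / 8 / 1e6


nic_mb_per_s = gbit_to_mbyte_per_s(50)  # assumed NIC line rate
dram_mb_per_s = 150                     # assumed pcm-memory.x total (read + write)

share = dram_mb_per_s / nic_mb_per_s
print(f"NIC line rate: {nic_mb_per_s:.0f} MB/s")
print(f"DRAM traffic seen by pcm-memory.x: {share:.1%} of line rate")
```

Under these assumptions, only a small fraction of the network traffic shows up as DRAM traffic.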

Thanks,

Harry

PS. Here is the version I used for the test: 

Intel(r) Performance Counter Monitor V2.11 (2016-04-20 12:01:09 +0200 ID=56de28a)

Roman_D_Intel
Employee

Hi Harry,

For I/O operations the Intel DDIO technology might transparently use CPU last level cache instead of always going into DRAM memory. Can you check DDIO hit/miss rate using the "-e" option of pcm-pcie?

Thanks,

Roman
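
Once hit and miss counts are available, the hit rate could be computed as in the sketch below. The function name and sample numbers are hypothetical, and the code does not parse pcm-pcie output; it only shows the arithmetic on two counts read off by hand.

```python
def ddio_hit_rate(hits, misses):
    """Return the fraction of I/O accesses served from the last level cache.

    Returns None when no accesses were counted.
    """
    total = hits + misses
    if total == 0:
        return None
    return hits / total


# Made-up counts for illustration only.
rate = ddio_hit_rate(hits=9_000, misses=1_000)
print(f"DDIO hit rate: {rate:.0%}")
```

A hit rate close to 100% would be consistent with most of the DMA traffic staying in cache and never reaching DRAM.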

EBoug
Beginner

Hello,

I use pcm-memory.x to measure the memory bandwidth on my server. I have an E5-2650 v4 CPU @ 2.20GHz and 8 dual-rank DIMMs of 16 GB each.

When I run "pcm-memory.x -rank=0 -rank=1", I see traffic in channels 0 and 1 for both ranks, as expected. When I run "pcm-memory.x -rank=2 -rank=3", I don't see any traffic in channels 0 and 1 for either rank, also as expected. When I run "pcm-memory.x -rank=4 -rank=5", I see traffic in channels 0 and 1 for both ranks. Is this right, given that I have dual-rank DIMMs?

Thanks a lot
