Hello,
I had been using pcm-memory.x to measure memory bandwidth on Haswell servers. Recently I switched to Broadwell servers, and the memory bandwidth reported by pcm-memory.x is incorrect. I have tried both Broadwell EX and EP systems, and neither reports correct results; the same tests on Haswell EX and EP give good results. Is this a pcm-memory.x issue? How can I measure memory bandwidth on Broadwell servers?
As far as I can see in the latest PCM source code (2.11.1), it uses the events RD_CAS_RANK[x,y] and WR_CAS_RANK[x,y] instead of the more general CAS_COUNT events. However, PCM has used these events since version 2.9, and on Haswell as well as Broadwell, so the event selection cannot be the reason if you see good results on Haswell. The PCM code covers systems with dual-rank memory, so if you have quad-rank memory you might miss some counts. For dual-rank memory the counts should be accurate (unless a new problem with the event was introduced on Broadwell).
You can either patch the PCM source to use the CAS_COUNT events, which are not rank-related, or use another tool, e.g. LIKWID (https://github.com/RRZE-HPC/likwid). Its memory bandwidth measurements are validated with assembly benchmarks: https://github.com/RRZE-HPC/likwid/wiki/AccuracyBroadwellEP#verification-of-group-mem
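As a side note, turning either set of events into bandwidth is the same simple calculation: every CAS corresponds to one 64-byte cache-line transfer, so bandwidth is counts times 64 divided by the measurement interval. A minimal sketch (the counter values below are made up, not from this thread):

```cpp
#include <cstdint>
#include <cstdio>

// Minimal sketch: converting iMC CAS counts into memory bandwidth.
// Assumption: each CAS event corresponds to one 64-byte cache-line
// transfer, which is how pcm-memory derives its read/write bandwidth
// from CAS_COUNT.RD / CAS_COUNT.WR (or the per-rank RD_CAS_RANKx /
// WR_CAS_RANKx events).
double bandwidth_gb_per_s(uint64_t cas_events, double elapsed_seconds)
{
    const double bytes_per_cas = 64.0;   // one cache line per CAS
    return cas_events * bytes_per_cas / elapsed_seconds / 1e9;
}

int main()
{
    // Hypothetical sample: counts collected on one channel over 1 second.
    uint64_t read_cas  = 120000000;      // CAS_COUNT.RD
    uint64_t write_cas =  50000000;      // CAS_COUNT.WR
    double seconds = 1.0;

    std::printf("read  BW: %.2f GB/s\n", bandwidth_gb_per_s(read_cas, seconds));
    std::printf("write BW: %.2f GB/s\n", bandwidth_gb_per_s(write_cas, seconds));
    std::printf("total BW: %.2f GB/s\n",
                bandwidth_gb_per_s(read_cas + write_cas, seconds));
    return 0;
}
```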
Hi,
PCM uses the same generic CAS_COUNT events if the rank monitoring options are not specified (https://github.com/opcm/pcm/blob/master/cpucounters.cpp#L3768). This is the default mode for pcm-memory and pcm.x utilities.
Harry, are you using the DIMM rank options of pcm-memory? Since there are only 4 hardware counters per memory channel, read and write traffic can be monitored for at most two ranks at the same time. If your DIMMs have more ranks you might miss traffic, as Thomas stated (but only if you use the rank options). A rough illustration of this counter budget follows below.
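This is only an illustration of the arithmetic, not PCM's actual scheduling logic: with 4 counters per channel and two events per rank (one read, one write), only two ranks fit into one run.

```cpp
#include <cstdio>
#include <vector>

// Illustration of the counter constraint: the iMC exposes 4 general-purpose
// counters per channel, and monitoring one rank takes two of them
// (RD_CAS_RANKx + WR_CAS_RANKx), so at most two ranks can be measured at
// the same time. The requested rank numbers below are hypothetical.
int main()
{
    const int imc_counters = 4;
    const int counters_per_rank = 2;        // one read + one write event
    const int max_ranks = imc_counters / counters_per_rank;

    std::vector<int> requested = {0, 1, 4, 5};   // e.g. several -rank=... options
    std::printf("counters available: %d -> at most %d ranks per run\n",
                imc_counters, max_ranks);
    for (size_t i = 0; i < requested.size(); ++i) {
        std::printf("rank %d: %s\n", requested[i],
                    i < static_cast<size_t>(max_ranks)
                        ? "monitored"
                        : "not monitored (no free counters)");
    }
    return 0;
}
```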
Thanks,
Roman
Hi Roman,
I am using dual-rank DIMMs, and when I use the -r parameter I see activity only on ranks 0 and 1.
I did more tests and found that this could be DMA-related. I tried two benchmarks, iperf and STREAM. Here are my findings:
STREAM: pcm-memory.x reported memory bandwidth matching the STREAM results, ~11 GB.
iperf: I have a 50 Gb NIC, and running iperf generated close to 50 Gb/s of network traffic between two systems (both Xeon v4 based). With pcm-pcie.x I see heavy activity on the iperf server side, ~10 GB read and ~6 GB write. But pcm-memory.x shows very little activity for both read and write; the total of read and write is ~150 MB.
I also tried fio, which generated ~3 GB of DMA traffic. Similar to iperf, I can measure the traffic with pcm-pcie.x but not with pcm-memory.x.
Thanks,
Harry
PS. Here is the version I used for the test:
Intel(r) Performance Counter Monitor V2.11 (2016-04-20 12:01:09 +0200 ID=56de28a)
Hi Harry,
For I/O operations, Intel DDIO technology may transparently use the CPU last-level cache instead of always going to DRAM. Can you check the DDIO hit/miss rate using the "-e" option of pcm-pcie?
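As a rough sanity check (all numbers below are hypothetical placeholders, not your measurements): if DDIO absorbs most of the inbound traffic in the last-level cache, the DRAM-side numbers from pcm-memory should be roughly the PCIe-side traffic scaled by the DDIO miss fraction.

```cpp
#include <cstdio>

// Back-of-the-envelope check for the DDIO explanation: inbound I/O that
// hits in the last-level cache never reaches DRAM, so the DRAM traffic
// seen by pcm-memory should be about the PCIe traffic times the DDIO
// miss fraction reported by "pcm-pcie -e". Numbers are hypothetical.
int main()
{
    double pcie_traffic_gb = 10.0;   // e.g. inbound traffic seen by pcm-pcie
    double ddio_hit_rate   = 0.98;   // fraction of I/O accesses served from LLC

    double expected_dram_gb = pcie_traffic_gb * (1.0 - ddio_hit_rate);
    std::printf("expected DRAM-side traffic: ~%.2f GB\n", expected_dram_gb);
    return 0;
}
```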
Thanks,
Roman
Hello,
I use pcm-memory.x to measure memory bandwidth on my server. I have an E5-2650 v4 CPU @ 2.20 GHz and 8 dual-rank DIMMs of 16 GB each.
When I run "pcm-memory.x -rank=0 -rank=1" I see traffic on channels 0 and 1 for both ranks, as expected. When I run "pcm-memory.x -rank=2 -rank=3" I don't see any traffic on channels 0 and 1 for either rank, also as expected. But when I run "pcm-memory.x -rank=4 -rank=5" I do see traffic on channels 0 and 1 for both ranks. Is this right, given that I have dual-rank DIMMs?
Thanks a lot