
Inconsistent values of perfevent hardware counters when measuring power consumption of DRAM using RAPL

Kaushik__Pradyumna

Hi,

I recently noticed a large inconsistency in the data retrieved from
the perf_event hardware counters pertaining to the power consumption
of DRAM.
This inconsistency was noticed upon a kernel upgrade from 4.4.0 to 4.14.20.

Machine specs:
Dual socket.
Processor: Intel(R) Xeon(R) CPU E5-2650 v3, 2.30 GHz (10 cores with 2 threads per core).
So, overall it's a 20-core, 40-thread machine.
TDP of the machine -- 210 Watts (105 Watts for each Xeon processor)

In my research lab, we use Performance Co-Pilot to measure power
consumption of the CPU and DRAM through the use of perfevent hardware
counters and RAPL.

Old results
Ubuntu Xenial
Kernel Version: 4.4.0
Idle power consumption of CPU -- 33.34 Watts
Idle power consumption of DRAM -- 14.76 Watts

New results (after kernel upgrade)
Ubuntu Xenial
Kernel Version: 4.14.20
Idle power consumption of CPU -- 33.23 Watts
Idle power consumption of DRAM -- 59.15 Watts

Can anyone give me some insight as to why this is the case?

Thomas_G_4
New Contributor II

The DRAM RAPL domain requires a different energy unit than the package domain, but only the package domain's unit (6.1E-5 J) can be read from a register. As far as I know, the energy unit for the DRAM domain is hard-coded in the kernel sources (something like energy-unit-of-package-domain/4 = 15.3E-6 J). Your numbers look as if the DRAM domain is being scaled with the same energy unit as the PKG domain. This is just a guess, based on the fact that the scaling factor between your results on kernel 4.4 and 4.14 is almost exactly 4 (59.15/14.76 = 4.007).
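
To make the factor of 4 concrete, here is a minimal Python sketch. It assumes the energy unit is expressed as 1/2^ESU joules (ESU = 14 for the 6.1E-5 J package unit, ESU = 16 for the 15.3E-6 J DRAM unit). The raw counter delta and the function names are made up for illustration and are not taken from the kernel sources.

```python
# Sketch: how the choice of energy unit changes the reported DRAM power.
# Assumption: an energy unit is 1 / 2**ESU joules.

PKG_ESU = 14   # 1/2**14 J ~= 6.1E-5 J, the package-domain unit
DRAM_ESU = 16  # 1/2**16 J ~= 15.3E-6 J, the package unit divided by 4


def energy_unit_joules(esu: int) -> float:
    """Return the energy represented by one counter increment, in joules."""
    return 1.0 / (1 << esu)


def average_power_watts(raw_delta: int, seconds: float, esu: int) -> float:
    """Convert a raw energy-counter increase over `seconds` into watts."""
    return raw_delta * energy_unit_joules(esu) / seconds


# Hypothetical raw counter increase over 1 s, chosen so that the DRAM unit
# reproduces the 14.76 W measured on kernel 4.4.0.
raw_delta = round(14.76 / energy_unit_joules(DRAM_ESU))

correct = average_power_watts(raw_delta, 1.0, DRAM_ESU)
wrong = average_power_watts(raw_delta, 1.0, PKG_ESU)

print(f"scaled with DRAM unit:    {correct:.2f} W")
print(f"scaled with package unit: {wrong:.2f} W")
print(f"ratio: {wrong / correct:.1f}")
```

The same raw counter value gives roughly 14.76 W with the DRAM unit and roughly 59.04 W with the package unit, which is close to the 59.15 W seen after the upgrade.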

McCalpinJohn
Honored Contributor III

The Xeon E5 v3 datasheet (document 330784) says that the DRAM energy unit on this product is a fixed 15.3 micro-Joules (1/65536 J), independent of the value used for the energy unit for the RAPL package domain.
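
As a rough illustration of using the fixed unit, the sketch below turns two raw DRAM energy-counter readings into average power. It assumes the counter is 32 bits wide and wraps around, and that it wraps at most once between the two readings; the function name and the sample values are hypothetical.

```python
# Sketch: average DRAM power from two raw energy-counter samples,
# using the fixed 1/65536 J unit instead of the package-domain unit.

DRAM_ENERGY_UNIT_J = 1.0 / 65536  # fixed 15.3 microjoule unit
COUNTER_BITS = 32                 # assumption: counter is 32 bits wide


def dram_power_watts(raw_start: int, raw_end: int, seconds: float) -> float:
    """Return average power in watts between two raw counter readings.

    The modulo handles a single wraparound of the counter between samples.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta = (raw_end - raw_start) % (1 << COUNTER_BITS)
    return delta * DRAM_ENERGY_UNIT_J / seconds


# Hypothetical samples taken 1 s apart, with a wraparound in between.
start = 2**32 - 100_000
end = 883_040
print(dram_power_watts(start, end, 1.0))
```

The sampling interval has to be short enough that the counter cannot wrap more than once, otherwise the computed power will be too low.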

This fixed DRAM energy unit appears to be carried forward on newer processors, but documentation is sparse (and sometimes incorrect).

It is pretty easy to do a "sanity check" on these numbers. The maximum power consumption of a DIMM will be in the 4-5 Watt range, so I typically measure the DRAM power while running STREAM and divide by the number of DIMMs. If the power per DIMM comes out in the 16 Watt range (about four times too high), you are using the wrong energy unit -- most likely the one from the RAPL_ENERGY_UNIT MSR.
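
The check described above can be sketched in a few lines of Python. The 5 W threshold is the upper end of the range mentioned above; the DIMM count and the power values in the example are hypothetical, not measurements from the system in this thread.

```python
# Sketch: flag a DRAM power reading that is implausibly high per DIMM.

MAX_PLAUSIBLE_WATTS_PER_DIMM = 5.0  # upper end of the 4-5 W range


def watts_per_dimm(dram_watts: float, num_dimms: int) -> float:
    """Return DRAM power divided evenly across the installed DIMMs."""
    if num_dimms <= 0:
        raise ValueError("num_dimms must be positive")
    return dram_watts / num_dimms


def energy_unit_looks_wrong(dram_watts_under_load: float, num_dimms: int) -> bool:
    """True if power per DIMM under a memory-bound load exceeds the bound."""
    return watts_per_dimm(dram_watts_under_load, num_dimms) > MAX_PLAUSIBLE_WATTS_PER_DIMM


# Hypothetical readings taken while running a memory-bound benchmark
# on a machine with 8 DIMMs.
for reading in (32.0, 128.0):
    per_dimm = watts_per_dimm(reading, 8)
    verdict = "suspicious" if energy_unit_looks_wrong(reading, 8) else "plausible"
    print(f"{reading:.0f} W total, {per_dimm:.0f} W per DIMM: {verdict}")
```

This only bounds the reading under load. It says nothing about idle power, which is much harder to bound.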

Idle power is a lot harder to bound. Obviously it will be less than the maximum, but I have seen very large fluctuations in DRAM power on systems that were "idle". There are a number of power-saving mechanisms available in DDR4 DRAMs, but it is not easy to identify when these will be used and when they will not be used. There are performance counters in the memory controller of the uncore that can be used to monitor some of the power-saving modes (POWER_CHANNEL_DLLOFF, POWER_CHANNEL_PPD, POWER_CKE_CYCLES, POWER_SELF_REFRESH), but I have not tried to correlate these with energy consumption measurements.

Kaushik__Pradyumna

Sorry for the late reply.

It turns out that the issue was due to an incompatibility between Ubuntu Xenial and Linux kernels newer than 4.7.10.

I upgraded my distro to Ubuntu Bionic (18.04) and the kernel to 4.15.x, and now the RAPL readings for DRAM are fine.

Thanks for the valuable input.
