
RAPL: Intel(R) Xeon(R) Silver 4112 CPU monitoring

abhishek_naik

Hello Everyone,

 

We want to monitor the power consumption of individual cores and turn cores off completely at specific times of day. I have read many articles on this topic but have not been able to achieve this.

I set up an Ubuntu 20.04 system to test this. There are no entries under /sys/class/powercap, and turbostat reports no power statistics: every value is 0, and the figures do not change even when I stress the CPU with stress-ng. A few details are below.

 

# ls /sys/class/powercap
#

 

# turbostat
turbostat version 19.08.31 - Len Brown <lenb@kernel.org>
CPUID(0): GenuineIntel 0x16 CPUID levels; 0x80000008 xlevels; family:model:stepping 0x6:55:4 (6:85:4)
CPUID(1): SSE3 - - - - TSC MSR - - -
CPUID(6): No-APERF, No-TURBO, No-DTS, No-PTM, No-HWP, No-HWPnotify, No-HWPwindow, No-HWPepp, No-HWPpkg, No-EPB
cpu1: MSR_IA32_MISC_ENABLE: 0x00000000 (No-TCC No-EIST No-MWAIT PREFETCH TURBO)
CPUID(7): No-SGX
CPUID(0x16): base_mhz: 0 max_mhz: 0 bus_mhz: 0
cpu1: MSR_MISC_PWR_MGMT: 0x00000000 (ENable-EIST_Coordination DISable-EPB DISable-OOB)
RAPL: inf sec. Joule Counter Range, at 0 Watts
cpu1: MSR_PLATFORM_INFO: 0x80000000
0 * 100.0 = 0.0 MHz max efficiency frequency
0 * 100.0 = 0.0 MHz base frequency
cpu1: MSR_IA32_POWER_CTL: 0x00000000 (C1E auto-promotion: DISabled)
cpu1: MSR_TURBO_RATIO_LIMIT: 0x00000000
cpu1: MSR_TURBO_RATIO_LIMIT1: 0x00000000
cpu1: MSR_CONFIG_TDP_NOMINAL: 0x00000000 (base_ratio=0)
cpu1: MSR_CONFIG_TDP_LEVEL_1: 0x00000000 ()
cpu1: MSR_CONFIG_TDP_LEVEL_2: 0x00000000 ()
cpu1: MSR_CONFIG_TDP_CONTROL: 0x00000000 ( lock=0)
cpu1: MSR_TURBO_ACTIVATION_RATIO: 0x00000000 (MAX_NON_TURBO_RATIO=0 lock=0)
cpu1: MSR_PKG_CST_CONFIG_CONTROL: 0x00000000 (UNlocked, pkg-cstate-limit=0 (pc0), automatic c-state conversion=off)
NSFOD /sys/devices/system/cpu/cpu1/cpufreq/scaling_driver
cpu1: MSR_MISC_FEATURE_CONTROL: 0x00000000 (L2-Prefetch L2-Prefetch-pair L1-Prefetch L1-IP-Prefetch)
cpu0: MSR_RAPL_POWER_UNIT: 0x00000000 (1.000000 Watts, 1.000000 Joules, 0.000977 sec.)
cpu0: MSR_PKG_POWER_INFO: 0x00000000 (0 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_PKG_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: PKG Limit #1: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu0: PKG Limit #2: DISabled (0.000000 Watts, 0.000977* sec, clamp DISabled)
cpu0: MSR_DRAM_POWER_INFO,: 0x00000000 (0 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu0: MSR_DRAM_POWER_LIMIT: 0x00000000 (UNlocked)
cpu0: DRAM Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu1: MSR_RAPL_POWER_UNIT: 0x00000000 (1.000000 Watts, 1.000000 Joules, 0.000977 sec.)
cpu1: MSR_PKG_POWER_INFO: 0x00000000 (0 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu1: MSR_PKG_POWER_LIMIT: 0x00000000 (UNlocked)
cpu1: PKG Limit #1: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu1: PKG Limit #2: DISabled (0.000000 Watts, 0.000977* sec, clamp DISabled)
cpu1: MSR_DRAM_POWER_INFO,: 0x00000000 (0 W TDP, RAPL 0 - 0 W, 0.000000 sec.)
cpu1: MSR_DRAM_POWER_LIMIT: 0x00000000 (UNlocked)
cpu1: DRAM Limit: DISabled (0.000000 Watts, 0.000977 sec, clamp DISabled)
cpu1: MSR_PKGC3_IRTL: 0x00000000 (NOTvalid, 0 ns)
cpu1: MSR_PKGC6_IRTL: 0x00000000 (NOTvalid, 0 ns)
cpu1: MSR_PKGC7_IRTL: 0x00000000 (NOTvalid, 0 ns)
Package  CPU  TSC_MHz  IRQ  SMI  CPU%c1  CPU%c6  PkgWatt  RAMWatt  PKG_%  RAM_%
-        -    2594     483  0    100.00  0.00    0.00     0.00     0.00   0.00
0        0    2594     274  0    100.00  0.00    0.00     0.00     0.00   0.00
2        1    2594     209  0    100.00  0.00    0.00     0.00     0.00   0.00
Package  CPU  TSC_MHz  IRQ  SMI  CPU%c1  CPU%c6  PkgWatt  RAMWatt  PKG_%  RAM_%
-        -    2594     206  0    100.00  0.00    0.00     0.00     0.00   0.00
0        0    2594     131  0    100.00  0.00    0.00     0.00     0.00   0.00
2        1    2594     75   0    100.00  0.00    0.00     0.00     0.00   0.00
#

 

OS-level settings (I think these are set correctly):

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal

# grep CONFIG_POWERCAP /boot/config-5.4.0-186-generic
CONFIG_POWERCAP=y
# grep CONFIG_INTEL_RAPL /boot/config-5.4.0-186-generic
CONFIG_INTEL_RAPL_CORE=m
CONFIG_INTEL_RAPL=m
#

 

CPU info:

# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 45 bits physical, 48 bits virtual
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz
Stepping: 4
CPU MHz: 2593.906
BogoMIPS: 5187.81
Virtualization: VT-x
Hypervisor vendor: VMware
Virtualization type: full

 

# cpuid -l 1 -1 |grep -v true
CPU:
version information (1/eax):
processor type = primary processor (0)
family = 0x6 (6)
model = 0x5 (5)
stepping id = 0x4 (4)
extended family = 0x0 (0)
extended model = 0x5 (5)
(family synth) = 0x6 (6)
(model synth) = 0x55 (85)

miscellaneous (1/ebx):
process local APIC physical ID = 0x2 (2)
cpu count = 0x1 (1)
CLFLUSH line size = 0x8 (8)
brand index = 0x0 (0)
brand id = 0x00 (0): unknown
feature information (1/edx):
PSN: processor serial number = false
DS: debug store = false
ACPI: thermal monitor and clock ctrl = false
hyper-threading / multi-core supported = false
TM: therm. monitor = false
IA64 = false
PBE: pending break event = false
feature information (1/ecx):
DTES64: 64-bit debug store = false
MONITOR/MWAIT = false
CPL-qualified debug store = false
SMX: safer mode extensions = false
Enhanced Intel SpeedStep Technology = false
TM2: thermal monitor 2 = false
context ID: adaptive or shared L1 data = false
SDBG: IA32_DEBUG_INTERFACE = false
xTPR disable = false
PDCM: perfmon and debug = false
DCA: direct cache access = false
#

 

I checked, and nothing clearly indicates that the CPU is unsupported.

# cat /sys/devices/cpu/caps/pmu_name
skylake

 

https://web.eece.maine.edu/~vweaver/projects/rapl/rapl_support.html

Name: Skylake Server
Family: 6
Model: 85
package: Y
PP0 (usually cores): Y
PP1 (usually GPU): N
DRAM: Y
PSys: N
powercap: 4.8???
perf_event: 4.8 (348c5ac6c7dc11)
PAPI: yes

 

A few links I referred to:

https://askubuntu.com/questions/1401888/modprobe-intel-rapl-common-no-such-device-on-intel-core-11th-i7-dual-boot-syste

https://askubuntu.com/questions/1148528/how-to-enable-kernel-configs-config-powercap-and-config-intel-rapl-in-ubuntu

https://community.intel.com/t5/Processors/RAPL-PP0-supported-but-not-enabled/m-p/1550189

https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/running-average-power-limit-energy-reporting.html

https://www.intel.com/content/www/us/en/support/articles/000055440/processors/intel-xeon-processors.html

https://ark.intel.com/content/www/us/en/ark/products/codename/37572/products-formerly-skylake.html

https://ark.intel.com/content/www/us/en/ark/products/123551/intel-xeon-silver-4112-processor-8-25m-cache-2-60-ghz.html

 

Maybe I am missing something obvious, or is there a setting that has to be enabled at the ESXi level? Please check and let me know.

 

Regards,

Abhi
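
One way to narrow this down from inside the guest is to check two things together: whether the kernel sees a hypervisor, and whether any RAPL zones were registered under powercap. The lscpu output above reports "Hypervisor vendor: VMware", and the all-zero MSR values printed by turbostat are consistent with the hypervisor not exposing the RAPL MSRs to the guest, although that is an inference from the output and is not confirmed in this thread. The sketch below is only a diagnostic aid; the function names has_hypervisor_flag and count_rapl_zones are made up for illustration, and the paths default to the standard /proc/cpuinfo and /sys/class/powercap locations.

```shell
#!/bin/sh
# Succeed if the given cpuinfo-style file lists the "hypervisor" CPU flag,
# which Linux reports when CPUID indicates the system runs as a guest.
# (Hypothetical helper; defaults to /proc/cpuinfo.)
has_hypervisor_flag() {
    grep -E '^flags[[:space:]]*:' "${1:-/proc/cpuinfo}" 2>/dev/null \
        | grep -qw hypervisor
}

# Print how many RAPL zones and subzones exist under a powercap directory.
# (Hypothetical helper; defaults to /sys/class/powercap.)
count_rapl_zones() {
    dir=${1:-/sys/class/powercap}
    n=0
    for z in "$dir"/intel-rapl:*; do
        # An unmatched glob stays literal, so test that the entry exists.
        [ -e "$z" ] && n=$((n + 1))
    done
    echo "$n"
}

if has_hypervisor_flag; then
    echo "guest: hypervisor flag present"
fi
echo "RAPL zones: $(count_rapl_zones)"
```

A guest that shows the hypervisor flag and zero zones points at the virtualization layer instead of the Ubuntu kernel configuration, which already has CONFIG_POWERCAP and CONFIG_INTEL_RAPL enabled.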

McCalpinJohn

Intel's RAPL power monitoring facility does not support monitoring of individual cores.  Depending on the processor model you can monitor the power used by *all* cores in a socket, the power used by all the DRAM attached to a socket, the power used by everything in the socket, and for some client processors the power used by uncore devices (like an embedded GPU).
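
As a rough illustration of what socket-level monitoring looks like on a system where the powercap interface is populated (which is not the case in the VM shown in the question), the sketch below derives average power from two readings of a cumulative energy counter. The helper name rapl_avg_watts and the zone path intel-rapl:0 are assumptions for illustration; energy_uj and max_energy_range_uj are the standard powercap attributes, and reading them may require root.

```shell
#!/bin/sh
# Average power in watts from two cumulative energy readings in microjoules.
# Arguments: e0_uj e1_uj max_range_uj dt_seconds
# The counter wraps at roughly max_range_uj, so a smaller second reading is
# treated as exactly one wraparound.
rapl_avg_watts() {
    awk -v e0="$1" -v e1="$2" -v max="$3" -v dt="$4" 'BEGIN {
        d = e1 - e0
        if (d < 0) d += max          # counter wrapped between the readings
        printf "%.3f\n", d / 1e6 / dt
    }'
}

# Hypothetical usage on a host that exposes package zone 0:
#   zone=/sys/class/powercap/intel-rapl:0
#   e0=$(cat "$zone/energy_uj"); sleep 1; e1=$(cat "$zone/energy_uj")
#   rapl_avg_watts "$e0" "$e1" "$(cat "$zone/max_energy_range_uj")" 1
```

The result covers the whole zone (a socket, or a core or DRAM subzone of it), so it cannot be broken down into per-core figures.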
