Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

PCM output - why does core utilization over time interval show lower numbers?

Prabhu_T_
Beginner

Hi,

I'm running PCM 2.5 on an Intel Xeon E5-2670. Can anybody please explain the text marked in bold in the PCM output below? Also, can you please explain:

1. The difference between core residency and package residency in the PCM output below.

2. If C1 represents core 1 (physical) in the PCM report, where is the information related to the C4 and C5 cores? (There are 8 physical cores.)

3. The sar command shows 100% user CPU busy for my application, so why does the “% core utilization over time interval” marked in bold in the output show a lower number (7.67 %)?

PCM output:

Num logical cores: 16
Num sockets: 1
Threads per core: 2
Core PMU (perfmon) version: 3
Number of core PMU generic (programmable) counters: 4
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 2600000000 Hz
Package thermal spec power: 115 Watt; Package minimum power: 51 Watt; Package maximum power: 180 Watt;
EXEC : instructions per nominal CPU cycle
IPC : instructions per CPU cycle
FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost)
L3MISS: L3 cache misses
L2MISS: L2 cache misses (including other core's L2 cache *hits*)
L3HIT : L3 cache hit ratio (0.00-1.00)
L2HIT : L2 cache hit ratio (0.00-1.00)
L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
READ : bytes read from memory controller (in GBytes)
WRITE : bytes written to memory controller (in GBytes)
TEMP : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature

Core (SKT) | EXEC | IPC  | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE | TEMP
-------------------------------------------------------------------------------------------------------------------
TOTAL *      0.15   1.05   0.15   1.15    511 K    1973 K   0.74    0.96    0.02    0.01    0.00   0.00    N/A

Instructions retired: 6405 M ; Active cycles: 6112 M ; Time (TSC): 2610 Mticks ; C0 (active,non-halted) core residency: 12.68 %

C1 core residency: 87.32 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C7 core residency: 0.00 %
C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %

PHYSICAL CORE IPC : 2.10 => corresponds to 52.40 % utilization for cores in active state
Instructions per nominal CPU cycle: 0.31 => corresponds to 7.67 % core utilization over time interval

Thanks, Prabhu

Bernard
Valued Contributor I

>>>Instructions per nominal CPU cycle: 0.31>>>

It could mean the average rate of instructions per clock cycle over a time interval, where that interval can include time spent in Cn power-saving states.
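As a rough sanity check of this reading, the TOTAL-line metrics can be recomputed from the raw counts in the output above. The sketch below assumes that the nominal cycle budget is the TSC tick count multiplied by the number of logical cores. That assumption is not stated in the PCM output, but it reproduces the reported EXEC, IPC and FREQ values.

```python
# Raw values copied from the PCM output in the question.
instructions_retired = 6405e6  # "Instructions retired: 6405 M"
active_cycles = 6112e6         # "Active cycles: 6112 M"
tsc_ticks = 2610e6             # "Time (TSC): 2610 Mticks"
num_logical_cores = 16         # "Num logical cores: 16"

# Assumption (not stated in the output): the nominal cycle budget for the
# interval is the TSC tick count summed over all logical cores.
nominal_cycles = tsc_ticks * num_logical_cores

ipc = instructions_retired / active_cycles    # instructions per active cycle
freq = active_cycles / nominal_cycles         # share of nominal cycles spent unhalted
exec_per_nominal_cycle = instructions_retired / nominal_cycles  # EXEC

print(f"IPC  = {ipc:.2f}")
print(f"FREQ = {freq:.2f}")
print(f"EXEC = {exec_per_nominal_cycle:.2f}")
```

Under that assumption EXEC is simply IPC multiplied by FREQ, so a low EXEC next to a reasonable IPC suggests that the logical cores were halted for most of the interval rather than executing slowly.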

Roman_D_Intel
Employee

Hi Prabhu,

1. The difference between core residency and package residency in the below PCM output.

2. If C1 represents core 1(physical) in PCM report where is the information related to C4 and C5 cores. (There are 8 physical cores)

This article explains the difference between core and package C-states. Residency is the percentage of time a core/package spent in a given state.

The n in Cn represents the C state level of a core/package, not the core number.
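To illustrate that the Cn labels are states rather than core numbers, here is a small sketch using the core residency percentages from the output in the question. The dictionary layout and variable names are only illustrative and are not part of PCM.

```python
# Core C-state residencies (% of the interval) copied from the PCM output.
# The keys are power states, not core indices.
core_residency = {"C0": 12.68, "C1": 87.32, "C3": 0.00, "C6": 0.00, "C7": 0.00}

# The residencies partition the measurement interval, so they should
# add up to roughly 100 %.
total = sum(core_residency.values())

# The state in which the cores spent the most time.
dominant_state = max(core_residency, key=core_residency.get)

print(f"sum of residencies = {total:.2f} %")
print(f"dominant state     = {dominant_state}")
```

Read this way, the output says that on average the cores were active (C0) for 12.68 % of the interval and halted in C1 for the rest, never entering the deeper C3/C6/C7 states.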

3. The sar command shows 100% user CPU busy for my application, but why does the “% core utilization over time interval” marked bold in the output shows lower numbers. (7.67 %)

"Sar" shows the portion of time slots that the CPU scheduler in the OS could assign to execution of running programs or the OS itself. This OS-level CPU-utilization metric and its limitations are discussed in the Intel PCM article.

The PCM % core utilization metrics are derived from the core microarchitecture utilization data: the number of instructions retired per (nominal) cycle vs. the maximum number of instructions the core can process in a (nominal) cycle.
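A minimal sketch of how the two utilization lines in the output can be reproduced from the raw counts follows. It rests on two assumptions that are inferred from the reported numbers, not stated in the output: the maximum is 4 instructions per cycle per physical core (2.10 IPC being reported as 52.40 % implies this), and the per-logical-core figures are scaled by the 2 threads per core to obtain physical-core figures.

```python
# Raw values copied from the PCM output in the question.
instructions_retired = 6405e6  # "Instructions retired: 6405 M"
active_cycles = 6112e6         # "Active cycles: 6112 M"
tsc_ticks = 2610e6             # "Time (TSC): 2610 Mticks"
num_logical_cores = 16         # "Num logical cores: 16"
threads_per_core = 2           # "Threads per core: 2"

# Assumption inferred from "2.10 => 52.40 %": at most 4 instructions
# can be retired per cycle on one physical core.
max_ipc = 4

# Utilization while the cores are active (C0): IPC scaled to a physical core.
physical_core_ipc = instructions_retired / active_cycles * threads_per_core
active_utilization_pct = 100 * physical_core_ipc / max_ipc

# Utilization over the whole interval: instructions per nominal cycle,
# again scaled to a physical core. Halted time counts against this figure.
nominal_cycles = tsc_ticks * num_logical_cores
instr_per_nominal_cycle = instructions_retired / nominal_cycles * threads_per_core
interval_utilization_pct = 100 * instr_per_nominal_cycle / max_ipc

print(f"PHYSICAL CORE IPC: {physical_core_ipc:.2f} "
      f"=> {active_utilization_pct:.2f} % utilization in active state")
print(f"Instructions per nominal CPU cycle: {instr_per_nominal_cycle:.2f} "
      f"=> {interval_utilization_pct:.2f} % utilization over time interval")
```

The gap between the two percentages comes from the denominator: the first divides by active cycles only, while the second divides by every nominal cycle in the interval on every core, including the time spent halted.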

Thanks,

Roman

Bernard
Valued Contributor I

>>>The PCM % core utilization metrics are derived from the core microarchitecture utilization data: the number of instructions retired per (nominal) cycle vs. the maximum number of instructions the core can process in a (nominal) cycle.>>>

Are these utilization metrics available as a part of VTune documentation?
