Intel PCM - How can I watch monitoring results of multi core of AWS CC1?

rewthe
Beginner
Hi,
I am running a performance test of an application on an Amazon HPC CC1 instance, which has Intel Xeon X5570 (quad-core Nehalem) processors, using pcm.x.
I was able to compile and run pcm.x, but the reported results are not usable, especially the per-core values.
The line for every core (0 to 15) shows only 0.00, -1.00, 0.00 or N/A, which does not make sense; see the sample below.
pcm.x supports the Nehalem architecture, so I expected accurate per-core results.
Does pcm.x need any configuration before compiling or running?
Or is this processor unsupported?
Regards,
Ryu
===========================================================================================================
Sample of the results
===========================================================================================================
Intel Performance Counter Monitor
Copyright (c) 2009-2011 Intel Corporation
Num cores: 16
Num sockets: 2
Threads per core: 2
Core PMU (perfmon) version: 0
Number of core PMU generic (programmable) counters: 0
Width of generic (programmable) counters: 0 bits
Nominal core frequency: 2933333326 Hz
Number of PCM instances: 5
EXEC : instructions per nominal CPU cycle
IPC : instructions per CPU cycle
FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost)
L3MISS: L3 cache misses
L2MISS: L2 cache misses (including other core's L2 cache *hits*)
L3HIT : L3 cache hit ratio (0.00-1.00)
L2HIT : L2 cache hit ratio (0.00-1.00)
L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
READ : bytes read from memory controller (in GBytes)
WRITE : bytes written to memory controller (in GBytes)
Core (SKT) | EXEC | IPC | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE
0 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
1 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
2 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
3 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
4 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
5 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
6 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
7 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
8 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
9 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
10 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
11 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
12 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
13 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
14 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
15 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
------------------------------------------------------------------------------------------------------------
SKT 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 0.00 0.00
SKT 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 0.00 0.00
------------------------------------------------------------------------------------------------------------
TOTAL * 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 0.00 0.00
Instructions retired: 0 ; Active cycles: 0 ; Time (TSC): 2830 Mticks ; C0 (active,non-halted) core residency: 0.00 %
PHYSICAL CORE IPC : -1.00 => corresponds to -25.00 % utilization for cores in active state
Instructions per nominal CPU cycle: 0.00 => corresponds to 0.00 % core utilization over time interval
Intel QPI data traffic estimation in bytes (data traffic coming to CPU/socket through QPI links):
QPI0 QPI1
----------------------------------------------------------------------------------------------
SKT 0 0 0
SKT 1 0 0
----------------------------------------------------------------------------------------------
Total QPI incoming data traffic: 0 QPI data traffic/Memory controller traffic: nan
1 Solution
Roman_D_Intel
Employee
Ryu,

As far as I know, Amazon instances are virtualized. Virtualization hypervisors usually forbid direct low-level access to hardware performance counters through model-specific registers (MSRs) inside guests. That is why you see invalid values in the Intel PCM output.
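
A quick way to check from inside a Linux guest whether a hypervisor announces itself is to look for the `hypervisor` CPU flag in `/proc/cpuinfo`. The following is a minimal sketch, not part of PCM; the function name is made up for illustration, and a missing flag does not prove the machine is bare metal, because a hypervisor can choose not to set it.

```python
def running_under_hypervisor(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'hypervisor'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "hypervisor" in value.split():
            return True
    return False


if __name__ == "__main__":
    # Assumes a Linux x86 system; elsewhere the file is missing and we report False.
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""
    print("hypervisor flag present:", running_under_hypervisor(text))
```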

Thanks,
Roman

13 Replies
GHui
Novice
Which OS are you running?
GHui
Novice
Ah, I have that problem too. MSR access isn't supported in a VM. Is there any way to resolve this, e.g. by changing some VM settings?
rewthe
Beginner
Hi Roman,
Thank you for your very quick reply.
OK, I understand your explanation about Amazon EC2 and virtualization.
So we cannot use pcm to measure performance on a virtualized machine...
Are there any other ideas, as the other poster asked?
rewthe
Beginner
Linux, not Windows
rewthe
Beginner
You, too?
I do hope there is some other way to solve this...
GHui
Novice
Sorry, I didn't explain clearly. Some Linux distributions, such as SLES, don't load the msr module by default; you have to modprobe it by hand.
rewthe
Beginner
I found /dev/cpu/*/msr, but I couldn't modprobe the module, perhaps because this is a virtual machine.
GHui
Novice
Since you found msr under /dev/cpu/*/, you don't need to modprobe it.
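Even when the device nodes exist, a read can still fail or return a meaningless value inside a guest. The sketch below tries to read one register through the Linux msr device, where the file offset selects the MSR number. It is only an illustration under assumptions: the helper name is made up, it needs root on a real system, and `IA32_PERF_GLOBAL_CTRL` (0x38F) is used simply as an example of a perfmon register.

```python
import glob
import os
import struct

IA32_PERF_GLOBAL_CTRL = 0x38F  # architectural perfmon global control MSR


def read_msr(path, register):
    """Read one 64-bit MSR through the Linux msr device; return None on failure."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return None  # node missing or no permission
    try:
        data = os.pread(fd, 8, register)  # the file offset selects the MSR number
    except OSError:
        return None  # e.g. the access was rejected
    finally:
        os.close(fd)
    if len(data) != 8:
        return None
    return struct.unpack("<Q", data)[0]


if __name__ == "__main__":
    for path in sorted(glob.glob("/dev/cpu/*/msr")):
        print(path, read_msr(path, IA32_PERF_GLOBAL_CTRL))
```

A result of None for every CPU, or zeros everywhere while a workload is running, would be consistent with the hypervisor filtering MSR access.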
rewthe
Beginner
I thought my situation was the one you described.
OK, so there is no need to modprobe it; the module was already loaded from the beginning.
Unfortunately, that does not lead to a solution...
Patrick_F_Intel1
Employee
Most virtual OSes don't allow access to most MSRs.
If I understand correctly, the hypervisor intercepts rdmsr/wrmsr instructions and only allows access to certain registers. The virtual OS may also spoof the cpuid info, the number of CPUs, the PCI devices, etc., so CPU-specific tools can fail.
The virtual OS may do these things for both security and virtualization reasons.
Tools like PCM and VTune have a hard time with virtual machines: the tools have code paths for specific architectures, while the virtual OS may want to abstract the architecture away.
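One symptom of this is already visible in the banner of the PCM output posted at the top of the thread: "Core PMU (perfmon) version: 0" and zero programmable counters. As far as I know, that version number comes from CPUID leaf 0xA, so a value of 0 suggests the guest is not being shown a core PMU at all. The following sketch (the function name is made up, and the field names are taken from the output above) flags that condition in captured PCM text.

```python
import re


def pmu_looks_hidden(pcm_header):
    """Return True if a PCM banner reports perfmon version 0 or zero programmable counters."""
    fields = {}
    for line in pcm_header.splitlines():
        # Keep only "name: <integer>" lines; legend lines such as "EXEC : ..." are skipped.
        match = re.match(r"\s*(.+?)\s*:\s*(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    version = fields.get("Core PMU (perfmon) version")
    counters = fields.get("Number of core PMU generic (programmable) counters")
    return version == 0 or counters == 0
```

If the banner lines are absent from the text, the function returns False rather than guessing.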
This is an area we are trying to improve.
Pat
rewthe
Beginner
Thank you, everyone. I am giving up on measuring with pcm.x this time.
However, if anyone finds another solution, please post it here so we can all share the information!
Thomas_W_Intel
Employee
As Roman and Pat already explained, any performance tool that requires direct hardware access to the performance monitoring units (PMU) is blocked by the virtualization layer. This won't be solved unless Amazon introduces a virtualization layer for the PMU, like they do for memory or I/O.

Depending on what you are trying to achieve, you might want to try a tool for hotspot analysis that does not require access to kernel space. The "hotspot" analysis in Intel Amplifier is an example of this (in contrast to the "light-weight hotspot" analysis). "quantify" should also work.
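
To illustrate the general idea of user-space hotspot sampling (this is not how the Intel tools are implemented, just a toy sketch in Python), a process can ask the OS for a profiling timer signal and record which function is running at each tick. No PMU or MSR access is involved. The names `sample_hotspots` and `busy` are made up for this example, and it assumes a Unix system with the code running in the main thread.

```python
import collections
import signal
import time


def sample_hotspots(workload, interval=0.001):
    """Run workload() and count which function is executing at each SIGPROF tick."""
    counts = collections.Counter()

    def on_tick(signum, frame):
        if frame is not None:
            counts[frame.f_code.co_name] += 1

    previous = signal.signal(signal.SIGPROF, on_tick)
    # ITIMER_PROF counts CPU time used by this process, not wall-clock time.
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        workload()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop the timer
        signal.signal(signal.SIGPROF, previous)     # restore the old handler
    return counts


def busy(cpu_seconds=0.2):
    """Hypothetical workload: spin until the given amount of CPU time is consumed."""
    start = time.process_time()
    total = 0
    while time.process_time() - start < cpu_seconds:
        total += 1
    return total


if __name__ == "__main__":
    for name, hits in sample_hotspots(busy).most_common(5):
        print(name, hits)
```

The sample counts only show where time is spent; they say nothing about cache misses or memory traffic, which is the part that needs the hardware counters.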