Hi,
I am running a performance test of an application with pcm.x on an Amazon HPC CC1 instance, which has Intel Xeon X5570 (quad-core Nehalem) processors.
I could compile and run pcm.x, but the reported results are not usable, especially the per-core results.
The reported line for each core (0 to 15) shows values such as 0.00, -1.00, 0.00 and N/A, which don't make sense (see below).
pcm.x supports the Nehalem architecture, so I expected it to give accurate per-core results.
Do I need to configure something before compiling or running pcm.x?
Or is this processor unsupported?
Regards,
Ryu
===========================================================================================================
Sample of the results
===========================================================================================================
Intel Performance Counter Monitor
Copyright (c) 2009-2011 Intel Corporation
Num cores: 16
Num sockets: 2
Threads per core: 2
Core PMU (perfmon) version: 0
Number of core PMU generic (programmable) counters: 0
Width of generic (programmable) counters: 0 bits
Nominal core frequency: 2933333326 Hz
Number of PCM instances: 5
EXEC : instructions per nominal CPU cycle
IPC : instructions per CPU cycle
FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost)
L3MISS: L3 cache misses
L2MISS: L2 cache misses (including other core's L2 cache *hits*)
L3HIT : L3 cache hit ratio (0.00-1.00)
L2HIT : L2 cache hit ratio (0.00-1.00)
L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
READ : bytes read from memory controller (in GBytes)
WRITE : bytes written to memory controller (in GBytes)
Core (SKT) | EXEC | IPC | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE
0 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
1 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
2 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
3 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
4 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
5 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
6 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
7 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
8 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
9 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
10 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
11 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
12 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
13 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
14 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
15 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 N/A N/A
------------------------------------------------------------------------------------------------------------
SKT 0 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 0.00 0.00
SKT 1 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 0.00 0.00
------------------------------------------------------------------------------------------------------------
TOTAL * 0.00 -1.00 0.00 -1.00 0 0 1.00 1.00 -1.00 -1.00 0.00 0.00
Instructions retired: 0 ; Active cycles: 0 ; Time (TSC): 2830 Mticks ; C0 (active,non-halted) core residency: 0.00 %
PHYSICAL CORE IPC : -1.00 => corresponds to -25.00 % utilization for cores in active state
Instructions per nominal CPU cycle: 0.00 => corresponds to 0.00 % core utilization over time interval
Intel QPI data traffic estimation in bytes (data traffic coming to CPU/socket through QPI links):
QPI0 QPI1
----------------------------------------------------------------------------------------------
SKT 0 0 0
SKT 1 0 0
----------------------------------------------------------------------------------------------
Total QPI incoming data traffic: 0 QPI data traffic/Memory controller traffic: nan
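Every number in this output is what you would get if all core counters read back as zero while only the time stamp counter kept running. A rough Python sketch of that arithmetic, not PCM's actual code: the class and field names are hypothetical, the fallback values (-1.00 for a zero denominator, 1.00 for a hit ratio with no accesses) are inferred from the printed table, and MAX_IPC = 4 is inferred from the "-1.00 => -25.00 %" line.

```python
from dataclasses import dataclass

MAX_IPC = 4.0  # assumption inferred from the output: utilization = IPC / 4


@dataclass
class CounterDelta:
    """Hypothetical per-core counter differences over one measurement interval."""
    instructions: int = 0   # instructions retired
    cycles: int = 0         # unhalted core clock ticks
    tsc_ticks: int = 0      # invariant timer (TSC) ticks
    c0_tsc_ticks: int = 0   # invariant timer ticks spent in C0 (active) state
    l3_hits: int = 0
    l3_misses: int = 0
    l2_hits: int = 0
    l2_misses: int = 0


def ratio(num, den, default=-1.0):
    """num/den, or a sentinel value when the denominator is zero."""
    return num / den if den else default


def hit_ratio(hits, misses):
    """Hit ratio; with no accesses at all nothing missed, so report 1.0."""
    total = hits + misses
    return hits / total if total else 1.0


def core_row(d):
    """Derive the per-core metrics shown in PCM's per-core table."""
    return {
        "EXEC": ratio(d.instructions, d.tsc_ticks),
        "IPC": ratio(d.instructions, d.cycles),
        "FREQ": ratio(d.cycles, d.tsc_ticks),
        "AFREQ": ratio(d.cycles, d.c0_tsc_ticks),
        "L3HIT": hit_ratio(d.l3_hits, d.l3_misses),
        "L2HIT": hit_ratio(d.l2_hits, d.l2_misses),
    }


if __name__ == "__main__":
    # Every PMU counter reads 0 inside the guest; only the TSC advances (about 2830 Mticks).
    row = core_row(CounterDelta(tsc_ticks=2_830_000_000))
    print("  ".join(f"{name} {value:.2f}" for name, value in row.items()))
    print(f"utilization {row['IPC'] / MAX_IPC:.2%}")
```

With all counters at zero this reproduces the EXEC 0.00, IPC -1.00, FREQ 0.00, AFREQ -1.00, L3HIT/L2HIT 1.00 pattern and the -25.00 % utilization, which is why the table looks like nonsense rather than failing outright.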
1 Solution
Ryu,
As far as I know, Amazon instances are virtualized. Hypervisors usually forbid guests direct low-level access to the hardware performance counters through model-specific registers (MSRs). That is why you see invalid values in the Intel PCM output.
Thanks,
Roman
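Roman's point can be checked directly: on Linux, MSR-based tools read registers through the msr driver's /dev/cpu/N/msr files, where the file offset is the MSR address. The Python sketch below is a minimal illustration under that assumption, not PCM's code; IA32_PERF_GLOBAL_CTRL (0x38F) is just one architectural PMU register picked as an example, and the errno interpretations in the comments are typical behavior rather than guarantees. Inside a guest the read may fail (often with EIO) or quietly return a value supplied by the hypervisor.

```python
import os
import struct

IA32_PERF_GLOBAL_CTRL = 0x38F  # architectural MSR that enables the core PMU counters


def read_msr(path, msr):
    """Read one 64-bit MSR through the Linux msr driver.

    The driver treats the file offset as the MSR address, so reading 8 bytes at
    offset `msr` of /dev/cpu/N/msr executes RDMSR on CPU N (root required).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, msr)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read from {path} at offset {msr:#x}")
    return struct.unpack("<Q", data)[0]


def probe(cpu=0, msr=IA32_PERF_GLOBAL_CTRL):
    """Try one MSR read and describe the outcome instead of raising."""
    path = f"/dev/cpu/{cpu}/msr"
    try:
        return f"{path} {msr:#x} = {read_msr(path, msr):#018x}"
    except OSError as err:
        # EIO typically means RDMSR faulted (for example, refused by the hypervisor);
        # EACCES/EPERM means insufficient privileges; ENOENT means no msr driver node.
        return f"{path} {msr:#x}: {err.strerror or err}"


if __name__ == "__main__":
    print(probe())
```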
What OS are you using?
Ah, I have that problem too. MSR access isn't supported in the VM. Is there any way to resolve this, e.g. by changing some VM settings?
Hi Roman,
Thank you for your very quick reply.
OK, I understand your explanation about Amazon EC2 and virtualization.
So we cannot use pcm to measure performance on a virtualized machine...
Are there any other ideas, as others have asked?
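To confirm from inside the instance that you are running under a hypervisor, and whether the guest kernel sees a PMU at all, the /proc/cpuinfo flags are a cheap check. A Python sketch, assuming the usual Linux flag names hypervisor and arch_perfmon (the helper names are hypothetical):

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def describe(flags):
    """Summarize what the guest kernel reports about virtualization and the PMU."""
    guest = "hypervisor" in flags     # mirrors CPUID.1:ECX bit 31, set by hypervisors
    pmu = "arch_perfmon" in flags     # kernel detected a usable architectural PMU
    return (f"running under a hypervisor: {'yes' if guest else 'no'}; "
            f"architectural PMU advertised: {'yes' if pmu else 'no'}")


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(describe(cpu_flags(f.read())))
```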
Linux, not Windows
You, too?
I DO hope there is some other way to solve this...
Er, I didn't describe it clearly. Some Linux distributions, such as SLES, don't load the msr module by default; you must load it by hand with modprobe.
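A quick way to check whether the msr driver is already present before reaching for modprobe, sketched in Python (the function name is hypothetical): look for the per-CPU nodes under /dev/cpu. Their presence only shows that the driver is loaded, either as a module or built into the kernel.

```python
import glob
import os


def msr_device_nodes(root="/dev/cpu"):
    """Return the per-CPU msr device nodes under root, ordered by CPU number."""
    nodes = []
    for path in glob.glob(os.path.join(root, "*", "msr")):
        cpu = os.path.basename(os.path.dirname(path))
        if cpu.isdigit():
            nodes.append((int(cpu), path))
    return [path for _, path in sorted(nodes)]


if __name__ == "__main__":
    nodes = msr_device_nodes()
    if nodes:
        # The driver is loaded; this says nothing about whether the
        # hypervisor will let the RDMSR instructions through.
        print(f"{len(nodes)} msr nodes found, e.g. {nodes[0]}")
    else:
        print("no /dev/cpu/*/msr nodes; try loading the driver with: modprobe msr")
```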
I found /dev/cpu/*/msr but couldn't modprobe the module; maybe the reason is that this is a virtual machine.
You found msr in /dev/cpu/*/, so you don't need to modprobe it.
I thought the current situation was what you described.
OK, there is no need to modprobe it, since it was already loaded from the beginning.
Unfortunately, that doesn't lead to a solution...
Most virtual OSes don't allow access to most MSRs.
If I understand correctly, the hypervisor intercepts rdmsr/wrmsr instructions and only allows access to certain registers. The virtual OS may also spoof the CPUID info, the number of CPUs, the PCI devices, etc., and CPU-specific tools can fail.
The virtual OS may do these things for both security and virtualization reasons.
Tools like PCM and VTune have a hard time with virtual machines: the tools have code paths for specific architectures, while the virtual OS may want to abstract the architecture away.
This is an area we are trying to improve.
Pat
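On the spoofed CPUID point: as far as I can tell, the PMU lines in the PCM header from the question (perfmon version, number of generic counters, counter width) correspond to the fields of CPUID leaf 0AH EAX as laid out in the Intel SDM. A Python sketch that decodes that register; the sample EAX value is an illustrative assumption, and obtaining EAX in the first place (for example with a small CPUID program) is left out.

```python
def decode_perfmon_leaf(eax):
    """Split CPUID leaf 0AH EAX into the architectural performance monitoring fields."""
    return {
        "version": eax & 0xFF,                    # bits 7:0, perfmon version ID
        "gp_counters": (eax >> 8) & 0xFF,         # bits 15:8, per logical processor
        "counter_width": (eax >> 16) & 0xFF,      # bits 23:16, bits per counter
        "ebx_vector_length": (eax >> 24) & 0xFF,  # bits 31:24, EBX event bit vector length
    }


if __name__ == "__main__":
    # Illustrative value only (an assumption): version 3, 4 counters of 48 bits.
    print(decode_perfmon_leaf(0x07300403))
    # A hypervisor that hides the PMU can report all zeros for this leaf,
    # which matches the "version: 0 / counters: 0 / width: 0 bits" header.
    print(decode_perfmon_leaf(0))
```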
Thank you, everyone. In the end I gave up on measuring with pcm.x this time.
However, if anyone finds another solution, please post it here so we can all share the information!
As Roman and Pat already explained, any performance tool that requires direct hardware access to the performance monitoring units (PMUs) is blocked by the virtualization layer. This won't be solved unless Amazon introduces a virtualization layer for the PMU, as they do for memory and I/O.
Depending on what you are trying to achieve, you might want to try a hotspot analysis tool that does not require access to kernel space. The "hotspot" analysis in Intel Amplifier is an example of this (in contrast to the "light-weight hotspot" analysis). "quantify" should also work.
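As an illustration of finding hotspots without any PMU or kernel-space access, here is Python's built-in cProfile, swapped in for the tools named above (it is neither Amplifier nor Quantify, and the workload is a made-up example). It instruments function calls in user space, so it should behave the same inside a VM as on bare metal.

```python
import cProfile
import pstats


def hot_loop(n):
    """Deliberately CPU-bound function that should show up as the hotspot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    """Hypothetical application workload: 50 calls into the hot loop."""
    return sum(hot_loop(20_000) for _ in range(50))


def profile(func):
    """Run func under cProfile, which hooks function calls in user space only."""
    prof = cProfile.Profile()
    result = prof.runcall(func)
    return result, pstats.Stats(prof)


if __name__ == "__main__":
    result, stats = profile(workload)
    stats.sort_stats("tottime").print_stats(5)
```

Instrumentation like this adds overhead and counts calls and time rather than cache misses, so it answers "where is the time going" but not the cache and memory-bandwidth questions PCM was meant to answer.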