Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Access to Intel(r) Performance Counter Monitor has denied

Ilan
Beginner

When I try to run pcm.x (PCM 1.7) on Red Hat 5.7 on a BL460 G6 (Intel Xeon CPU X5670 @ 2.93GHz), it fails (while with KDE 3.5 I'm able to get it running).

Can you please advise on the following message?

(Resetting doesn't help.)

Thanks,

Ilan

./pcm.x 1 -nc -ns

Intel Performance Counter Monitor
Copyright (c) 2009-2011 Intel Corporation

Num cores: 24
Num sockets: 2
Threads per core: 2
Core PMU (perfmon) version: 3
Number of core PMU generic (programmable) counters: 4
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 2933333326 Hz

Access to Intel Performance Counter Monitor has denied (Performance Monitoring Unit is occupied by other application). Try to stop the application that uses PMU.
Alternatively you can try to reset PMU configuration at your own risk. Try to reset? (y/n)
y
PMU configuration has been reset. Try to rerun the program again.

Thomas_W_Intel
Employee
Might it be that the daemon for ksysguard is still running on the system? The daemon and pcm.x are just different front-ends to the same routines, so if one works, the other should work as well. You might want to try unloading the msr kernel module, or check with lsof whether any program is accessing /dev/cpu/?/msr.
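
If lsof is not at hand, roughly the same check can be done by walking /proc. The following is only a minimal sketch: the helper names is_msr_device and find_msr_users are made up for this illustration, and it assumes a Linux /proc layout and enough privileges (normally root) to read other processes' fd directories.

```python
import os
import re

# Matches the per-CPU MSR device nodes, e.g. /dev/cpu/0/msr or /dev/cpu/23/msr.
MSR_DEVICE = re.compile(r"^/dev/cpu/\d+/msr$")


def is_msr_device(path):
    """Return True if path names a per-CPU MSR device node."""
    return MSR_DEVICE.match(path) is not None


def find_msr_users(proc_root="/proc"):
    """Map PID -> sorted list of MSR device paths that process has open."""
    users = {}
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return users  # no /proc here (not Linux) or it is unreadable
    for entry in entries:
        if not entry.isdigit():
            continue  # skip entries that are not process directories
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed in the meantime
            if is_msr_device(target):
                users.setdefault(int(entry), set()).add(target)
    return {pid: sorted(paths) for pid, paths in users.items()}


if __name__ == "__main__":
    for pid, paths in sorted(find_msr_users().items()):
        print(pid, ", ".join(paths))
```

An empty result only means that no process holds the MSR device files open at that moment; a kernel driver that programs the PMU directly would not show up this way.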

Kind regards
Thomas

Ilan
Beginner
Thanks,
The only process accessing /dev/cpu/?/msr is pcm.x itself.
Also, "modprobe -l | grep msr" did not show any module that I can correlate with msr.

Roman_D_Intel
Employee
What is the kernel version on your box (uname -a)?

Could you try to reset the PMU and run ./pcm.x 1 again? Or did you try that already?

Are any other tools or drivers that use the PMU running? This could be Linux perf, oprofile, VTune, PAPI, etc.

Best regards,
Roman

Ilan
Beginner

I tried resetting it before opening this thread; it doesn't make a difference.
The version is Red Hat 5.7:

cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.7 (Tikanga)

uname -a
Linux illinrh602 2.6.18-274.12.1.el5 #1 SMP Tue Nov 8 21:37:35 EST 2011 x86_64 x86_64 x86_64 GNU/Linux


The only performance tool running is HP GlancePlusPAK; I made sure it was down before running pcm.x.

Questions: how can I find out who is accessing the PMU?
HT is enabled on the server; should that make a difference?

An strace of pcm.x shows a successful open for every /dev/cpu/?/msr, and then the failure message appears:

10662 open("/dev/cpu/22/msr", O_RDWR) = 25
10662 open("/dev/cpu/23/msr", O_RDWR) = 26
10662 lseek(3, 206, SEEK_SET) = 206
10662 read(3, "\3\26\1\4\0\f\0\0", 8) = 8
10662 write(1, "Nominal core frequency: 29333333"..., 38) = 38
10662 futex(0x2b2b405a2000, FUTEX_WAKE, 1) = 0
10662 open("/dev/shm/sem.Intel Performance Counter Monitor instance create-destroy lock", O_RDWR|O_NOFOLLOW) = 27
10662 fstat(27, {st_mode=S_IFREG|0755, st_size=32, ...}) = 0
10662 close(27) = 0
10662 open("/dev/shm/sem.Number of running Intel Performance Counter Monitor instances", O_RDWR|O_NOFOLLOW) = 27
10662 fstat(27, {st_mode=S_IFREG|0755, st_size=32, ...}) = 0
10662 mmap(NULL, 32, PROT_READ|PROT_WRITE, MAP_SHARED, 27, 0) = 0x2b2b405a3000
10662 close(27) = 0
10662 futex(0x2b2b405a3000, FUTEX_WAKE, 1) = 0
10662 lseek(3, 911, SEEK_SET) = 911
10662 read(3, "\0\0\0\0\7\0\0\0", 8) = 8
10662 lseek(3, 390, SEEK_SET) = 390
10662 read(3, "\0\0\0\0\0\0\0\0", 8) = 8
10662 lseek(3, 391, SEEK_SET) = 391
10662 read(3, "\0\0\0\0\0\0\0\0", 8) = 8
10662 lseek(3, 392, SEEK_SET) = 392
10662 read(3, "\0\0\0\0\0\0\0\0", 8) = 8
10662 lseek(3, 393, SEEK_SET) = 393
10662 read(3, "\0\0\0\0\0\0\0\0", 8) = 8
10662 lseek(3, 909, SEEK_SET) = 909
10662 read(3, "0\0\0\0\0\0\0\0", 8) = 8
10662 futex(0x2b2b405a2000, FUTEX_WAKE, 1) = 0
10662 write(1, "Access to Intel Performance C"..., 165) = 165
10662 write(1, "Alternatively you can try to res"..., 91) = 91
10662 fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
10662 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b2b405a4000
10662 read(0, "y\n", 1024) = 2
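
For readers following the trace: the Linux msr device uses the file offset as the MSR address and returns the register as 8 bytes in little-endian order, so each lseek/read pair above is one MSR read. A minimal sketch for decoding those pairs follows. The function name decode_msr_read is made up for this illustration, and the register names are the ones the Intel SDM gives for these addresses (they do not come from the PCM source).

```python
# MSR addresses and names as listed in the Intel SDM.
MSR_NAMES = {
    0x0CE: "MSR_PLATFORM_INFO",
    0x186: "IA32_PERFEVTSEL0",
    0x187: "IA32_PERFEVTSEL1",
    0x188: "IA32_PERFEVTSEL2",
    0x189: "IA32_PERFEVTSEL3",
    0x38D: "IA32_FIXED_CTR_CTRL",
    0x38F: "IA32_PERF_GLOBAL_CTRL",
}


def decode_msr_read(offset, data):
    """Turn one lseek offset plus the 8 bytes read into (name, value)."""
    if len(data) != 8:
        raise ValueError("an MSR read returns exactly 8 bytes")
    name = MSR_NAMES.get(offset, "MSR 0x%X" % offset)
    return name, int.from_bytes(data, "little")


# (offset, bytes) pairs copied from the strace output above.
TRACE = [
    (911, b"\0\0\0\0\7\0\0\0"),
    (390, b"\0\0\0\0\0\0\0\0"),
    (909, b"0\0\0\0\0\0\0\0"),
]

for offset, data in TRACE:
    name, value = decode_msr_read(offset, data)
    print("%-22s = 0x%X" % (name, value))
```

Read this way, the trace shows IA32_PERF_GLOBAL_CTRL at 0x700000000 (the three fixed counters enabled), all four event-select registers at zero, and IA32_FIXED_CTR_CTRL at 0x30. That last non-zero value is a plausible trigger for the busy check, although the trace by itself does not prove it.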

Roman_D_Intel
Employee
HT (enabled or disabled) should not make a difference. Some PMU tools, although not running in the foreground, have background drivers that stay loaded all the time and might use the PMU. Could you post a list of the loaded drivers here (the output of the "lsmod" command)?

Thanks,
Roman

Ilan
Beginner
lsmod output:

Module Size Used by
vxgms 316848 0
vxglm 271568 0
vxfen 300904 1
gab 273056 5 vxfen
llt 194328 5 gab
nfsd 287464 17
exportfs 38849 1 nfsd
auth_rpcgss 81889 1 nfsd
autofs4 63049 3
hidp 83521 2
freevxfs 47817 0
nfs 293145 8
nfs_acl 36673 2 nfsd,nfs
rfcomm 104681 0
l2cap 89537 10 hidp,rfcomm
bluetooth 118725 5 hidp,rfcomm,l2cap
lockd 101425 3 nfsd,nfs
sunrpc 203273 22 nfsd,auth_rpcgss,nfs,nfs_acl,lockd
be2iscsi 94173 0
ib_iser 68161 0
rdma_cm 68689 1 ib_iser
ib_cm 72809 1 rdma_cm
iw_cm 43465 1 rdma_cm
ib_sa 74953 2 rdma_cm,ib_cm
ib_mad 70757 2 ib_cm,ib_sa
ib_core 104901 6 ib_iser,rdma_cm,ib_cm,iw_cm,ib_sa,ib_mad
ib_addr 41673 1 rdma_cm
iscsi_tcp 50893 0
bnx2i 77665 0
cnic 84457 1 bnx2i
ipv6 436449 91 cnic
xfrm_nalgo 43333 1 ipv6
crypto_api 42945 1 xfrm_nalgo
uio 45777 1 cnic
cxgb3i 64849 0
libcxgbi 91597 1 cxgb3i
cxgb3 215600 1 cxgb3i
libiscsi_tcp 53573 3 iscsi_tcp,cxgb3i,libcxgbi
libiscsi2 77765 7 be2iscsi,ib_iser,iscsi_tcp,bnx2i,cxgb3i,libcxgbi,libiscsi_tcp
scsi_transport_iscsi2 73945 8 be2iscsi,ib_iser,iscsi_tcp,bnx2i,libcxgbi,libiscsi2
scsi_transport_iscsi 35017 1 scsi_transport_iscsi2
dm_round_robin 36801 1
dm_multipath 58713 2 dm_round_robin
scsi_dh 42561 1 dm_multipath
video 53197 0
backlight 39873 1 video
sbs 49921 0
power_meter 47053 0
hwmon 36553 1 power_meter
i2c_ec 38593 1 sbs
i2c_core 57537 1 i2c_ec
dell_wmi 37601 0
wmi 41985 1 dell_wmi
button 40545 0
battery 43849 0
asus_acpi 50917 0
acpi_memhotplug 40517 0
ac 38729 0
parport_pc 62313 0
lp 47121 0
parport 73165 2 parport_pc,lp
sg 70649 0
shpchp 70893 0
bnx2x 944273 0
tpm_tis 48077 0
tpm 50273 1 tpm_tis
hpilo 44497 0
i7core_edac 46793 0
8021q 58449 2 cxgb3,bnx2x
tpm_bios 40897 1 tpm
serio_raw 40517 0
edac_mc 61217 1 i7core_edac
mdio 38465 1 bnx2x
pcspkr 36289 0
dm_raid45 99785 0
dm_message 36289 1 dm_raid45
dm_region_hash 46144 1 dm_raid45
dm_mem_cache 38977 1 dm_raid45
dm_snapshot 52233 0
dm_zero 35265 0
dm_mirror 54737 0
dm_log 44993 3 dm_raid45,dm_region_hash,dm_mirror
dm_mod 102289 40 dm_multipath,dm_raid45,dm_snapshot,dm_zero,dm_mirror,dm_log
qla2xxx 1227393 40
scsi_transport_fc 83145 1 qla2xxx
ata_piix 57541 3
libata 208977 1 ata_piix
sd_mod 56513 25
scsi_mod 199385 13 be2iscsi,ib_iser,iscsi_tcp,bnx2i,libcxgbi,libiscsi2,scsi_transport_iscsi2,scsi_dh,sg,qla2xxx,scsi_transport_fc,libata,sd_mod
ext3 169297 7
jbd 94897 1 ext3
uhci_hcd 57433 0
ohci_hcd 56181 0
ehci_hcd 66381 0

Roman_D_Intel
Employee
Thanks for the output.

In the list above I do not see any of the usual suspects that program the PMU. In general, there is currently no way to find out who is using the PMU.
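
Scanning an lsmod listing for such suspects can be scripted. The following is only a rough sketch: the function name find_pmu_suspects is made up, and the prefix list is an assumption (oprofile's kernel module, plus the sep, pax and vtss prefixes commonly used by VTune's sampling drivers), not an authoritative list of everything that can program the PMU.

```python
# Hypothetical watch list of module-name prefixes; adjust it for your system.
SUSPECT_PREFIXES = ("oprofile", "sep", "pax", "vtss")


def find_pmu_suspects(lsmod_text, prefixes=SUSPECT_PREFIXES):
    """Return module names from lsmod output that start with a watched prefix."""
    suspects = []
    for line in lsmod_text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":
            continue  # skip blank lines and the header row
        if fields[0].startswith(prefixes):
            suspects.append(fields[0])
    return suspects


# A few lines taken from the lsmod output posted above.
sample = """Module Size Used by
hpilo 44497 0
i7core_edac 46793 0
edac_mc 61217 1 i7core_edac
"""
print(find_pmu_suspects(sample))
```

A clean result does not rule out PMU users that are built into the kernel or, as discussed later in the thread, the BIOS.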

A last try would be the following steps (in this order):
1. Reboot your machine.
2. Stop the performance tool you mentioned.
3. ./pcm.x 1 (answer y to reset the PMU)
4. ./pcm.x 1 (start it again)

If this does not help, I would like to see some debug output from pcm.x. If you can, please enable all the debug output in the PCM::PMUinUse() function (it is currently disabled by C++ comments): change "// std::cout" to "std::cout", then run "make clean" followed by "make". Then:
1. ./pcm.x 1 (answer y to reset the PMU)
2. ./pcm.x 1 (start it again)
3. Post the output of (1) and (2) here.

Thank you.

Most PMU tools do not do a PMU-busy check anyway, so you can also disable it in the source code. But if this unknown tool reprograms the PMU while PCM is running, the results from PCM will not be reliable.

To disable the PMU-busy check, modify the PCM::PMUinUse() function to return false:
[cpp]bool PCM::PMUinUse()
{
   return false;

// ...
}[/cpp]

Best regards,
Roman

Ilan
Beginner
Thanks,
Below is the output with debug output enabled (the reset didn't help).
Then I commented out the call to PMUinUse (the two lines in the if statement: the call and the return), and it worked.
I then uncommented them again, and it is still working for me; see the output after the "***" line.

Two more questions, if I may:
- If I increase the interval, will it give me the averages over that interval?
- Is there any option to accumulate the results over a long interval and calculate averages?

Thanks,

./pcm.x 1 -ns -nc

Intel Performance Counter Monitor
Copyright (c) 2009-2011 Intel Corporation

Num cores: 24
Num sockets: 2
Threads per core: 2
Core PMU (perfmon) version: 3
Number of core PMU generic (programmable) counters: 4
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 2933333326 Hz
Core 0 IA32_CR_PERF_GLOBAL_CTRL is 0

Access to Intel Performance Counter Monitor has denied (Performance Monitoring Unit is occupied by other application). Try to stop the application that uses PMU.
Alternatively you can try to reset PMU configuration at your own risk. Try to reset? (y/n)
y
PMU configuration has been reset....

**********************************************************************************

Intel Performance Counter Monitor
Copyright (c) 2009-2011 Intel Corporation

Num cores: 24
Num sockets: 2
Threads per core: 2
Core PMU (perfmon) version: 3
Number of core PMU generic (programmable) counters: 4
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 2933333326 Hz
Core 0 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 1 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 2 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 3 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 4 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 5 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 6 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 7 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 8 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 9 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 10 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 11 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 12 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 13 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 14 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 15 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 16 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 17 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 18 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 19 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 20 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 21 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 22 IA32_CR_PERF_GLOBAL_CTRL is 700000000
Core 23 IA32_CR_PERF_GLOBAL_CTRL is 700000000

EXEC : instructions per nominal CPU cycle
IPC : instructions per CPU cycle
FREQ : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state' (includes Intel Turbo Boost)
L3MISS: L3 cache misses
L2MISS: L2 cache misses (including other core's L2 cache *hits*)
L3HIT : L3 cache hit ratio (0.00-1.00)
L2HIT : L2 cache hit ratio (0.00-1.00)
L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
READ : bytes read from memory controller (in GBytes)
WRITE : bytes written to memory controller (in GBytes)

Core (SKT) | EXEC | IPC | FREQ | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK | READ | WRITE
------------------------------------------------------------------------------------------------------------
TOTAL * | 0.00 | 0.05 | 0.02 | 0.55 | 27 K | 1311 K | 0.98 | 0.00 | 0.00 | 0.05 | 0.01 | 0.00

Instructions retired: 56 M ; Active cycles: 1062 M ; Time (TSC): 336 Tticks ; C0 (active,non-halted) core residency: 2.76 %
PHYSICAL CORE IPC : 0.11 => corresponds to 2.66 % utilization for cores in active state
Instructions per nominal CPU cycle: 0.00 => corresponds to 0.04 % core utilization over time interval
----------------------------------------------------------------------------------------------
Total QPI incoming data traffic: 6944 K
QPI data traffic/Memory controller traffic: 0.58

Roman_D_Intel
Employee

Two more questions, if I may:

- If I increase the interval, will it give me the averages over that interval?

- Is there any option to accumulate the results over a long interval and calculate averages?

If you increase the interval to x seconds, the pcm.x utility outputs the deltas of the performance counters over those x seconds. To compute the average value per second, just divide the counter values by x (ratio metrics such as IPC or the hit ratios already describe the whole interval).
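
As a small illustration of that arithmetic, here is a sketch that turns interval deltas into per-second averages. The function name per_second, the dictionary keys, and the numbers are all made up for this example; they are not pcm.x output.

```python
def per_second(deltas, interval_seconds):
    """Divide every counter delta by the interval length in seconds."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return {name: value / interval_seconds for name, value in deltas.items()}


# Hypothetical counter deltas collected over a 10-second interval.
deltas = {
    "instructions_retired": 560e6,
    "active_cycles": 10620e6,
    "l3_misses": 270e3,
}

rates = per_second(deltas, 10)
for name in sorted(rates):
    print("%s: %.0f per second" % (name, rates[name]))

# A ratio is computed from the interval totals and is not divided again.
ipc = deltas["instructions_retired"] / deltas["active_cycles"]
print("IPC over the interval: %.3f" % ipc)
```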

The other collection mode is to let pcm.x start a command: it waits for the command to complete and then outputs the delta of the performance counters between the start and the end of the command. For example:
[bash]./pcm.x "sh ./my_benchmark_script.sh parameter1 parameter2"[/bash]

pcm.x will start my_benchmark_script.sh using the shell and output the counter deltas when it is done.

Best regards,
Roman

key_key
Beginner
Thank you, this is interesting information :)

Roman_D_Intel
Employee

Hi,

The reason for this behavior might also be a BIOS-related issue mentioned in https://lkml.org/lkml/2011/3/24/552


Could you please check the Linux boot log using "dmesg | grep PMU" and see whether you observe a similar message:

"[Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)"

Disabling the PMU-busy check in PCM (making the PCM::PMUinUse() function always return false) should be a quick workaround for your system. Please let us know if an HP BIOS update is available and fixes the issue.
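
For reference, MSR 0x38D is IA32_FIXED_CTR_CTRL, and its value can be unpacked by hand. The sketch below assumes the layout documented in the Intel SDM: one 4-bit field per fixed counter, with bit 0 enabling ring-0 (OS) counting, bit 1 enabling user-mode counting, bit 2 the AnyThread flag, and bit 3 the overflow interrupt (PMI). The function name decode_fixed_ctr_ctrl is made up for this illustration.

```python
def decode_fixed_ctr_ctrl(value, num_fixed=3):
    """Split an IA32_FIXED_CTR_CTRL value into one dict per fixed counter."""
    counters = []
    for i in range(num_fixed):
        field = (value >> (4 * i)) & 0xF  # 4 control bits per counter
        counters.append({
            "counter": i,
            "os": bool(field & 0x1),
            "usr": bool(field & 0x2),
            "any_thread": bool(field & 0x4),
            "pmi": bool(field & 0x8),
        })
    return counters


def enabled_counters(value):
    """Indices of the fixed counters that count in OS or user mode."""
    return [c["counter"] for c in decode_fixed_ctr_ctrl(value)
            if c["os"] or c["usr"]]


# 0x330 is the value quoted in the kernel message above; 0x30 is what the
# strace earlier in this thread shows being read from offset 909 (0x38D).
for value in (0x330, 0x30):
    print("0x%X -> enabled fixed counters: %s" % (value, enabled_counters(value)))
```

With this layout, 0x330 means that fixed counters 1 and 2 are enabled for both OS and user mode, and 0x30 means that only fixed counter 1 is. Either way, something programmed the register before PCM started, which is consistent with the firmware explanation.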


Thanks,
Roman
