Perhaps a silly question. VTune can read hardware counters, no other tool can

Brian_V_
Beginner

 

When I try to use perf on the new NERSC supercomputer, which is running a Linux kernel, I get the perfectly understandable error:

 

perf record -o test.perf -e cpu-cycles,instructions -a sleep 5
Permission error - are you root?
Consider tweaking /proc/sys/kernel/perf_event_paranoid:
 -1 - Not paranoid at all
  0 - Disallow raw tracepoint access for unpriv
  1 - Disallow cpu events for unpriv
  2 - Disallow kernel profiling for unpriv

Sure enough, when I look at perf_event_paranoid I see "1", so no hardware counters for me with perf.
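The levels listed in that message can also be checked from a script. The following is a minimal sketch, assuming the usual Linux location of the setting; the names `restrictions_at` and `read_paranoid` are made up for illustration, and the descriptions are copied from the perf message above. Each level is treated as including the restrictions of the lower ones, and some distribution kernels define further levels that this sketch does not know about.

```python
from pathlib import Path
from typing import List, Optional

# Restriction text copied from the perf error message quoted above.
RESTRICTIONS = {
    0: "Disallow raw tracepoint access for unpriv",
    1: "Disallow cpu events for unpriv",
    2: "Disallow kernel profiling for unpriv",
}


def restrictions_at(level: int) -> List[str]:
    """Return the restrictions in force at `level`; -1 yields an empty list."""
    return [text for lvl, text in sorted(RESTRICTIONS.items()) if lvl <= level]


def read_paranoid(path: str = "/proc/sys/kernel/perf_event_paranoid") -> Optional[int]:
    """Read the current setting, or return None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_paranoid()
    if level is None:
        print("perf_event_paranoid could not be read on this system")
    else:
        print(f"perf_event_paranoid = {level}")
        for text in restrictions_at(level) or ["Not paranoid at all"]:
            print("  " + text)
```

Reading the file needs no special privileges; changing it does, which is why the setting is normally in the hands of the system administrators.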

 

But when I run VTune, there are hardware counters.

 

To run VTune we log into compute nodes, and I can see three kernel modules get loaded when I ask for a VTune-enabled allocation:

sep3_15               562471  0
vtsspp                364894  0
pax                     4510  0


PaX is a security upgrade that isn't from Intel, but vtsspp and sep3_15 are from Intel.  

 

I can see that sep3_15 is doing something when that kernel module is inserted:

 

Creating /dev/sep3_15 base devices with major number 246 ... done.
Creating /dev/sep3_15 percpu devices with major number 245 ... done.
Setting group ownership of devices to group "vtune" ... done.

So some new devices were created and their group ownership was set to "vtune", but I don't know what they do. I can also see that the Performance Monitoring Unit (PMU) got hooked in.

 

I would like to use hardware counters with my own source-code-instrumented system in my library. I used to do this with PAPI, but I get nothing now. PCM would seem like the right option, but I'm not a superuser on this system and it can take a long time to get new software installed. Can PCM or perf be changed to be in group "vtune" and get access to hardware counters?

David_A_Intel1
Employee

Hi Brian:

The "vtune" group was used to limit access to the kernel module in the past.  Now, by default, read-write access is granted to everyone, so it isn't really used.

And, unfortunately, it would have no impact on PCM or perf.  I suspect you probably need to get added to something so that you can collect with perf, but I don't know what.  For example, maybe there is some configuration of the perf subsystem where your user or group could be added.

And, pax actually is from Intel Corporation.  See /opt/intel/vtune_amplifier_xe/sepdk/src/pax (default installation path).  It is built as part of the installation and allows multiple versions of the sep driver to co-exist.

Brian_V_
Beginner

 

Awesomely fast response!   This is a new experience for me on developer forums.  Intel is stepping up!

Great to find out about PaX. It has lots of functionality, like making executable memory read-only. I assumed it was part of a larger Linux effort to improve security.

Is there guidance on how to use PCM or perf on a system where you are not an administrator? I think there might be a group I can get added to, but I was wondering if that is the typical solution at compute centers.

Dmitry_P_Intel1
Employee

Hello Brian,

If your interest is your own application, then you can remove "-a" and specify your application to launch, like:

perf record -o test.perf -e cpu-cycles,instructions <my_app>

In this case you will not need system-wide collection, which requires special privileges. perf will still collect the needed data from the processors where your app ran.
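The difference between the two invocations is only the "-a" flag. As a small sketch for scripting this, the helper below assembles the argument list either way; `perf_record_argv` and `./my_app` are hypothetical names used for illustration, and only the perf options already shown in this thread (plus the conventional `--` separator before the command) are used.

```python
import shlex
from typing import List, Sequence


def perf_record_argv(
    events: Sequence[str],
    output: str,
    command: Sequence[str],
    system_wide: bool = False,
) -> List[str]:
    """Build a `perf record` command line for one application or the whole system."""
    argv = ["perf", "record", "-o", output, "-e", ",".join(events)]
    if system_wide:
        # System-wide collection is what triggered the permission error above.
        argv.append("-a")
    argv.append("--")  # everything after this is the profiled command
    argv.extend(command)
    return argv


if __name__ == "__main__":
    argv = perf_record_argv(["cpu-cycles", "instructions"], "test.perf", ["./my_app"])
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(argv))
```

The list can be handed to `subprocess.run` once the printed command looks right.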

Also, I'm curious: do you see any functionality in perf that is missing in VTune that makes you choose perf collection in particular?

Thanks & Regards, Dmitry

Brian_V_
Beginner

The problem I'm having is that VTune is a sampling profiler, with a sampling interval of about 2 million instructions. If the sampling frequency is turned up, VTune itself perturbs the results. So when I study a matrix-matrix multiply loop of size 500, the sampler only visits the triple loop once or twice and discards the samples, and I just see zero counter values. For short computer science research benchmarks we would like to understand a lot more. When a working set fits into L3, we would like to verify that we are getting snooping instead of spilling to DRAM. We have lots of functions that are already instrumented using PAPI, and PAPI is very lightweight, but like PCM you need to be root to see anything. For short runs the variability from sampling is frustrating.
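One way to get exact totals for a short run, rather than samples, is perf's counting mode. The sketch below is an illustration only: it assumes that `perf stat` is permitted for your own process on the system in question, which this thread does not confirm, and the helper names `parse_perf_stat_csv` and `count_events` are hypothetical. It relies on the `-x` separator option, whose output starts each line with the counter value, the unit, and the event name.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, List, Optional


def parse_perf_stat_csv(text: str) -> Dict[str, Optional[int]]:
    """Parse `perf stat -x,` output into {event name: count or None}."""
    counts: Dict[str, Optional[int]] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header comment written with -o
        fields = line.split(",")
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]
        try:
            counts[event] = int(value)
        except ValueError:
            # Non-integer values, e.g. "<not supported>", are reported as None.
            counts[event] = None
    return counts


def count_events(command: List[str], events: List[str]) -> Dict[str, Optional[int]]:
    """Run `command` under `perf stat` and return whole-run totals per event."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "perf_stat.csv"
        subprocess.run(
            ["perf", "stat", "-x", ",", "-o", str(out),
             "-e", ",".join(events), "--", *command],
            check=True,
        )
        return parse_perf_stat_csv(out.read_text())
```

A call such as `count_events(["./my_app"], ["cpu-cycles", "instructions"])`, with `./my_app` standing in for the benchmark, returns totals for the whole process. It does not replace per-function instrumentation of the kind PAPI provides.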

I am tunneling deeper into the machine and want to access the DRAM memory controller counters. VTune can access the uncore memory controller counters. Even though I'm running amplxe-cl as a user, I can see these counters:

UNC_M_CAS_COUNT.RD[UNIT0]                           7543781
UNC_M_CAS_COUNT.RD[UNIT1]                           7458169
UNC_M_CAS_COUNT.RD[UNIT2]                                 0
UNC_M_CAS_COUNT.RD[UNIT3]                                 0
UNC_M_CAS_COUNT.RD[UNIT4]                           7542801
UNC_M_CAS_COUNT.RD[UNIT5]                           7454868
UNC_M_CAS_COUNT.RD[UNIT6]                                 0
UNC_M_CAS_COUNT.RD[UNIT7]                                 0
UNC_M_CAS_COUNT.WR[UNIT0]                            533948
UNC_M_CAS_COUNT.WR[UNIT1]                            442308
UNC_M_CAS_COUNT.WR[UNIT2]                                 0
UNC_M_CAS_COUNT.WR[UNIT3]                                 0
UNC_M_CAS_COUNT.WR[UNIT4]                            533921
UNC_M_CAS_COUNT.WR[UNIT5]                            441030
UNC_M_CAS_COUNT.WR[UNIT6]                                 0
UNC_M_CAS_COUNT.WR[UNIT7]                                 0
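
As a worked example of what these counts mean, the sketch below turns the listing above into DRAM traffic. It assumes that each CAS command transfers one 64-byte cache line, which is the usual interpretation for these memory controller events but is not stated in this thread; `cas_bytes` is a hypothetical helper name. Units reporting zero are left out of the sample text because they do not change the totals.

```python
import re
from collections import defaultdict
from typing import Dict

CACHE_LINE_BYTES = 64  # assumption: one CAS command moves one 64-byte line

LINE_RE = re.compile(r"^UNC_M_CAS_COUNT\.(RD|WR)\[UNIT(\d+)\]\s+(\d+)$")

# Non-zero counters copied from the listing above.
REPORT = """
UNC_M_CAS_COUNT.RD[UNIT0]                           7543781
UNC_M_CAS_COUNT.RD[UNIT1]                           7458169
UNC_M_CAS_COUNT.RD[UNIT4]                           7542801
UNC_M_CAS_COUNT.RD[UNIT5]                           7454868
UNC_M_CAS_COUNT.WR[UNIT0]                            533948
UNC_M_CAS_COUNT.WR[UNIT1]                            442308
UNC_M_CAS_COUNT.WR[UNIT4]                            533921
UNC_M_CAS_COUNT.WR[UNIT5]                            441030
"""


def cas_bytes(report: str) -> Dict[str, int]:
    """Sum CAS counts over all units and convert them to bytes moved."""
    totals: Dict[str, int] = defaultdict(int)
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            kind, _unit, count = match.groups()
            totals[kind] += int(count)
    return {kind: count * CACHE_LINE_BYTES for kind, count in totals.items()}


if __name__ == "__main__":
    for kind, nbytes in sorted(cas_bytes(REPORT).items()):
        print(f"{kind}: {nbytes} bytes ({nbytes / 1e9:.3f} GB)")
    # RD: 1919975616 bytes (1.920 GB)
    # WR: 124877248 bytes (0.125 GB)
```

Dividing these byte counts by the elapsed time of the run gives an average DRAM bandwidth.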

 

VTune must be getting access through these devices, which get installed by the sep3_15 module:

/dev/sep3_15> ls -la
total 0
drwxrwxr-x  2 root root    1360 Mar 31 17:25 .
drwxr-xr-x 16 root root   15860 Mar 31 17:25 ..
crw-rw-rw-  1 root root 248,  0 Mar 31 17:25 c
crw-rw-rw-  1 root root 248,  1 Mar 31 17:25 m
crw-rw-rw-  1 root root 247,  0 Mar 31 17:25 s0
crw-rw-rw-  1 root root 247,  1 Mar 31 17:25 s1
crw-rw-rw-  1 root root 247, 10 Mar 31 17:25 s10
crw-rw-rw-  1 root root 247, 11 Mar 31 17:25 s11
crw-rw-rw-  1 root root 247, 12 Mar 31 17:25 s12
crw-rw-rw-  1 root root 247, 13 Mar 31 17:25 s13
crw-rw-rw-  1 root root 247, 14 Mar 31 17:25 s14
crw-rw-rw-  1 root root 247, 15 Mar 31 17:25 s15
crw-rw-rw-  1 root root 247, 16 Mar 31 17:25 s16
crw-rw-rw-  1 root root 247, 17 Mar 31 17:25 s17
crw-rw-rw-  1 root root 247, 18 Mar 31 17:25 s18
crw-rw-rw-  1 root root 247, 19 Mar 31 17:25 s19
crw-rw-rw-  1 root root 247,  2 Mar 31 17:25 s2
crw-rw-rw-  1 root root 247, 20 Mar 31 17:25 s20
crw-rw-rw-  1 root root 247, 21 Mar 31 17:25 s21
crw-rw-rw-  1 root root 247, 22 Mar 31 17:25 s22
crw-rw-rw-  1 root root 247, 23 Mar 31 17:25 s23
crw-rw-rw-  1 root root 247, 24 Mar 31 17:25 s24
crw-rw-rw-  1 root root 247, 25 Mar 31 17:25 s25
crw-rw-rw-  1 root root 247, 26 Mar 31 17:25 s26
crw-rw-rw-  1 root root 247, 27 Mar 31 17:25 s27
crw-rw-rw-  1 root root 247, 28 Mar 31 17:25 s28
crw-rw-rw-  1 root root 247, 29 Mar 31 17:25 s29
crw-rw-rw-  1 root root 247,  3 Mar 31 17:25 s3
crw-rw-rw-  1 root root 247, 30 Mar 31 17:25 s30
crw-rw-rw-  1 root root 247, 31 Mar 31 17:25 s31
crw-rw-rw-  1 root root 247, 32 Mar 31 17:25 s32
crw-rw-rw-  1 root root 247, 33 Mar 31 17:25 s33
crw-rw-rw-  1 root root 247, 34 Mar 31 17:25 s34
crw-rw-rw-  1 root root 247, 35 Mar 31 17:25 s35
crw-rw-rw-  1 root root 247, 36 Mar 31 17:25 s36
crw-rw-rw-  1 root root 247, 37 Mar 31 17:25 s37
crw-rw-rw-  1 root root 247, 38 Mar 31 17:25 s38
crw-rw-rw-  1 root root 247, 39 Mar 31 17:25 s39
crw-rw-rw-  1 root root 247,  4 Mar 31 17:25 s4
crw-rw-rw-  1 root root 247, 40 Mar 31 17:25 s40
crw-rw-rw-  1 root root 247, 41 Mar 31 17:25 s41
crw-rw-rw-  1 root root 247, 42 Mar 31 17:25 s42
crw-rw-rw-  1 root root 247, 43 Mar 31 17:25 s43
crw-rw-rw-  1 root root 247, 44 Mar 31 17:25 s44
crw-rw-rw-  1 root root 247, 45 Mar 31 17:25 s45
crw-rw-rw-  1 root root 247, 46 Mar 31 17:25 s46
crw-rw-rw-  1 root root 247, 47 Mar 31 17:25 s47
crw-rw-rw-  1 root root 247, 48 Mar 31 17:25 s48
crw-rw-rw-  1 root root 247, 49 Mar 31 17:25 s49
crw-rw-rw-  1 root root 247,  5 Mar 31 17:25 s5
crw-rw-rw-  1 root root 247, 50 Mar 31 17:25 s50
crw-rw-rw-  1 root root 247, 51 Mar 31 17:25 s51
crw-rw-rw-  1 root root 247, 52 Mar 31 17:25 s52
crw-rw-rw-  1 root root 247, 53 Mar 31 17:25 s53
crw-rw-rw-  1 root root 247, 54 Mar 31 17:25 s54
crw-rw-rw-  1 root root 247, 55 Mar 31 17:25 s55
crw-rw-rw-  1 root root 247, 56 Mar 31 17:25 s56
crw-rw-rw-  1 root root 247, 57 Mar 31 17:25 s57
crw-rw-rw-  1 root root 247, 58 Mar 31 17:25 s58
crw-rw-rw-  1 root root 247, 59 Mar 31 17:25 s59
crw-rw-rw-  1 root root 247,  6 Mar 31 17:25 s6
crw-rw-rw-  1 root root 247, 60 Mar 31 17:25 s60
crw-rw-rw-  1 root root 247, 61 Mar 31 17:25 s61
crw-rw-rw-  1 root root 247, 62 Mar 31 17:25 s62
crw-rw-rw-  1 root root 247, 63 Mar 31 17:25 s63
crw-rw-rw-  1 root root 247,  7 Mar 31 17:25 s7
crw-rw-rw-  1 root root 247,  8 Mar 31 17:25 s8
crw-rw-rw-  1 root root 247,  9 Mar 31 17:25 s9

 

These look like they mirror the entries in /dev/cpu/#/, so perhaps the module is providing access to the MSRs? Can't say right now. I have the source for PCM and can see where it opens /dev/cpu/#/msr, and that open fails. Perhaps I could open these files instead?
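
A quick way to see which of these device nodes can be opened at all is to try a read-only open and report the error code. This is a sketch only: `try_open` is a hypothetical helper, and the paths in the loop are taken from the listing above plus the MSR device that PCM uses. Nothing in this thread establishes what interface the sep3_15 devices expose, so a successful open does not show that MSRs can be read through them.

```python
import errno
import os


def try_open(path: str) -> str:
    """Attempt a read-only open of `path`; return "ok" or the errno name."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)  # close immediately; no reads or ioctls are issued
    return "ok"


if __name__ == "__main__":
    for path in ("/dev/cpu/0/msr", "/dev/sep3_15/c", "/dev/sep3_15/s0"):
        print(f"{path}: {try_open(path)}")
```

The open is attempted rather than just inspecting the permission bits because a device driver can refuse an open even when the file mode looks permissive.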

 

 
