Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Reading programmable and fixed-function performance counters

Davidson_F_
Beginner
1,669 Views

I'm working on a Unix-like x86 operating system and I need to measure the performance of some benchmarks on it. So far the OS does not have any tool to access the hardware counters, so the only option I have is to access them directly. On Linux I have already used tools like PAPI and perf, but I don't know how they work internally and I am pretty lost in this regard.

Please correct me if I am wrong:

• For fixed-function counters, the rdpmc instruction alone is enough: I just need to set bit 30 of ECX and put the counter number in the low bits. This brings up my first question: where in the Intel Developer's Manual can I find the mapping of counter numbers to use in ECX? I've seen something similar in this post (How to read performance counters by rdpmc instruction?), but I would like to access all the counters (0-7, since I have a Sandy Bridge i7-2600) and know what each one counts.

• For programmable counters, as far as I know, the events are divided into architectural and non-architectural: the latter are specific to one microarchitecture, while the former can be supported across microarchitectures, as long as support is reported via CPUID.

This way, I need to configure IA32_PERFEVTSELx and set the Event Select and Unit Mask fields. Volume 3, Table 18-1 lists the predefined events that I can use; I believe that for my processor I can also use the events in Table 19-3.

If I use the 'LLC Misses' event, for instance (UMask = 41H, Event Select = 2EH), what counter number should I put in ECX for the rdpmc instruction?

0 Kudos
1 Solution
McCalpinJohn
Honored Contributor III
1,669 Views

There is a 1:1 correspondence between the IA32_PERFEVTSELx MSRs that control the programmable counters and the IA32_PMCx MSRs that contain the counts.  The "x" here is the counter number (0-1, 0-3 or 0-7, depending on the processor and whether HyperThreading is enabled), and this same "x" is the value placed in the ECX register prior to executing the RDPMC instruction.

Counter, Control MSR, Count MSR
0, 0x186, 0xC1
1, 0x187, 0xC2
2, 0x188, 0xC3
etc.

The trick with setting bit 30 of ECX to access the fixed function counters is probably confusing if you learn about that before you learn the "normal" way of accessing the counters (with ECX set to the programmable counter number).
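The mapping above can be written down as a few helper functions. This is only a minimal sketch: the helper names are made up for illustration, the 0x186 and 0xC1 base addresses come from the table above, and the fixed-function details (three counters, IA32_FIXED_CTR0 at 0x309, and what each one counts) are taken from the architectural performance monitoring description in the SDM rather than from this thread, so check them against the manual for your processor.

```c
#include <assert.h>
#include <stdint.h>

/* Base MSR addresses; counter x lives at base + x. */
#define IA32_PMC0         0x0C1u  /* count MSR of programmable counter 0   */
#define IA32_PERFEVTSEL0  0x186u  /* control MSR of programmable counter 0 */
#define IA32_FIXED_CTR0   0x309u  /* count MSR of fixed-function counter 0 */

/* ECX bit 30 tells RDPMC to read the fixed-function bank. */
#define RDPMC_FIXED_FLAG  (1u << 30)

/* Control MSR that selects the event for programmable counter x. */
static inline uint32_t perfevtsel_msr(unsigned x)
{
    assert(x < 8);
    return IA32_PERFEVTSEL0 + x;
}

/* Count MSR that holds the value of programmable counter x. */
static inline uint32_t pmc_msr(unsigned x)
{
    assert(x < 8);
    return IA32_PMC0 + x;
}

/* ECX value for RDPMC when reading programmable counter x: just x. */
static inline uint32_t rdpmc_ecx_programmable(unsigned x)
{
    assert(x < 8);
    return x;
}

/*
 * ECX value for RDPMC when reading fixed-function counter n.
 * Architectural fixed counters (assumed here, per the SDM):
 *   0 = INST_RETIRED.ANY
 *   1 = CPU_CLK_UNHALTED.THREAD
 *   2 = CPU_CLK_UNHALTED.REF_TSC
 */
static inline uint32_t rdpmc_ecx_fixed(unsigned n)
{
    assert(n < 3);
    return RDPMC_FIXED_FLAG | n;
}

/* Count MSR that holds the value of fixed-function counter n. */
static inline uint32_t fixed_ctr_msr(unsigned n)
{
    assert(n < 3);
    return IA32_FIXED_CTR0 + n;
}
```

So an event programmed into IA32_PERFEVTSEL2 (0x188) is read back with ECX = 2, while fixed counter 1 is read with ECX = 0x40000001.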

0 Kudos
11 Replies
Thomas_W_Intel
Employee
1,669 Views

If you are only interested in counting (in contrast to sampling), PCM might be an alternative for you: https://software.intel.com/en-us/articles/intel-performance-counter-monitor

Since PCM uses an MSR device driver, it might not be too difficult to port it to your OS.

But even if you don't want to attempt a port, the code might still be useful as a reference. For example, the PMUs are configured here: https://github.com/opcm/pcm/blob/master/cpucounters.cpp#L1747

 

0 Kudos
McCalpinJohn
Honored Contributor III
1,669 Views

If you are running in kernel space (e.g., in a loadable kernel module), then you just need to go through the list of MSRs described in Chapter 18 of Volume 3 of the Intel Architectures SW Developer's Manual to enable and program the counters.   I recommend starting with Section 18.2 on Architectural Performance Monitoring as an overview before you get into the processor-specific details in Section 18.9.
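As a rough illustration of what "enable and program the counters" can look like, the sketch below builds the list of MSR writes for one programmable counter, using the 'LLC Misses' encoding from the question (Event Select 2EH, UMask 41H). The function and struct names are hypothetical, and the sequence assumes a processor that has IA32_PERF_GLOBAL_CTRL (architectural performance monitoring version 2 or later). The kernel would perform each write with WRMSR, in order, after reading the current IA32_PERF_GLOBAL_CTRL value.

```c
#include <stdint.h>

#define IA32_PMC0              0x0C1u
#define IA32_PERFEVTSEL0       0x186u
#define IA32_PERF_GLOBAL_CTRL  0x38Fu

/* IA32_PERFEVTSELx fields: bits 7:0 event select, bits 15:8 unit mask. */
#define EVTSEL_USR  (1ull << 16)  /* count while in ring 3         */
#define EVTSEL_OS   (1ull << 17)  /* count while in ring 0         */
#define EVTSEL_EN   (1ull << 22)  /* enable this counter           */

/* One pending WRMSR: target MSR address and the value to write. */
struct msr_write {
    uint32_t msr;
    uint64_t value;
};

/* Encode an event for IA32_PERFEVTSELx, counting in user and kernel mode. */
static uint64_t evtsel_encode(uint8_t event, uint8_t umask)
{
    return (uint64_t)event | ((uint64_t)umask << 8) |
           EVTSEL_USR | EVTSEL_OS | EVTSEL_EN;
}

/*
 * Fill out[] with the writes needed to start counting `event/umask`
 * on programmable counter x. global_ctrl_now is the value just read
 * from IA32_PERF_GLOBAL_CTRL. Returns the number of writes (always 4).
 */
static int pmc_setup_writes(unsigned x, uint8_t event, uint8_t umask,
                            uint64_t global_ctrl_now,
                            struct msr_write out[4])
{
    /* 1. Stop counter x while it is being reprogrammed. */
    out[0] = (struct msr_write){ IA32_PERFEVTSEL0 + x, 0 };
    /* 2. Clear the old count. */
    out[1] = (struct msr_write){ IA32_PMC0 + x, 0 };
    /* 3. Select the event and set the per-counter enable bit. */
    out[2] = (struct msr_write){ IA32_PERFEVTSEL0 + x,
                                 evtsel_encode(event, umask) };
    /* 4. Set the global enable bit for counter x, keeping the others. */
    out[3] = (struct msr_write){ IA32_PERF_GLOBAL_CTRL,
                                 global_ctrl_now | (1ull << x) };
    return 4;
}
```

For LLC Misses on counter 0 this yields the value 0x43412E for MSR 0x186. Returning a list of writes instead of issuing WRMSR directly keeps the sketch independent of any particular kernel's MSR helpers.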

If you are running in user space, you need an interface to the kernel that enables you to read/write these MSRs.  On Linux systems this is the /dev/cpu/*/msr device driver interface, but the functionality can also be provided by other kernel functions.    If you have such an interface, it should be relatively easy to port PCM or Likwid (https://github.com/RRZE-HPC/likwid).   If you are running in user mode and don't have an interface to the MSRs, then you will probably need to build your own kernel module (assuming that is supported).
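For reference, this is roughly how the Linux /dev/cpu/*/msr interface is used from user space: the MSR address is the file offset, and each read returns 8 bytes. It is a sketch only; the function name msr_read is made up here, and it assumes the msr driver is loaded and the caller has permission to open the device (normally root).

```c
#define _XOPEN_SOURCE 700   /* for pread() */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read one MSR on the given logical CPU through the Linux msr driver.
 * Returns 0 and stores the 64-bit value in *value on success,
 * or -1 if the device cannot be opened or the read fails.
 */
static int msr_read(int cpu, uint32_t msr, uint64_t *value)
{
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The file offset selects the MSR; the driver returns 8 bytes. */
    ssize_t n = pread(fd, value, sizeof *value, (off_t)msr);
    close(fd);

    return n == (ssize_t)sizeof *value ? 0 : -1;
}
```

A call such as msr_read(0, 0xC1, &count) would return the raw value of IA32_PMC0 on CPU 0. An OS without such a driver needs an equivalent kernel entry point that executes RDMSR on the requested core.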

0 Kudos
Davidson_F_
Beginner
1,669 Views

Thank you Thomas and John,

Well, I have no interface in the kernel to access the MSRs. Although it is possible to write one, it would take me some time, since besides porting the driver I would also need to port the tool, as you mentioned.

I have easy access to kernel space, and I can also write a small interface for user space if needed; my biggest problem is the basic understanding of the counters.

What I want to do is something like this: http://stackoverflow.com/questions/22421227/how-many-cache-misses-will-we-have-for-this-simple-program#answer-22421432

What I cannot understand is this: when I set up an event, which counter do I read its value from, i.e. which counter number do I pass to rdpmc?

I have been looking for this information in Chapters 18 and 19 of the Intel manual and I cannot find it.

0 Kudos
CyrIng
Novice
1,669 Views
For fixed counters, you can browse my driver source code which handles architectures from "old" Core up to recent i7 https://github.com/cyring/CoreFreq/blob/master/corefreqk.c
0 Kudos
Davidson_F_
Beginner
1,669 Views

CyrIng wrote:

For fixed counters, you can browse my driver source code which handles architectures from "old" Core up to recent i7
https://github.com/cyring/CoreFreq/blob/master/corefreqk.c

That's nice code and a nice project, very well written and understandable. I will certainly check it out later.

Mccalpin, John wrote:

There is a 1:1 correspondence between the IA32_PERFEVTSELx MSRs that control the programmable counters and the IA32_PMCx MSRs that contain the counts.  The "x" here is the counter number (0-1, 0-3 or 0-7, depending on the processor and whether HyperThreading is enabled), and this same "x" is the value placed in the ECX register prior to executing the RDPMC instruction.

Counter, Control MSR, Count MSR
0, 0x186, 0xC1
1, 0x187, 0xC2
2, 0x188, 0xC3
etc....
The trick with setting bit 30 of ECX to access the fixed function counters is probably confusing if you learn about that before you learn the "normal" way of accessing the counters (with ECX set to the programmable counter number).

That's exactly what I was looking for. My first mistake was certainly learning about the fixed-function counters before the 'normal' way. Now things make sense. This also means that I can read/write the values using the MSRs directly instead of using rdpmc, since I am in ring 0.

Thank you all!

0 Kudos
McCalpinJohn
Honored Contributor III
1,669 Views

Yes, in ring 0 you can read the core performance counters either using the RDPMC instruction or the RDMSR instruction.   I don't know if there is a performance difference between the two approaches.
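The two read paths can be sketched with GCC/Clang inline assembly as below. The wrapper names are made up for illustration. Both instructions take their selector in ECX and return the result in EDX:EAX; RDMSR only works in ring 0, and RDPMC works in ring 3 only if the OS has set CR4.PCE. The wrappers are guarded so the file still compiles on other compilers or architectures.

```c
#include <stdint.h>

/* Combine the EDX (high) and EAX (low) halves into one 64-bit value. */
static inline uint64_t join_edx_eax(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))

/*
 * Read a performance counter with RDPMC.
 * ecx = x for programmable counter x, or (1u << 30) | n for
 * fixed-function counter n.
 */
static inline uint64_t read_counter_rdpmc(uint32_t ecx)
{
    uint32_t lo, hi;
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(ecx));
    return join_edx_eax(hi, lo);
}

/*
 * Read the same counter through its count MSR with RDMSR (ring 0 only).
 * msr = 0xC1 + x for programmable counter x.
 */
static inline uint64_t read_counter_rdmsr(uint32_t msr)
{
    uint32_t lo, hi;
    __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return join_edx_eax(hi, lo);
}

#endif
```

With these, read_counter_rdpmc(0) and read_counter_rdmsr(0xC1) name the same counter, so timing both in a loop would be one way to compare the cost of the two approaches on a given processor.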

0 Kudos
Travis_D_
New Contributor II
1,669 Views

You might also like to take a look at https://github.com/obilaniu/libpfc, which is a small kernel module and userspace library for doing just what you want: reading performance counters.

It works out of the box, but it would be a great target for porting to another architecture, since most of the details are about handling the x86-specific stuff to enable/read the counters.

0 Kudos
Davidson_F_
Beginner
1,669 Views

Travis D. wrote:

You might also like to take a look at https://github.com/obilaniu/libpfc, which is a small kernel module and userspace library for doing just what you want: reading performance counters.

It works out of the box, but it would be a great target for porting to another architecture, since most of the details are about handling the x86-specific stuff to enable/read the counters.

Hello Travis D,

I already knew this repository; in fact, it's very interesting and simple to understand. I believe it would not be difficult to port it to other systems, and it would be nice if the author made it more robust, since it assumes a lot of things. Even so, it's a great way to understand PMCs.

Currently I am comparing this kernel module with my implementation, since I am having an issue with the 'LLC Misses' event: I always get the value 0, no matter what benchmark I use. For the other events I get correct values.

0 Kudos
Travis_D_
New Contributor II
1,669 Views

Davidson F. wrote:

Hello Travis D,

I already knew this repository; in fact, it's very interesting and simple to understand. I believe it would not be difficult to port it to other systems, and it would be nice if the author made it more robust, since it assumes a lot of things. Even so, it's a great way to understand PMCs.

Currently I am comparing this kernel module with my implementation, since I am having an issue with the 'LLC Misses' event: I always get the value 0, no matter what benchmark I use. For the other events I get correct values.

If you have any changes that can make the library more robust, I'm sure the author would be happy to pull them. It is, after all, free and open source software.

0 Kudos
Davidson_F_
Beginner
1,669 Views

Just to update: I have not managed to find out why LLC Misses does not work on my processor, but it works fine on other processors, so I just used another one.

Travis D. wrote:

If you have any changes that can make the library more robust, I'm sure the author would be happy to pull them. It is, after all, free and open source software.

Regarding libpfc, I have not made any changes to the library yet, but when I do I will certainly send a PR to the author =).

0 Kudos