I'm working on a Unix-like x86 operating system and I need to measure the performance of some benchmarks on it. So far the OS has no tool for accessing the hardware counters, so my only option is to access them directly. On Linux I have already used tools like PAPI and perf, but I don't know how they work internally and I am pretty lost in this regard.
Please correct me if I am wrong:
• For fixed-function counters, the rdpmc instruction alone is enough: I just need to set bit 30 of ECX plus the counter number. Here comes my first question: where in the Intel Developer's Manual can I find which counter number to put in ECX? I've seen something similar in this post (How to read performance counters by rdpmc instruction?), but I would like to access all the counters (0-7, since I have a Sandy Bridge i7 2600) and know what each one counts.
• Programmable counters, as far as I know, are divided into architectural and non-architectural; the latter are specific to one architecture, while the former can be supported by various architectures as long as this is reported via CPUID.
For these, I need to configure IA32_PERFEVTSELx and set the Event Select and Unit Mask fields. Volume 3, Table 18-1 lists predefined events that I can use, and I believe that for my processor I can also use the data in Table 19-3.
For instance, if I use the 'LLC Misses' event (UMask = 41H, Event Select = 2EH), what counter number should I put in ECX for the rdpmc instruction?
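There is a 1:1 correspondence between the IA32_PERFEVTSELx MSRs that control the programmable counters and the IA32_PMCx MSRs that contain the counts. The "x" here is the counter number (0-1, 0-3 or 0-7, depending on the processor and whether HyperThreading is enabled), and this same "x" is the value placed in the ECX register prior to executing the RDPMC instruction.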
Counter, Control MSR, Count MSR
0, 0x186, 0xC1
1, 0x187, 0xC2
2, 0x188, 0xC3
etc....
The trick with setting bit 30 of ECX to access the fixed function counters is probably confusing if you learn about that before you learn the "normal" way of accessing the counters (with ECX set to the programmable counter number).
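The correspondence in the table can be sketched in C as below. This is only an illustration: the helper names are hypothetical (they come from no library), and the code computes MSR addresses and ECX values without touching the hardware. Valid counter numbers depend on the processor, so the helpers do no range checking.

```c
#include <stdint.h>

/* Base MSR addresses taken from the table above. */
#define IA32_PERFEVTSEL0 0x186u      /* control MSR for programmable counter 0 */
#define IA32_PMC0        0x0C1u      /* count MSR for programmable counter 0   */
#define RDPMC_FIXED_FLAG (1u << 30)  /* ECX bit 30 selects fixed-function counters */

/* Hypothetical helper: control MSR address for programmable counter x. */
static inline uint32_t perfevtsel_msr(uint32_t x)
{
    return IA32_PERFEVTSEL0 + x;
}

/* Hypothetical helper: count MSR address for programmable counter x. */
static inline uint32_t pmc_msr(uint32_t x)
{
    return IA32_PMC0 + x;
}

/* ECX value for RDPMC when reading programmable counter x: just x. */
static inline uint32_t rdpmc_ecx_programmable(uint32_t x)
{
    return x;
}

/* ECX value for RDPMC when reading fixed-function counter n: bit 30 plus n. */
static inline uint32_t rdpmc_ecx_fixed(uint32_t n)
{
    return RDPMC_FIXED_FLAG | n;
}
```

So an event programmed through the control MSR at 0x188 is counted in the MSR at 0xC3 and read with ECX = 2, while fixed-function counter 1 is read with ECX = 0x40000001.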
If you are only interested in counting (in contrast to sampling), PCM might be an alternative for you: https://software.intel.com/en-us/articles/intel-performance-counter-monitor
Since PCM uses an MSR device driver, it might not be too difficult to port it to your OS.
But even if you don't want to attempt a port, the code might still be useful as a reference. For example, the PMUs are configured here: https://github.com/opcm/pcm/blob/master/cpucounters.cpp#L1747
If you are running in kernel space (e.g., in a loadable kernel module), then you just need to go through the list of MSRs described in Chapter 18 of Volume 3 of the Intel Architectures SW Developer's Manual to enable and program the counters. I recommend starting with Section 18.2 on Architectural Performance Monitoring as an overview before you get into the processor-specific details in Section 18.9.
If you are running in user space, you need an interface to the kernel that enables you to read/write these MSRs. On Linux systems this is the /dev/cpu/*/msr device driver interface, but the functionality can also be provided by other kernel functions. If you have such an interface, it should be relatively easy to port PCM or Likwid (https://github.com/RRZE-HPC/likwid). If you are running in user mode and don't have an interface to the MSRs, then you will probably need to build your own kernel module (assuming that is supported).
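As a rough illustration of what "programming" a counter involves, the sketch below builds the value that would be written to an IA32_PERFEVTSELx MSR. It assumes the architectural performance monitoring layout (event select in bits 7:0, unit mask in bits 15:8, USR in bit 16, OS in bit 17, EN in bit 22) and that IA32_PERF_GLOBAL_CTRL (MSR 0x38F) also needs bit x set to enable counter x, as on processors with architectural performance monitoring version 2 or later. The function names are hypothetical, and the actual WRMSR is left out because it depends on the kernel.

```c
#include <stdint.h>

#define EVTSEL_USR (1ull << 16)  /* count while running in user mode (ring 3) */
#define EVTSEL_OS  (1ull << 17)  /* count while running in ring 0 */
#define EVTSEL_EN  (1ull << 22)  /* enable this counter */

#define IA32_PERF_GLOBAL_CTRL 0x38Fu  /* global enable MSR (assumed present) */

/* Hypothetical helper: value to write to IA32_PERFEVTSELx for one event. */
static inline uint64_t perfevtsel_value(uint8_t event, uint8_t umask,
                                        int count_user, int count_kernel)
{
    uint64_t v = (uint64_t)event | ((uint64_t)umask << 8) | EVTSEL_EN;
    if (count_user)
        v |= EVTSEL_USR;
    if (count_kernel)
        v |= EVTSEL_OS;
    return v;
}

/* Hypothetical helper: IA32_PERF_GLOBAL_CTRL bit enabling programmable counter x. */
static inline uint64_t global_ctrl_enable_pmc(uint32_t x)
{
    return 1ull << x;
}
```

For the 'LLC Misses' event mentioned earlier in the thread (Event Select 2EH, UMask 41H), counting in both user and kernel mode, perfevtsel_value(0x2E, 0x41, 1, 1) gives 0x43412E.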
Thank you Thomas and John,
Well, I have no kernel interface for accessing the MSRs. Although I could write one, it would take me some time, since besides porting the driver I would also need to port the tool, as you mentioned.
I have easy access to kernel space, and I could also write a small interface for user space. My biggest problem is the basic understanding of the counters.
What I want to do is something like this: http://stackoverflow.com/questions/22421227/how-many-cache-misses-will-we-have-for-this-simple-program#answer-22421432
What I cannot understand is this: when I set up an event, which counter do I read its value from, i.e. what do I pass to rdpmc?
I have been looking for this information in Chapters 18 and 19 of the Intel manual and cannot find it.
CyrIng wrote:
For fixed counters, you can browse my driver source code which handles architectures from "old" Core up to recent i7
https://github.com/cyring/CoreFreq/blob/master/corefreqk.c
That's a nice project, and the code is very well written and understandable. I will certainly check it out later.
Mccalpin, John wrote:
There is a 1:1 correspondence between the IA32_PERFEVTSELx MSRs that control the programmable counters and the IA32_PMCx MSRs that contain the counts. The "x" here is the counter number (0-1, 0-3 or 0-7, depending on the processor and whether HyperThreading is enabled), and this same "x" is the value placed in the ECX register prior to executing the RDPMC instruction.
Counter, Control MSR, Count MSR
0, 0x186, 0xC1
1, 0x187, 0xC2
2, 0x188, 0xC3
etc....
The trick with setting bit 30 of ECX to access the fixed function counters is probably confusing if you learn about that before you learn the "normal" way of accessing the counters (with ECX set to the programmable counter number).
That's exactly what I was looking for. My first mistake was certainly learning about the fixed counters before the 'normal' way. Now things make sense. This also means that, since I am in ring 0, I can read and write the values through the MSRs directly instead of using rdpmc.
Thank you all!
Yes, in ring 0 you can read the core performance counters either using the RDPMC instruction or the RDMSR instruction. I don't know if there is a performance difference between the two approaches.
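The two ways of reading a counter might look like the sketch below, assuming GCC or Clang inline assembly on x86. The wrapper names are hypothetical. Both instructions take their selector in ECX and return the result in EDX:EAX. RDMSR only works in ring 0; RDPMC works in ring 0, and also in user mode if the kernel has set CR4.PCE. Executing either one without the required privilege raises a fault, so only the pure helper is safe to call from an ordinary user program.

```c
#include <stdint.h>

/* Combine the EDX:EAX halves returned by RDPMC or RDMSR into one 64-bit value. */
static inline uint64_t combine_halves(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Hypothetical wrapper: read a counter with RDPMC.
 * ecx is the counter number, with bit 30 set for fixed-function counters. */
static inline uint64_t read_pmc(uint32_t ecx)
{
    uint32_t lo, hi;
    __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(ecx));
    return combine_halves(lo, hi);
}

/* Hypothetical wrapper: read an MSR with RDMSR (ring 0 only).
 * msr is the MSR address, e.g. 0xC1 for IA32_PMC0. */
static inline uint64_t read_msr(uint32_t msr)
{
    uint32_t lo, hi;
    __asm__ volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return combine_halves(lo, hi);
}
#endif
```

In ring 0, read_pmc(0) and read_msr(0xC1) are two routes to the same programmable counter; timing a loop of each would be one way to check whether there is a performance difference on a given processor.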
You might also like to take a look at https://github.com/obilaniu/libpfc, which is a small kernel module and userspace library for doing just what you want: reading performance counters.
It works out of the box, but it would be a great target for porting to another architecture, since most of the details are about handling the x86-specific stuff to enable/read the counters.
Travis D. wrote:
You might also like to take a look at https://github.com/obilaniu/libpfc, which is a small kernel module and userspace library for doing just what you want: reading performance counters.
It works out of the box, but it would be a great target for porting to another architecture, since most of the details are about handling the x86-specific stuff to enable/read the counters.
Hello Travis D,
I already knew this repository; it's very interesting and simple to understand. I believe it would not be difficult to port to other systems, and it would be interesting if the author made it more robust, since it assumes a lot of things. Even so, it's a great way to understand PMCs.
I am currently comparing this kernel module with my implementation, since I am having an issue with the 'LLC Misses' event: I always get the value 0, no matter what benchmark I use. For the other events I get correct values.
Davidson F. wrote:
Hello Travis D,
I already knew this repository; it's very interesting and simple to understand. I believe it would not be difficult to port to other systems, and it would be interesting if the author made it more robust, since it assumes a lot of things. Even so, it's a great way to understand PMCs.
I am currently comparing this kernel module with my implementation, since I am having an issue with the 'LLC Misses' event: I always get the value 0, no matter what benchmark I use. For the other events I get correct values.
If you have any changes that can make the library more robust, I'm sure the author would be happy to pull them. It is, after all, free and open source software.
Just to update: I have not managed to find out why LLC Misses does not work on my processor. It works fine on other processors, so I just used another processor.
Travis D. wrote:
If you have any changes that can make the library more robust, I'm sure the author would be happy to pull them. It is, after all, free and open source software.
Regarding libpfc, I have not made any changes to the library yet, but when I do I will certainly send a PR to the author =).
