
Counting CPU cycles of a process with PMC0 counter



For educational purposes, I've written a Linux kernel module that uses the Intel PMU to read the number of unhalted CPU cycles consumed by a process. I then compare the value reported by the kernel module with the value reported by `perf stat -e cycles ./test`. For some reason, the two values are vastly different.

The outline of the module I've written is as follows:

1. Register an interrupt handler that is called whenever a counter register overflows.

2. Clear (write 0x0) the following MSRs on every CPU:
    - IA32_FIXED_CTR0
    - IA32_PMC0

3. Program IA32_PERFEVTSEL0 on every CPU to count UNHALTED_CPU_CYCLES (event 0x3C on Intel Core i5). Also set bit 20 to enable the overflow interrupt, bit 22 to enable the counter, and bits 16 and 17 to count in both user and kernel mode.

4. Write -999 to the IA32_PMC0 counter.

5. I've also added hooks to the Linux kernel code so that a hook function is called whenever a given process is switched in or out of a CPU. When process P is scheduled onto a CPU, I start the counters. When process P is switched out, I stop the counters.

6. From this point on, the counters count CPU cycles whenever process P is in the running state and stop temporarily whenever P is scheduled out. The counters stop permanently and the results are printed when the process terminates. Whenever a counter overflows, the interrupt handler is called. The handler simply adds the value of the counter to a global sum variable. It also clears the CLR_OVF_PMC0 bit in IA32_PERF_GLOBAL_OVF_CTRL and the PMC0 counter.

With this setup, I believe I should be able to read the number of cycles taken by a process. However, the number my module reports for a process and the number `perf stat -e cycles` reports for the same process differ hugely. For example, my module reports around 56,000 cycles for a small C program, while perf reports around 700,000.

Can anyone tell me what could be going wrong in my approach? The actual C program can be seen at


