RDPMC vs. RDMSR and counters for branch predictor

I have been using PAPI on Linux, but I started to use Intel PCM 2.0 because it supports Windows. I have two questions. Please note that I am not an expert on x86 instructions.

(1) The PCM 2.0 source code shows that it reads MSRs with the RDMSR instruction. I expected it to use RDPMC (Read Performance-Monitoring Counters), which seems to have lower latency. Is there a reason why PCM uses RDMSR?

(2) Getting L2/L3 misses or hits is already implemented, but I'm wondering how I can obtain other counters, such as counters for the branch predictor. In PAPI, I can switch the group of counters to be read. Is it possible to do that with PCM 2.0?

Thank you, MJ

Hi MJ,
Regarding (1): There are several considerations. RDPMC might be an option for some of the counters accessed by Intel PCM, but not all of them are accessible through RDPMC (some uncore counters, energy monitors, etc.). Also, RDPMC can be executed from user mode only if the OS has set the CR4.PCE flag; otherwise it is privileged, like RDMSR. On some OSes it is therefore easier to use existing kernel drivers (which usually support only RDMSR) instead of writing a new driver. Kernel driver access also involves overhead that would reduce the latency savings gained through RDPMC.
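To make the difference in addressing concrete, here is a minimal sketch of how the two instructions select a core counter. RDMSR takes an MSR address in ECX, while RDPMC takes a counter index in ECX, with bit 30 selecting the fixed-function counters. The MSR addresses are the architectural ones from the Intel SDM; the helper names `msr_address` and `rdpmc_index` are hypothetical and are not part of PCM. Uncore and energy MSRs have no RDPMC index at all, which is one of the reasons mentioned above.

```cpp
#include <cstdint>

// Architectural MSR addresses (Intel SDM).
constexpr std::uint32_t IA32_PMC0       = 0x0C1; // first general-purpose counter
constexpr std::uint32_t IA32_FIXED_CTR0 = 0x309; // first fixed-function counter

// RDMSR path: ECX holds the MSR address of the counter to read.
// Hypothetical helper for illustration only.
constexpr std::uint32_t msr_address(bool fixed, std::uint32_t index) {
    return (fixed ? IA32_FIXED_CTR0 : IA32_PMC0) + index;
}

// RDPMC path: ECX holds a counter index; bit 30 selects the
// fixed-function counters instead of the general-purpose ones.
// Hypothetical helper for illustration only.
constexpr std::uint32_t rdpmc_index(bool fixed, std::uint32_t index) {
    return (fixed ? (1u << 30) : 0u) | index;
}
```

For example, general-purpose counter 1 is MSR 0xC2 for RDMSR but simply index 1 for RDPMC. The sketch only computes the ECX values; actually executing either instruction still requires the privileges discussed above.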
Regarding (2): Access to user-defined counters can be achieved through the custom PCM "program" call:
[cpp]
PCM::CustomCoreEventDescription MyEvents[4];

MyEvents[0].event_number = 0xc4; // architectural "branch instruction retired" event number
MyEvents[0].umask_value = 0x00;  // architectural "branch instruction retired" event umask
MyEvents[1].event_number = 0xC5; // architectural "branch misses retired" event number
MyEvents[1].umask_value = 0x00;  // architectural "branch misses retired" event umask

// add your own event ids here for on-core counters 2 and 3
MyEvents[2].event_number = ??;
MyEvents[2].umask_value = ??;
MyEvents[3].event_number = ??;
MyEvents[3].umask_value = ??;

if (PCM::getInstance()->program(PCM::CUSTOM_CORE_EVENTS, &MyEvents) != PCM::Success) return;

// ... for system-wide PMU state monitoring (like in the pcm.x utility):
SystemCounterState sstate1 = getSystemCounterState();

// run your code that you want to measure

SystemCounterState sstate2 = getSystemCounterState();

uint64 BRANCH_INSTR_RETIRED_events = getNumberOfCustomEvents(0, sstate1, sstate2);  // read number of occurred events from counter 0
uint64 BRANCH_MISSES_RETIRED_events = getNumberOfCustomEvents(1, sstate1, sstate2); // read number of occurred events from counter 1
uint64 eventmetric2 = getNumberOfCustomEvents(2, sstate1, sstate2);                 // read number of occurred events from counter 2
uint64 eventmetric3 = getNumberOfCustomEvents(3, sstate1, sstate2);                 // read number of occurred events from counter 3
[/cpp]
These events should work on processors based on the Intel microarchitecture codenamed Nehalem and later. You can check the available on-core PMU events for your processor in the "Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3B: System Programming Guide, Part 2" (Appendix A, Performance Monitoring Events).
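As background for choosing your own event ids, the event number and umask from the manual are two fields of the value written to an IA32_PERFEVTSELx MSR. The following is a minimal sketch of that encoding, assuming the architectural layout (event select in bits 7:0, umask in bits 15:8, USR in bit 16, OS in bit 17, EN in bit 22). The function name `make_event_select` is hypothetical, and the choice to count in both user and kernel mode is an assumption for illustration, not a statement about which flags PCM itself sets.

```cpp
#include <cstdint>

// Flag bits of IA32_PERFEVTSELx (architectural layout).
constexpr std::uint64_t PERFEVTSEL_USR = 1ull << 16; // count while in user mode
constexpr std::uint64_t PERFEVTSEL_OS  = 1ull << 17; // count while in ring 0
constexpr std::uint64_t PERFEVTSEL_EN  = 1ull << 22; // enable the counter

// Hypothetical helper: pack an event number and umask into an
// event-select value that counts in user and kernel mode.
constexpr std::uint64_t make_event_select(std::uint8_t event_number,
                                          std::uint8_t umask_value) {
    return static_cast<std::uint64_t>(event_number)
         | (static_cast<std::uint64_t>(umask_value) << 8)
         | PERFEVTSEL_USR | PERFEVTSEL_OS | PERFEVTSEL_EN;
}
```

With this layout, the "branch instruction retired" event (0xC4, umask 0x00) encodes to 0x4300C4 and "branch misses retired" (0xC5, umask 0x00) to 0x4300C5. Dividing the second count by the first gives a branch misprediction ratio for the measured code.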
Best regards,