There are different ways to measure L3 cache misses and they vary with architecture. Therefore, let's focus on the cycles first. Are you measuring cycles on 1 core or on the complete CPU? PCM reports the cycles including turbo mode (in contrast to "reference cycles"). Is PAPI doing the same?
What kind of CPU are you using and how many threads is your workload using?
The counter does not advance in the following conditions:
- the core is in an ACPI C-state other than C0 (C0 being normal operation)
- the STPCLK# pin is asserted
- the core is being throttled by TM1
- during the frequency switching phase of a performance state transition
The getRefCycles() function returns the CPU_CLK_UNHALTED.REF event count, which is the number of reference clock cycles elapsed while the clock signal on the core is running. The reference clock operates at a fixed frequency, irrespective of core frequency changes due to performance state transitions. Note that the CPU_CLK_UNHALTED.THREAD count can exceed the CPU_CLK_UNHALTED.REF count if Turbo Boost kicks in.
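To make the relationship between the two counts concrete, here is a minimal sketch of the arithmetic. It does not call PCM. `CycleSample`, `threadToRefRatio` and `averageActiveFrequencyHz` are hypothetical names made up for illustration, and it assumes the reference clock ticks at the nominal frequency you pass in, which may not hold on every architecture.

```cpp
#include <cstdint>

// Hypothetical container (not part of PCM) for two counter deltas
// taken over the same measurement interval.
struct CycleSample {
    std::uint64_t threadCycles; // CPU_CLK_UNHALTED.THREAD delta, follows the core frequency
    std::uint64_t refCycles;    // CPU_CLK_UNHALTED.REF delta, fixed reference frequency
};

// Ratio of core cycles to reference cycles while the core was unhalted.
// Above 1.0 suggests the core ran faster than the reference clock
// (e.g. Turbo Boost); below 1.0 suggests it ran slower.
inline double threadToRefRatio(const CycleSample& s) {
    if (s.refCycles == 0) {
        return 0.0; // nothing was counted in this interval; avoid dividing by zero
    }
    return static_cast<double>(s.threadCycles) / static_cast<double>(s.refCycles);
}

// Average core frequency while unhalted.
// Assumption: the reference clock runs at nominalHz.
inline double averageActiveFrequencyHz(const CycleSample& s, double nominalHz) {
    return threadToRefRatio(s) * nominalHz;
}
```

For example, with 3.0e9 thread cycles, 2.5e9 reference cycles and an assumed nominal frequency of 2.5 GHz, the ratio is 1.2 and the estimated average active frequency is about 3.0 GHz. That is the kind of gap you would see between a "cycles" number that includes turbo and a "reference cycles" number.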
You can find documentation for the PCM methods in Doxygen format in the cpucounters.h header. HTML documentation can easily be generated from it (the included Doxygen project file is called "Doxyfile").