It is important to understand the difference between exceptions and interrupts. These terms are sometimes used interchangeably, but it is helpful to keep them distinct. Chapter 6 of Volume 3 of the Intel 64 and IA-32 Architectures Software Developer's Manual (document 325384) provides an extensive discussion of exceptions and interrupts in the context of the Intel architecture (which is more complex than the simple description I provide here).
If the performance monitoring interrupt were an "exception" of the "fault" class, there would be no problem with skid.
Implementing the performance monitoring interrupt in this way is not practical, however. The performance monitoring unit aggregates data from all of the functional units of the core. The latency of sending data to the performance monitoring unit will vary by unit and may be several cycles for the more distant units. So an interrupt is used instead of an exception, and this breaks the precise mapping back to instructions. There are cases where the skid can be reduced, but to eliminate it completely the performance monitoring unit would have to be replicated in every processor functional unit so that a "fault exception" could be generated instead. I don't know of any design that has taken this approach....
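To make the mechanism concrete, here is a toy Python model of skid. It is a sketch under made-up assumptions, not a description of any real core: instructions retire in program order, the hypothetical list `retire_per_cycle` gives how many retire in each cycle, the interrupt is taken a fixed `delay` cycles after the overflowing instruction retires, and the sampled instruction pointer is the first instruction that has not yet retired. The function names `sampled_index` and `skid` are illustrative only.

```python
from itertools import accumulate


def sampled_index(retire_per_cycle, overflow_index, delay):
    """Index of the instruction the interrupt reports (the first one not yet retired)."""
    # cumulative[c] = number of instructions retired by the end of cycle c
    cumulative = list(accumulate(retire_per_cycle))
    # cycle in which the overflowing instruction retires
    overflow_cycle = next(
        (c for c, total in enumerate(cumulative) if total > overflow_index), None
    )
    if overflow_cycle is None:
        raise ValueError("overflowing instruction never retires in this trace")
    # the interrupt is taken `delay` cycles later (clamped to the end of the trace)
    interrupt_cycle = min(overflow_cycle + delay, len(cumulative) - 1)
    return cumulative[interrupt_cycle]


def skid(retire_per_cycle, overflow_index, delay):
    """Extra instructions retired; 0 means the sample is the instruction right after the overflow."""
    return sampled_index(retire_per_cycle, overflow_index, delay) - (overflow_index + 1)


# Scalar, one instruction per cycle: skid equals the delay.
print(skid([1] * 20, overflow_index=5, delay=3))  # 3
# Wide core, same delay, two different retirement patterns.
print(skid([4, 0, 0, 4, 4, 1, 4, 0], overflow_index=5, delay=3))  # 11
print(skid([4, 0, 0, 4, 0, 0, 0, 0], overflow_index=5, delay=3))  # 2
```

In this model the delay alone does not determine the skid: with the same three-cycle delay, the reported instruction lands 2 or 11 instructions past the overflowing one depending on how many instructions happen to retire in the meantime. Even the one-instruction-per-cycle case has a skid equal to the delay.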
I think the PMC is incremented in the pipeline, and somehow the PMU checks it (I don't know whether this is callback-like or a periodic check).
If the counter overflows, the PMU then issues an external interrupt through the APIC.
Meanwhile, the pipeline keeps processing the remaining instructions and moves forward even after the interrupt has fired (the processor's pipeline doesn't care about that).
So the processor cannot predict or guarantee how far execution has moved on by the time the interrupt arrives, which is why the distance between the instruction where the interrupt arrives and the instruction that overflowed the PMC varies so greatly.
I have some questions, based on my own thoughts and assumptions:
(1) If the assumption described above is basically correct, how does out-of-order execution affect this phenomenon?
(2) Is it because OoOE lets the processor execute more than one instruction in the pipeline at any given time? The instruction that caused the overflow and other instructions are very likely executing at the same time, so the overflowing instruction may not wait long enough for the PMI before it retires and execution moves forward (a race).
With in-order execution, an instruction has to wait for the previous instruction to retire, so each instruction always has enough time to receive the PMI before execution moves on to the next instruction.
(3) Does an in-order execution architecture eliminate skid, or just greatly mitigate it?
The implementation of the Performance Monitoring Unit (PMU) can't be "tightly coupled" to the functional unit pipelines for many reasons:
These issues apply to both in-order and out-of-order processors. The issues are more complex in out-of-order processors because they are typically physically larger (more cycles of latency). Attempting to compensate for the delay is more complex because it is not (in general) possible to know what was actually executed between the instruction that caused the final increment to the performance counter and the time that the PMU was able to signal the interrupt.
There are limited cases for which the designers have worked very hard to make skid predictable, so that it can be exactly compensated. For the Sandy Bridge through Skylake cores, the "PEBS-PDIR" feature (discussed in Chapter 18 of Volume 3 of the Intel 64 and IA-32 Architectures Software Developer's Manual, document 325384) provides a known skid for exactly one event (INST_RETIRED.PREC_DIST) when using PMC1 only (and disabling counting on the other PMCs). There are lots of other related topics in Chapter 18.
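Why a *known* skid matters can be sketched with a small Python model. Assume, hypothetically, that for each sample we know both the index of the overflowing instruction and the index the interrupt reported; the helper `compensation_hit_rate` (an illustrative name, not part of any Intel or Linux API) checks how often subtracting one fixed skid value recovers the right instruction. This is a toy illustration of the compensation idea, not how PEBS records are actually processed.

```python
def compensation_hit_rate(samples, assumed_skid):
    """Fraction of samples where subtracting a fixed skid finds the overflowing instruction.

    samples: list of (overflow_index, sampled_index) pairs from a hypothetical
    model in which sampled_index - 1 - skid == overflow_index.
    """
    if not samples:
        raise ValueError("need at least one sample")
    hits = sum(
        1
        for overflow_index, sampled_index in samples
        if sampled_index - 1 - assumed_skid == overflow_index
    )
    return hits / len(samples)


# Constant skid of 3 instructions on every sample: fixed compensation is exact.
predictable = [(5, 9), (10, 14), (20, 24)]
# Skids of 3, 11 and 2 instructions: no single constant works for all samples.
variable = [(5, 9), (10, 22), (20, 23)]

print(compensation_hit_rate(predictable, 3))  # 1.0
print(compensation_hit_rate(variable, 3))
```

When the skid varies, a fixed correction lands on the right instruction only for the samples that happen to match it (one of three here), which is why exact compensation requires hardware that makes the skid predictable in the first place.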