Is it true that page faults may lead to extra instruction count when using INST_RETIRED.ANY?

Tokponnon__Parfait · 11-13-2016

I ran exactly the same portion of user code twice and noticed that the two instruction counts do not match. The second count is always lower than the first.

Some people on the web say that each exception or interrupt counts as an extra instruction, but this is not mentioned anywhere in the Intel Software Developer's Manual.

Is this true?

This is my config:

set MSR_PERF_GLOBAL_CTRL = 1<<32
set MSR_PERF_FIXED_CTRL  = 0x2
CPU: Core i5
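For readers who want to check what these two values select, here is a minimal Python sketch that decodes them. It assumes the names above refer to the registers the SDM calls IA32_PERF_GLOBAL_CTRL and IA32_FIXED_CTR_CTRL, and that fixed-function counter 0 is the one counting INST_RETIRED.ANY. The helper names `decode_fixed_ctrl` and `fixed_counter_enabled` are made up for illustration.

```python
# Per-counter 4-bit field in the fixed-counter control register
# (architectural performance monitoring; AnyThread exists from version 3).
FIXED_EN_OS = 0x1      # count while running at privilege level 0
FIXED_EN_USR = 0x2     # count while running at privilege level > 0
FIXED_ANYTHREAD = 0x4  # count for any logical processor on the core
FIXED_PMI = 0x8        # raise an interrupt on counter overflow


def decode_fixed_ctrl(value, counter=0):
    """Return the control flags for one fixed-function counter (hypothetical helper)."""
    field = (value >> (4 * counter)) & 0xF
    return {
        "os": bool(field & FIXED_EN_OS),
        "usr": bool(field & FIXED_EN_USR),
        "any_thread": bool(field & FIXED_ANYTHREAD),
        "pmi": bool(field & FIXED_PMI),
    }


def fixed_counter_enabled(global_ctrl, counter=0):
    """Fixed counter n is enabled by bit 32 + n of the global control register."""
    return bool((global_ctrl >> (32 + counter)) & 1)


if __name__ == "__main__":
    print(fixed_counter_enabled(1 << 32))  # fixed counter 0 globally enabled?
    print(decode_fixed_ctrl(0x2))          # which privilege levels are counted?
```

Under those assumptions, the configuration enables fixed counter 0 and counts only at privilege levels above 0, that is, user mode only.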

McCalpinJohn · 11-18-2016

My interpretation of the information in the SW developer's manual is that interrupts and exceptions should generate some instructions in user mode before the transition to kernel mode. (But I could certainly be wrong....)

My interpretation is based on combining information from two places in the SWDM:

Section 18.2 of Volume 3 of the Intel Architectures SW Developer's Manual (325384-059) says that the OS flag (bit 17 of each performance counter event select register) "specifies that the selected microarchitectural condition is counted when the logical processor is operating at privilege level 0."
See the general discussion of interrupt/exception handling in Section 6.4 of Volume 1 of the Intel Architectures SW Developer's Manual (253665-059). Changes in privilege levels are discussed in Section 6.3.6, where execution at the new privilege level does not begin until step 8.

Additional information on protection and privilege levels is available in Chapter 5 of Volume 3 of the Intel Architectures SW Developer's Manual (325384-059).

There is also a lot of activity associated with starting up a process, including loading dynamically-linked libraries, that may have been pushed into user space. It is easy to imagine different user-space code paths being executed when these libraries are cached in memory vs the first execution when they are (presumably) not cached in memory. (I would expect most of the extra code to be in the kernel, but the amount in user space may not be negligible.)

For repeated executions of the same binary I would expect the number of data page faults to be the same (since the TLBs are typically flushed between processes), but there is a lot of complex software infrastructure there that very few people understand (and I am certainly not one of those people).

For repeated executions of the same binary, the text page faults required for the first execution will typically go all the way to disk, while the text page faults for the second execution should find the text pages in the filesystem cache. Again, many layers of software are involved, and it would take a fairly significant effort to understand the interactions of repeated invocations of these software layers with the HW.
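One way to check the page-fault side of this from user space is to compare the fault counts the OS reports for two back-to-back runs of the same command. The sketch below is only an illustration, not the measurement setup discussed in this thread: it assumes a Unix-like system where Python's `resource.getrusage` reports minor faults (served from memory) and major faults (requiring I/O), and `fault_counts` is a hypothetical helper name.

```python
import resource
import subprocess
import sys


def fault_counts(cmd):
    """Run cmd to completion; return the (minor, major) page faults it incurred."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN accumulates over all waited-for children,
    # so the difference isolates the command that was just run.
    return (after.ru_minflt - before.ru_minflt,
            after.ru_majflt - before.ru_majflt)


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "pass"]  # placeholder command
    for run in (1, 2):
        minor, major = fault_counts(cmd)
        print(f"run {run}: minor={minor} major={major}")
```

Major faults should only show up on the first run if the binary's pages are not already in the filesystem cache, so on a warm system both runs may look the same.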

Tokponnon__Parfait · 11-19-2016

Thank you, John, for your explanation.
I agree with you that some libraries may be called before an actual user process starts.
However, as far as I know, it is up to the kernel to resort to those libraries, according to the OS design.
So when I configure my counter in my kernel code, I can control which user path is taken after every interrupt (or exception), so, as a system programmer, I should be able to count exactly how many instructions are executed in user mode.
Or am I wrong?

McCalpinJohn · 11-20-2016

Nothing is as simple as it looks....

If you consistently see higher instruction counts for the first execution of a code (either a full code or a specific section of code), then you have demonstrated that the behavior is different. Figuring out why depends on a huge number of details that can't be put in a simple list: you have to start with a clear statement of exactly what code is being executed, identify exactly what kernel code may need to be executed in support of the user code, etc....