Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Serious Problem about PMU

Kelvin_C_
Beginner

I expected the PMI to be delivered right after the STI instruction.

But it is not always delivered at the same address.

For example, I set up the corresponding MSRs to count far branches at the ring 0 privilege level.

I expected the interrupt to arrive at the following instruction (0x0000000014006EA98) after ring 3 issues a SYSCALL,

but in fact it arrives some time after STI and not at one specific instruction, for example at 0x0000000014006EAAD. What is the problem?

I assumed that once STI enables interrupts, the CPU would take the pending PMI immediately. Is that not the case?

[Attached screenshot]

 

Is STI delayed, or is something else going on?

Please tell me if you know ;)
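
As a minimal sketch of the counter setup described in this post, the IA32_PERFEVTSEL0 value for counting far branches in ring 0 with a PMI on overflow could be composed as follows. The event code 0xC4 with unit mask 0x40 (BR_INST_RETIRED.FAR_BRANCH) is an assumption that should be checked against the event table for the specific CPU model, and the function names are hypothetical. The code only builds the value; writing it requires WRMSR from ring 0.

```c
#include <stdint.h>

/* Architectural MSR addresses (Intel SDM Volume 3). */
#define MSR_IA32_PMC0        0x0C1u
#define MSR_IA32_PERFEVTSEL0 0x186u

/* IA32_PERFEVTSELx control bits. */
#define EVTSEL_USR (1ull << 16) /* count while CPL > 0 */
#define EVTSEL_OS  (1ull << 17) /* count while CPL == 0 */
#define EVTSEL_INT (1ull << 20) /* raise a PMI on counter overflow */
#define EVTSEL_EN  (1ull << 22) /* enable the counter */

/* Assumed encoding of BR_INST_RETIRED.FAR_BRANCH; model specific. */
#define FAR_BRANCH_EVENT 0xC4u
#define FAR_BRANCH_UMASK 0x40u

/* Pack event select (bits 7:0), unit mask (bits 15:8) and control bits. */
uint64_t perfevtsel_encode(uint8_t event, uint8_t umask, uint64_t flags)
{
    return (uint64_t)event | ((uint64_t)umask << 8) | flags;
}

/* Hypothetical helper: far branches, ring 0 only, PMI on overflow. */
uint64_t far_branch_ring0_evtsel(void)
{
    return perfevtsel_encode(FAR_BRANCH_EVENT, FAR_BRANCH_UMASK,
                             EVTSEL_OS | EVTSEL_INT | EVTSEL_EN);
}
```

Under these assumptions, far_branch_ring0_evtsel() yields 0x5240C4, which a ring 0 driver would write to MSR_IA32_PERFEVTSEL0. The USR bit is left clear so that only ring 0 far branches are counted.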

14 Replies
McCalpinJohn
Honored Contributor III

In an out-of-order processor, there is almost always a delay between the event that causes an interrupt and the handling of the interrupt.  This causes the interrupt handler to "see" a program counter that is after the program counter of the instruction that caused the interrupt.

The phenomenon is usually called "skid", and you can find several discussions of the topic in Chapter 18 of Volume 3 of the Intel Architectures Software Developer's Manual (document 325384).  Some of the performance counter events have been enhanced to provide additional data and to reduce "skid".  These events, the processors that introduced them, and limitations on their use are all discussed in Chapter 18.
 

Kelvin_C_
Beginner

Thank you for answering this question.

One more question: so the "skid" cannot be reduced by software, is that right?

I also noticed another phenomenon: if I insert an INT 3 between each SYSCALL, the skid is noticeably reduced, and the interrupt arrives almost immediately after the STI instruction. What is the reason for this?

For example:

for (i = 0; i < 1000000; i++)
{
    SYSCALL...

    INT 3
}

Kelvin.

McCalpinJohn
Honored Contributor III

The delayed operation is a feature of the STI instruction.  Read about it in Volume 2 of the Intel Architectures SW Developer's Manual (document 325383).

Kelvin_C_
Beginner

Do you mean that the root cause of the "skid" is the delayed effect of the STI instruction?

My understanding was that STI enables interrupts "immediately".

So:

(1) Is there any way to solve the skid problem for FAR BRANCH?

(2) Why does INT 3 make the interrupt for the next SYSCALL so accurate?

(3) Does STI not enable interrupts immediately?

Thank you very much for answering, John.

McCalpinJohn
Honored Contributor III

Read the instruction description in Volume 2 of the SW Developer's manual.

The description says that interrupts will be enabled after the instruction following the STI instruction.  That is exactly what you are seeing.
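
The STI rule quoted above can be illustrated with a toy model. This is only a sketch of the architectural rule (a maskable interrupt is not recognized on the instruction boundary right after an STI that changed IF from 0 to 1), not a model of skid or of any real pipeline. The enum and the function name are hypothetical.

```c
#include <stddef.h>

/* Toy instruction kinds; everything that is not STI or CLI is OP_OTHER. */
enum op { OP_OTHER, OP_STI, OP_CLI };

/*
 * Boundary b is the point just before instruction b executes, so the value
 * returned is the index of the instruction the handler would see as the
 * saved RIP for an interrupt that is already pending at entry.
 * Returns -1 if no boundary in 0..n allows delivery.
 */
long first_deliverable_boundary(const enum op *code, size_t n, int if_flag)
{
    int shadow = 0; /* 1 on the boundary right after an STI that set IF */

    for (size_t b = 0; ; b++) {
        if (if_flag && !shadow)
            return (long)b;
        if (b == n)
            return -1;

        shadow = 0;
        switch (code[b]) {
        case OP_STI:
            if (!if_flag)
                shadow = 1; /* delivery inhibited for one more boundary */
            if_flag = 1;
            break;
        case OP_CLI:
            if_flag = 0;
            break;
        default:
            break;
        }
    }
}
```

For the stream OTHER, STI, OTHER, OTHER entered with IF = 0, the model returns 3: the pending interrupt is taken after the instruction that follows STI, so the saved RIP points at the second instruction after STI. Any skid comes on top of this architectural delay.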

Kelvin_C_
Beginner

Yes, I know STI is delayed by one instruction. But what I noticed is that the interrupt may land several instructions later than that, which should be the "skid" you mentioned. Is there no solution for skid?

McCalpinJohn
Honored Contributor III

There is no general solution for skid in out-of-order processors.

According to the discussions in Chapter 18 of Volume 3 of the Intel Architectures Software Developers Manual, recent Intel processors support an enhancement to Processor Event-Based Sampling (PEBS) called Precise Distribution of Instructions Retired (PDIR).  This applies only to the "INST_RETIRED.ALL" performance counter event.  It has several additional limitations as well, as discussed in Chapter 18.

Kelvin_C_
Beginner

Thanks a lot, John. Your answer is really helpful :)

I will have to cover as wide a range as possible of the different RIP values at which the interrupt may land ;(

Maybe that is the only thing I can do to get around the "skid".
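
One way to express the "cover a range of RIP values" idea is a window check in the PMI handler. The sketch below is only an illustration: the function name and the max_skid_bytes tuning parameter are hypothetical, and it assumes the code after the expected address is straight-line, so a taken branch inside the window would defeat it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept a saved RIP that lies at or after the expected address and no more
 * than max_skid_bytes beyond it. The subtraction is done only after the
 * ordering check, so it cannot wrap around.
 */
bool rip_in_skid_window(uint64_t saved_rip, uint64_t expected_rip,
                        uint64_t max_skid_bytes)
{
    return saved_rip >= expected_rip &&
           saved_rip - expected_rip <= max_skid_bytes;
}
```

With the two addresses from the first post, the observed RIP is 0x15 bytes past the expected one, so a window of 64 bytes would accept it, while an address before the expected RIP is rejected.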

Kelvin_C_
Beginner

But John, there is another phenomenon that I cannot explain.

I have found that if I place a software breakpoint after every SYSCALL, the "skid" is greatly reduced.

Do you have any idea why?

McCalpinJohn
Honored Contributor III

If something reduces skid, it probably does so by decreasing the ability of the processor to execute instructions out of order.  

The single-byte form of the "INT 3" instruction (opcode 0xCC) is a special (simplified) case of the more general INT instruction, but even the simple case has fairly complex behavior -- see the discussion of the INT instruction in Volume 2 of the Intel Architectures Software Developers Manual.  This complex behavior probably means that the instruction is microcoded and takes a number of cycles to complete.  This seems likely to make it hard for the processor to do enough out-of-order processing to move the program counter very far, so the skid will be reduced.

These are just guesses -- I don't know a lot about how interrupts are implemented on Intel processors.

Kelvin_C_
Beginner

Thank you for your answer, John.

But why would out-of-order execution cause a PMI delay?

The retirement unit is supposed to ensure consistency with the original instruction order.

I assumed that after the SYSCALL, PMC0 is incremented by 1 and overflows (assuming it was preset to -1),

and that after the STI instruction, once the first following instruction retires, the PMI is delivered.

But the observed behavior shows that this guess is wrong, and I am not sure why.
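
The "preset to -1" arithmetic can be sketched as follows. The 48-bit counter width is an assumption (the actual width of the general-purpose counters is reported in CPUID.0AH:EAX[23:16]), and the function names are hypothetical. The sketch covers only the counting, not when the resulting PMI is delivered.

```c
#include <stdint.h>

/* Assumed general-purpose counter width; query CPUID leaf 0AH on real hardware. */
#define PMC_WIDTH 48
#define PMC_MASK  ((1ull << PMC_WIDTH) - 1)

/* Preload value that makes the counter overflow after sample_period events. */
uint64_t pmc_preload(uint64_t sample_period)
{
    return (0 - sample_period) & PMC_MASK;
}

/* Number of further events needed before the counter wraps to zero. */
uint64_t events_until_overflow(uint64_t counter_value)
{
    return (PMC_MASK - (counter_value & PMC_MASK)) + 1;
}
```

A sample period of 1 gives a preload of 0xFFFFFFFFFFFF, which is -1 truncated to 48 bits, so the very next counted far branch overflows the counter and requests a PMI.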

McCalpinJohn
Honored Contributor III

It is less an issue of out-of-order processing than it is of propagation delays across the chip.  The performance monitoring interrupt must come from the performance monitoring unit, which can't be "close to" all of the other functional units in the core.

A lot of work goes into making sure that "exceptions" are handled precisely.  An exception is raised by the functional unit that is executing the instruction, while the instruction is still in the pipeline, so there is no ambiguity about which instruction to point to.

An "interrupt" is not raised by the unit executing the instruction.  Interrupts are typically completely asynchronous, or in this case the interrupt is generated by a different functional unit than the unit that executed the instruction that generated the interrupt.   The PMU only knows that it is generating an interrupt on the overflow of a counter -- it does not have any knowledge of which functional unit executed the instruction that caused the overflow to happen. 

Kelvin_C_
Beginner

Oh, understood. Thank you very much for your brief explanation, John.

Kelvin_C_
Beginner

[Attached screenshot of an Intel SDM excerpt]

Hi John,

I am continuing to investigate why INT 3 almost completely eliminates the skid, and I recently found this in the Intel SDM. Do you think it is related to this case?

I assume that if the instruction stream contains an INT instruction, execution is forced to be in-order (this is just an assumption), but I can't explain why that would reduce the skid even if the instructions are executed in order. Any ideas?


Pseudo instruction stream:

System func:
    SYSCALL
    ret

-------------------------------------------

    CALL System func
    INT 3

