Does the PAUSE instruction delay handling interrupts?

HadiBrais · ‎12-07-2019

It's not clear from the publicly available information exactly how the PAUSE instruction works. As far as I can tell, it essentially suspends the whole pipeline, including instruction fetch, for the logical core that executes the instruction. Execution resumes after a microarchitecture-specific latency. I'm particularly interested in knowing whether PAUSE delays interrupt handling. That is, when an interrupt occurs while PAUSE is being executed (i.e., while the pipeline is suspended), does the core immediately complete the execution of PAUSE to handle the interrupt, or does it wait until the full latency of PAUSE elapses? In the second case, interrupts are delayed. Is the behavior the same for all types of interrupts (maskable and non-maskable) and on all microarchitectures? Is the behavior the same when all sibling logical cores execute PAUSE? Ideally, PAUSE should never delay handling interrupts (I can't think of a good reason why it would).

McCalpinJohn · ‎12-08-2019

On most implementations the PAUSE instruction has a short enough latency that I suspect it would be difficult to measure the difference....

Agner Fog's instruction tables (https://www.agner.org/optimize/instruction_tables.pdf) show SKX with a 141-cycle repeat rate. I get an almost identical (average) value of 565 cycles for 4 PAUSE instructions in a loop, i.e., about 141 cycles per PAUSE. That is still small compared to most interrupt handlers, but is probably big enough to be measurable?

HadiBrais · ‎12-08-2019

Right, the optimization manual mentions that the latency of PAUSE on Skylake "has been extended to as many as 140 cycles," which I think gives us an approximate upper limit on the delay for interrupt handling if PAUSE does delay interrupts. Although we can determine the cycle at which the interrupt handler begins execution, we don't know with reasonable accuracy the cycle at which the interrupt "occurs," so it's difficult to answer the question experimentally by measuring the delay in handling it.

McCalpinJohn · ‎12-09-2019

I wonder if you could get a very tight bound on when the interrupt occurs by using a performance monitoring interrupt? With a user-mode RDPMC instruction, you should be able to get the current count of the instructions retired counter, for example, then use that value to set a loop bound for a spin loop that will exit (into a PAUSE instruction) just before or after the counter overflows.... Maybe?