Setting overflow flags in IA32_PERF_GLOBAL_STATUS register



I need to set an overflow flag (e.g. bit #33, the IA32_FIXED_CTR1 overflow flag) in the IA32_PERF_GLOBAL_STATUS MSR, which is read-only.

On Skylake (which supports architectural performance monitoring version 4) this is easy, because the IA32_PERF_GLOBAL_STATUS_SET MSR exists for exactly this purpose. Unfortunately, I have to do it on older processors, so I came up with the following hack (shown in pseudo-asm).

Initial condition:

  • IA32_FIXED_CTR_CTRL contains 0xB0 (IA32_FIXED_CTR1 enabled, with interrupt on overflow).
  • The Performance Counter register in the APIC Local Vector Table (offset 0x340) is not masked and uses fixed delivery mode.
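
For readers checking the encoding, here is a minimal C sketch of how the 0xB0 value (and the 0x30 value used later) decomposes. It assumes the usual 4-bit-per-counter layout of IA32_FIXED_CTR_CTRL (bit 0 = count in ring 0, bit 1 = count in ring > 0, bit 2 = AnyThread, bit 3 = PMI on overflow) and three fixed counters. The enum names and the helper `fixed_ctr_ctrl_field` are made up for illustration and come from no particular header.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of the 4-bit control field that each fixed counter owns in
 * IA32_FIXED_CTR_CTRL (names are illustrative, not from a real header). */
enum {
    FIXED_EN_OS     = 1u << 0,  /* count while in ring 0            */
    FIXED_EN_USR    = 1u << 1,  /* count while in rings 1..3        */
    FIXED_ANYTHREAD = 1u << 2,  /* count for any logical processor  */
    FIXED_PMI       = 1u << 3   /* raise PMI when counter overflows */
};

/* Hypothetical helper: shift a 4-bit field into the slot of fixed
 * counter `idx` (assumed range 0..2). */
static uint64_t fixed_ctr_ctrl_field(unsigned idx, unsigned field)
{
    assert(idx < 3 && field <= 0xFu);
    return (uint64_t)field << (4 * idx);
}
```

With these definitions, `fixed_ctr_ctrl_field(1, FIXED_EN_OS | FIXED_EN_USR | FIXED_PMI)` yields 0xB0, and dropping `FIXED_PMI` yields 0x30.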

Setting bit #33 in IA32_PERF_GLOBAL_STATUS:

L1:  WRMSR IA32_PERF_GLOBAL_CTRL, 0     // Stop all counters
L2:  CLI                                // Disable interrupts
L3:  RDMSR saved_ctr1, IA32_FIXED_CTR1  // Save current value of the counter
L4:  WRMSR IA32_FIXED_CTR1, 0xFFFFFFFFFFFF // Write the maximum supported value
L5:  WRMSR IA32_FIXED_CTR_CTRL, 0x30    // Enable CTR1 *without* interrupt on overflow
L6:  WRMSR IA32_PERF_GLOBAL_CTRL, 0x200000000 // Enable CTR1
L7:  // CTR1 overflows here and STATUS gets the desired value
L8:  WRMSR IA32_PERF_GLOBAL_CTRL, 0     // Stop CTR1
L9:  WRMSR IA32_FIXED_CTR_CTRL, 0xB0    // Restore original value
L10: WRMSR IA32_FIXED_CTR1, saved_ctr1  // Restore original value
L11: STI                                // Enable interrupts
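
The two large constants in the sequence above can be derived rather than hard-coded. The sketch below is only an illustration under two assumptions: fixed counter n maps to bit 32 + n in both IA32_PERF_GLOBAL_CTRL and IA32_PERF_GLOBAL_STATUS, and the counter is 48 bits wide (the real width should be taken from CPUID leaf 0AH, EDX[12:5]). Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bit mask of fixed counter `idx` (assumed 0..2)
 * in IA32_PERF_GLOBAL_CTRL and IA32_PERF_GLOBAL_STATUS. */
static uint64_t fixed_ctr_global_bit(unsigned idx)
{
    assert(idx < 3);
    return 1ULL << (32 + idx);
}

/* Hypothetical helper: largest value a counter of `width_bits` bits can
 * hold; one more increment wraps it to zero and sets the overflow flag. */
static uint64_t counter_max(unsigned width_bits)
{
    assert(width_bits >= 1 && width_bits <= 64);
    return width_bits == 64 ? UINT64_MAX : (1ULL << width_bits) - 1;
}
```

Here `fixed_ctr_global_bit(1)` gives the 0x200000000 written at L6, and `counter_max(48)` gives the 0xFFFFFFFFFFFF written at L4.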

This code works quite well. However, sometimes (in roughly 1% of cases) it produces the "interrupt on counter overflow" when interrupts are re-enabled at L11.

I tried reading the APIC LVT register at 0x340 after L2 and after L10. When the spurious interrupt happens, the mask bit has changed to 1, which means the interrupt really was caused by the CTR1 overflow between L6 and L8, even though "interrupt on overflow" was disabled at L5! Why might this happen?
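
The mask check relies on the behaviour described in the SDM: the local APIC sets the mask bit of the performance-counter LVT entry when it delivers a PMI through it. Below is a minimal sketch of decoding a raw 32-bit value read from that register, assuming the standard LVT layout (vector in bits 7:0, delivery mode in bits 10:8, delivery status in bit 12, mask in bit 16). The struct and function names are hypothetical, and how the raw value is obtained (xAPIC MMIO or x2APIC MSR) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the performance-counter LVT entry (illustrative type). */
typedef struct {
    uint8_t vector;           /* bits 7:0                        */
    uint8_t delivery_mode;    /* bits 10:8, 0 = fixed            */
    bool    delivery_pending; /* bit 12, send pending            */
    bool    masked;           /* bit 16, 1 = interrupt inhibited */
} lvt_pmi_entry;

/* Hypothetical helper: split a raw LVT register value into its fields. */
static lvt_pmi_entry decode_lvt_pmi(uint32_t raw)
{
    lvt_pmi_entry e;
    e.vector           = (uint8_t)(raw & 0xFFu);
    e.delivery_mode    = (uint8_t)((raw >> 8) & 0x7u);
    e.delivery_pending = ((raw >> 12) & 1u) != 0;
    e.masked           = ((raw >> 16) & 1u) != 0;
    return e;
}
```

Comparing `decode_lvt_pmi(...).masked` for the values read after L2 and after L10 shows whether a PMI was delivered in between. For example, a made-up raw value of 0x000100FE decodes to vector 0xFE, fixed delivery mode, masked.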

Maybe the writes to IA32_FIXED_CTR_CTRL and IA32_PERF_GLOBAL_CTRL are too close together, and the processor doesn't have enough time to disable the interrupt before the counter starts? I tried adding 5 ms delays between L5 and L6 and between L8 and L9, and that solved the issue: no interrupt happens. However, if I lower either delay to 4 ms, the interrupts come back.



  -- Ilya