Number of available counters decreased with Linux kernel update.

spavel
Hello,
 
We upgraded the Linux kernel on our machine and noticed that our monitoring system (which uses PAPI 5.6.0.0) can no longer collect all processor counters. It fails with the error "Event exists, but cannot be counted due to hardware resource limits".
Do you know what could cause that? Any help would be appreciated.

 

Enabling multiplexing works around the error, but the counters then sometimes report a negative rate, i.e. a cumulative reading that goes backwards (e.g. reading PAPI_LD_INS without resetting: 100 -> 200 -> 300 -> 290 -> 400 -> 500 ...)
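
For anyone who has to keep multiplexing on, a possible workaround on the monitoring side is to clamp the cumulative readings so that they never decrease before computing rates. The sketch below is only an illustration in plain Python, not part of PAPI; the function names `clamp_monotonic` and `rates` are made up for this example, and the sample values are the ones from the sequence above.

```python
def clamp_monotonic(readings):
    """Return a copy of cumulative readings where no value is lower than an earlier one."""
    clamped = []
    highest = None
    for value in readings:
        # Keep the running maximum so a backwards step is flattened out.
        if highest is None or value > highest:
            highest = value
        clamped.append(highest)
    return clamped


def rates(readings):
    """Per-interval deltas computed from the clamped readings (never negative)."""
    clamped = clamp_monotonic(readings)
    return [later - earlier for earlier, later in zip(clamped, clamped[1:])]


# Hypothetical use with the PAPI_LD_INS sequence quoted above.
samples = [100, 200, 300, 290, 400, 500]
print(clamp_monotonic(samples))  # [100, 200, 300, 300, 400, 500]
print(rates(samples))            # [100, 100, 0, 100, 100]
```

This hides the negative step rather than correcting it: the interval where the reading dropped is reported as zero and the following interval absorbs the difference.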
 
 
 
OLD kernel:
 
/usr/sbin/sysctl -w kernel.nmi_watchdog=0
./papi_event_chooser PRESET PAPI_LD_INS PAPI_SR_INS

Event Chooser: Available events which can be added with given events.
--------------------------------------------------------------------------------
PAPI version             : 5.6.0.0
Operating system         : Linux 3.10.0-229.el7.x86_64
Vendor string and code   : GenuineIntel (1, 0x1)
Model string and code    : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz (63, 0x3f)
CPU revision             : 2.000000
CPUID                    : Family/Model/Stepping 6/63/2, 0x06/0x3f/0x02
CPU Max MHz              : 3600
CPU Min MHz              : 1200
Total cores              : 28
SMT threads per core     : 2
Cores per socket         : 14
Sockets                  : 1
Cores per NUMA region    : 28
NUMA regions             : 1
Running in a VM          : no
Number Hardware Counters : 11
Max Multiplex Counters   : 384
Fast counter read (rdpmc): no
--------------------------------------------------------------------------------
    Name        Code    Deriv Description (Note)
PAPI_L1_DCM  0x80000000  No   Level 1 data cache misses
PAPI_L1_ICM  0x80000001  No   Level 1 instruction cache misses
PAPI_L2_DCM  0x80000002  Yes  Level 2 data cache misses
...
PAPI_L3_TCW  0x80000060  No   Level 3 total cache writes
PAPI_REF_CYC 0x8000006b  No   Reference clock cycles
-------------------------------------------------------------------------
Total events reported: 54
 
 
NEW kernel:

Event Chooser: Available events which can be added with given events.
--------------------------------------------------------------------------------
PAPI version             : 5.6.0.0
Operating system         : Linux 3.10.0-957.1.3.el7.x86_64
Vendor string and code   : GenuineIntel (1, 0x1)
Model string and code    : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz (63, 0x3f)
CPU revision             : 2.000000
CPUID                    : Family/Model/Stepping 6/63/2, 0x06/0x3f/0x02
CPU Max MHz              : 3600
CPU Min MHz              : 1200
Total cores              : 28
SMT threads per core     : 2
Cores per socket         : 14
Sockets                  : 1
Cores per NUMA region    : 28
NUMA regions             : 1
Running in a VM          : no
Number Hardware Counters : 11
Max Multiplex Counters   : 384
Fast counter read (rdpmc): no
--------------------------------------------------------------------------------

    Name        Code    Deriv Description (Note)
PAPI_TOT_INS 0x80000032  No   Instructions completed
PAPI_TOT_CYC 0x8000003b  No   Total cycles
PAPI_LST_INS 0x8000003c  Yes  Load/store instructions completed
PAPI_REF_CYC 0x8000006b  No   Reference clock cycles
-------------------------------------------------------------------------
Total events reported: 4
1 Solution
spavel

Problem found: it is an issue in the old Linux kernel with the MEM_UOPS_RETIRED events.

Solution: update the kernel, or do not use those events.


https://elixir.bootlin.com/linux/v3.10-rc1/source/arch/x86/kernel/cpu/perf_event_intel.c#L131

spavel

Small update: I was able to reproduce the same issue with perf. The old kernel can collect 3 counters without multiplexing, but on the new kernel multiplexing is required.
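
As a side note on why multiplexed values can move backwards at all: when events share a hardware counter, the Linux perf_event interface can report the raw count together with the time the event was enabled and the time it was actually running, and tools typically extrapolate as raw * time_enabled / time_running. The sketch below only illustrates that arithmetic. The helper name `scaled_count` and all numbers are hypothetical (chosen to match the 300 -> 290 step from the original post), and whether this is what happened in this particular PAPI setup is an assumption.

```python
def scaled_count(raw, time_enabled, time_running):
    """Extrapolate a multiplexed counter to the full enabled time."""
    if time_running == 0:
        # The event never got a hardware counter, so there is nothing to scale.
        return 0
    return raw * time_enabled // time_running


# Hypothetical snapshots: (raw count, time_enabled, time_running).
first = (150, 2000, 1000)   # event was on a counter for half of the time
second = (160, 2900, 1600)  # raw count grew, but slowly while it was running

print(scaled_count(*first))   # 300
print(scaled_count(*second))  # 290
```

The raw count never decreases here (150, then 160), but the extrapolated estimate does, because the event happened to be scheduled during a period with few loads.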
