I have a question about thermal throttling. I use an Intel(R) Xeon(R) CPU E5-2680 v3 processor.
In my syslog, I have this line:
1498883997 2017 Jul 1 06:39:57 node_id kern crit kernel CPU42: Package temperature above threshold, cpu clock throttled (total events = 1174)
I found information about thermal throttling events in the Linux kernel at https://elixir.bootlin.com/linux/latest/source/arch/x86/kernel/cpu/mcheck/therm_throt.c and https://elixir.bootlin.com/linux/latest/source/drivers/thermal/x86_pkg_temp_thermal.c
If I understand correctly, the throttle count (the "total events" value in syslog) is incremented on each interrupt. More precisely, it is incremented when an interrupt is raised on a transition from a temperature below the trip point (TM2) to one above it, based on the description in https://www.intel.com/content/www/us/en/embedded/testing-and-validation/cpu-monitoring-dts-peci-pape... Is this correct?
My second question: is there a way to compute the throttling time from this counter (the "total events" value in syslog)? For example: "my CPU throttled for 10 seconds in the previous 5 minutes".
For the Xeon E5 v3 there are performance counter events in the Power Control Unit (PCU) of the uncore that can be used to measure cumulative time spent in various throttling conditions. These are described in the Xeon E5 v3 uncore performance monitoring reference manual (document 331051), Section 2.8.
The events that are probably important are FREQ_MAX_LIMIT_THERMAL_CYCLES, PROCHOT_INTERNAL_CYCLES, and PROCHOT_EXTERNAL_CYCLES.
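As a rough illustration of how such cycle counts could be turned into a throttling time, here is a minimal Python sketch. It assumes (I have not verified this on hardware) that the throttling event counts PCU clock cycles, that a PCU CLOCKTICKS count is sampled at the same two instants, and that the counters are 48 bits wide. The names PcuSample and throttled_seconds are hypothetical, and how the raw values are read (perf, MSR access, etc.) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class PcuSample:
    """One reading of two PCU counters (raw cycle counts)."""
    clockticks: int       # PCU CLOCKTICKS count (assumed reference clock)
    throttle_cycles: int  # e.g. FREQ_MAX_LIMIT_THERMAL_CYCLES


def throttled_seconds(start: PcuSample, end: PcuSample,
                      interval_s: float, counter_bits: int = 48) -> float:
    """Estimate the time spent throttled between two samples.

    The throttled fraction is (delta throttle cycles) / (delta clockticks),
    scaled by the wall-clock length of the sampling interval.
    counter_bits is an assumption; the mask handles one counter wraparound.
    """
    mask = (1 << counter_bits) - 1
    ticks = (end.clockticks - start.clockticks) & mask
    throttled = (end.throttle_cycles - start.throttle_cycles) & mask
    if ticks == 0:
        raise ValueError("no PCU clock ticks elapsed between samples")
    return interval_s * throttled / ticks


if __name__ == "__main__":
    # Made-up numbers: 1/30 of the cycles in a 5-minute window were throttled.
    before = PcuSample(clockticks=0, throttle_cycles=0)
    after = PcuSample(clockticks=3_000_000, throttle_cycles=100_000)
    print(throttled_seconds(before, after, interval_s=300.0))  # 10.0
```

With the made-up numbers above, the result corresponds to "throttled for 10 seconds in the previous 5 minutes". Normalizing by a clock-tick count rather than a nominal frequency avoids having to assume the PCU clock rate.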
I don't currently have any systems that can be put into thermal throttling in a repeatable way, so I have not tested most of these events.
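For comparison, the kernel's own event count (the "total events" value in the syslog line) can be sampled from sysfs, but it only gives the number of throttling events in a window, not their duration. The sketch below assumes the counter is exposed at /sys/devices/system/cpu/cpuN/thermal_throttle/package_throttle_count on your kernel; the helper names are hypothetical.

```python
from pathlib import Path

# Assumed sysfs location of the per-CPU thermal throttle counters.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def read_throttle_count(cpu: int, root: Path = SYSFS_CPU,
                        name: str = "package_throttle_count") -> int:
    """Read the cumulative throttle event count for one logical CPU."""
    path = root / f"cpu{cpu}" / "thermal_throttle" / name
    return int(path.read_text().strip())


def events_in_window(before: int, after: int) -> int:
    """Number of throttle events between two readings of the counter."""
    return after - before


if __name__ == "__main__":
    import time

    first = read_throttle_count(42)   # CPU42, as in the syslog line
    time.sleep(300)                   # 5-minute window
    second = read_throttle_count(42)
    print(events_in_window(first, second), "throttle events in 5 minutes")
```

A count of events per window says how often the threshold was crossed, which is why the cycle-based PCU events are the better candidate for an actual duration.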