Hello Everyone,
I am exploring thermal-related MSRs on an Intel Xeon 5318H, a 3rd-generation Xeon Scalable processor.
Can anybody explain the Intel Running Average Thermal Limit (RATL) mechanism? What does it do, and how can we use it? Is there an MSR for thermal capping? I found the information on RATL in "Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 4: Model-Specific Registers" (https://cdrdv2.intel.com/v1/dl/getContent/671098).
I also got to know about two MSRs related to RATL mentioned below:
1. MSR_TEMPERATURE_TARGET [addr: 0x1A2]: Value=0x610A00 on my system
bit[0:6]: TCC Offset Time Window. [set to zero on my system]
bit[7]: TCC Offset Clamping Bit. When enabled, it allows RATL throttling below P1. What is P1 here? [set to zero on my system]
bit[8:15]: Temperature Control Offset. Fan Temperature Target Offset (a.k.a. T-Control) indicates the relative offset from the Thermal Monitor Trip Temperature at which fans should be engaged. [set to 0xA on my system]
bit[16:23]: TCC Activation Temperature. The minimum temperature at which PROCHOT# will be asserted. The value is in degrees C. [set to 0x61 on my system]
bit[24:30]: TCC Activation Offset. Specifies a temperature offset in degrees C from the temperature target (bits 23:16). PROCHOT# will assert at the offset target temperature. Write is permitted only if MSR_PLATFORM_INFO[30] is set. [set to zero on my system]
bit[31]: LOCKED. When set, this entire register becomes read-only. [set to zero on my system]
bit[32:63]: Reserved.
2. MSR_CORE_PERF_LIMIT_REASONS [addr: 0x64F]: Value=0xE18C0000 on my system
Indicator of Frequency Clipping in Processor Cores (R/W). (Frequency refers to processor core frequency.)
bit[5]: Running Average Thermal Limit Status (RO). When set, frequency is reduced below the operating system request due to Running Average Thermal Limit (RATL). [set to zero on my system]
bit[21]: Running Average Thermal Limit Log. When set, indicates that the RATL Status bit has asserted since the log bit was last cleared. This log bit will remain set until cleared by software writing 0. [set to zero on my system]
Can I use the above registers to set the temperature limit below the TCC Activation temperature?
Bit 5 of MSR_CORE_PERF_LIMIT_REASONS is a read-only status bit, and it is set to zero. Does that mean RATL is disabled on my system, or will the bit be set to 1 when the temperature reaches the TCC value?
@McCalpinJohn, it would be great if you could explain this.
Thanks!
Can anyone provide more information about the question above?
Thanks,
I am not aware of any other documentation regarding RATL, but it looks like Intel has extended the concepts used by RAPL to the thermal domain. The throttling activated by the PROCHOT mechanism is extremely strong: it is intended to reduce processor activity dramatically to prevent permanent damage to the chip, and is usually set very close to the limit that would cause an immediate shut-down of the chip. One could imagine situations in which finer control would be desirable. Just as RAPL allows the processor to deliver as much performance as possible while keeping power consumption under a specifiable limit, a RATL mechanism could allow the processor to deliver as much performance as possible while keeping temperature under a specifiable limit.
The bit descriptions in Table 2-39 of Volume 4 of the SDM apply to a large range of processor models, so it is possible that not all of the bits are actually implemented in all of the processors listed.
For MSR_CORE_PERF_LIMIT_REASONS (0x64f), bit 5 (RATL status) will only be set if the RATL mechanism is enabled and active at the exact time the MSR is read. For some of these "status" events, the overhead of entering the kernel to read the MSR will reduce the activity of the core enough for the status to become inactive, so this bit is not 100% reliable. Bit 21 (RATL log) will stick at "1" once the RATL mechanism has been active on this core, and stays set until software clears it. If you never see it set, then either (1) RATL is not enabled on your system, or (2) RATL has not been triggered on your system since the most recent boot.
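The status/log distinction can be checked by reading the MSR and testing bits 5 and 21. Here is a minimal sketch in Python, assuming a Linux system with the msr kernel module loaded and root privileges, so that /dev/cpu/<n>/msr exists. The helper names read_msr and decode_ratl are illustrative and do not come from any Intel tool. Decoding is kept separate from the read so it can be tried on the value quoted in the question.

```python
import os
import struct

MSR_CORE_PERF_LIMIT_REASONS = 0x64F
RATL_STATUS_BIT = 5   # read-only: RATL is limiting frequency at the time of the read
RATL_LOG_BIT = 21     # sticky: RATL status has asserted since the bit was last cleared


def read_msr(cpu: int, address: int) -> int:
    """Read a 64-bit MSR via the Linux msr driver (assumes root and `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver interprets the file offset as the MSR address.
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)


def decode_ratl(value: int) -> dict:
    """Extract the RATL status and log bits from a raw MSR value."""
    return {
        "ratl_status": bool((value >> RATL_STATUS_BIT) & 1),
        "ratl_log": bool((value >> RATL_LOG_BIT) & 1),
    }


if __name__ == "__main__":
    # Value reported in the question; on a live system use
    # read_msr(0, MSR_CORE_PERF_LIMIT_REASONS) instead.
    print(decode_ratl(0xE18C0000))
```

Because the status bit can drop while the read is in progress, polling the log bit (and clearing it between samples) is likely the more dependable way to detect RATL activity.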
The bit descriptions that you quote for MSR_TEMPERATURE_TARGET (0x1a2) are only included in Table 2-53, which applies to the "Intel Core Ultra 7 Processors Supporting Performance Hybrid Architecture", not your Xeon Scalable Gen3 processor. On the other hand, you do see bits 9 and 11 set (the 0x0A in bits 15:8), and this is the only table that describes bits 0-15, so it is worth looking at. Bits 15:8 are 0x0A on your system, which looks like it means the processor will request that the fans be engaged at 10 degrees C below the TCC Activation temperature: 0x61 - 0x0a = 0x57 = 87 decimal.
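The arithmetic above can be reproduced by splitting the raw value into fields. The following sketch uses the field layout quoted in the question, which may not match what a Xeon Scalable Gen3 part actually implements. The function and key names are illustrative.

```python
def bits(value: int, low: int, high: int) -> int:
    """Return the bit field value[high:low], inclusive on both ends."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def decode_temperature_target(value: int) -> dict:
    """Split MSR_TEMPERATURE_TARGET (0x1A2) using the layout quoted in the question."""
    fields = {
        "tcc_offset_time_window": bits(value, 0, 6),
        "tcc_offset_clamping": bits(value, 7, 7),
        "fan_target_offset_c": bits(value, 8, 15),
        "tcc_activation_temp_c": bits(value, 16, 23),
        "tcc_activation_offset_c": bits(value, 24, 30),
        "locked": bits(value, 31, 31),
    }
    # Fan-engage temperature as interpreted in this reply:
    # TCC activation temperature minus the T-Control offset.
    fields["fan_engage_temp_c"] = (
        fields["tcc_activation_temp_c"] - fields["fan_target_offset_c"]
    )
    return fields


decoded = decode_temperature_target(0x610A00)
print(decoded["tcc_activation_temp_c"],
      decoded["fan_target_offset_c"],
      decoded["fan_engage_temp_c"])
# prints: 97 10 87
```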
In the description of bit 7 (TCC offset clamping bit) "P1" is the maximum non-turbo frequency (usually the same as the nominal frequency of the part, but getting more complicated over time with heterogeneous cores and programmable TDP, etc.).
The lack of any more RATL documentation suggests to me that it might not be fully implemented in any processors. At a coarse scale you could write a kernel driver (or modify something like the existing Intel P-state driver) to take temperature into consideration when choosing P-states. Unlike power, temperature is "low-pass-filtered" by the thermal mass of the chip plus heat sinks, so it will be limited in how fast it can change. Because of this there is probably not a need for dedicated hardware to maintain a running-average temperature estimate -- the instantaneous values will be very close to the average.
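As a rough illustration of the software approach, the sketch below steps a frequency cap down when the temperature reaches a limit and back up once it has cooled by a hysteresis margin. Everything here is hypothetical: the function name, the frequency levels, the 87 C limit, and the 3 C hysteresis are made up for the example, and this is not how the Intel P-state driver works. A real driver would also have to read the temperature and apply the cap through the kernel's own interfaces.

```python
def next_frequency_mhz(current_mhz, temp_c, limit_c, available_mhz, hysteresis_c=3):
    """Pick the next frequency cap from the allowed levels (hypothetical policy).

    Step down one level when temp_c is at or above limit_c, step up one level
    once temp_c is at least hysteresis_c below limit_c, otherwise hold.
    """
    levels = sorted(available_mhz)
    index = levels.index(current_mhz)
    if temp_c >= limit_c and index > 0:
        index -= 1          # too hot: lower the cap
    elif temp_c <= limit_c - hysteresis_c and index < len(levels) - 1:
        index += 1          # comfortably cool: raise the cap
    return levels[index]


levels = [1200, 1800, 2500]                       # made-up frequency levels in MHz
print(next_frequency_mhz(2500, 90, 87, levels))   # prints: 1800
print(next_frequency_mhz(1800, 85, 87, levels))   # prints: 1800
print(next_frequency_mhz(1800, 80, 87, levels))   # prints: 2500
```

Because temperature changes slowly, a policy like this can use the instantaneous reading directly and be evaluated at a modest rate, without a separate running average.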
