Solved: Measuring Temperature by using Intel THERM_STATUS MSR

sunilk · ‎01-15-2024

Hi,

I have gone through the solution posted by McCalpin John: https://community.intel.com/t5/Software-Archive/Intel-Xeon-Phi-Reading-MSR/m-p/1170935

I used IA32_THERM_STATUS (MSR 0x19C) to sample the instantaneous temperature over a long run of a real-world benchmark. I have attached a plot of temperature against execution time. It shows steep changes in temperature within a few seconds, which I did not expect, since I believe temperature should rise and fall gradually. Can anyone please confirm whether this behavior is correct by looking at the attached image?

Thanks!

McCalpinJohn · ‎01-16-2024

Why do you think the temperature should rise and fall "slowly"?

Consider the amount of heat energy required to change the temperature of a bare processor die. Assume the die is 20mm x 20mm x 0.5mm == 0.2 cm^3. At a typical density of 2.3 g/cm^3, this is just under 0.5 grams. The specific heat of Silicon is 0.7 J/(gram degree), so it only takes 0.35 Joules to raise the die temperature 1 degree C, or 35 J to raise the temperature 100 degrees. The maximum jump of about 10 degrees in one second would only require a power change of 3.5 Watts.

Of course, this is why one tries not to run with a bare processor die (but it does give a hint as to the challenges when performing pre-package testing!).

If we consider a processor tightly coupled to a 0.3 kg copper heatsink and assume infinite heat transfer rates, we can compute the energy required to change the average temperature of the combined system. The specific heat of copper is about 0.385 J/(gram degree), so it takes 115.5 Joules to shift the entire block by 1 degree. Doing that within one of the 1-second intervals above would require a change in power input of 115.5 Watts, which is large, but not unreasonable.
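The two estimates above (bare die and copper heatsink) can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the density and specific-heat values quoted in the post; the function and variable names are made up for illustration.

```python
# Lumped heat-capacity estimates for the bare die and the copper heatsink.
SILICON_DENSITY = 2.3         # g/cm^3, value used in the post
SILICON_SPECIFIC_HEAT = 0.7   # J/(g*K), value used in the post
COPPER_SPECIFIC_HEAT = 0.385  # J/(g*K), value used in the post


def heat_capacity(mass_g, specific_heat):
    """Energy in joules needed to raise the mass by one kelvin."""
    return mass_g * specific_heat


def power_for_rate(capacity_j_per_k, delta_t_k, seconds):
    """Net power in watts needed to change temperature by delta_t_k in `seconds`."""
    return capacity_j_per_k * delta_t_k / seconds


die_volume_cm3 = 2.0 * 2.0 * 0.05              # 20 mm x 20 mm x 0.5 mm
die_mass_g = die_volume_cm3 * SILICON_DENSITY  # about 0.46 g
die_capacity = heat_capacity(die_mass_g, SILICON_SPECIFIC_HEAT)
heatsink_capacity = heat_capacity(300.0, COPPER_SPECIFIC_HEAT)

die_power_10k_per_s = power_for_rate(die_capacity, 10.0, 1.0)
heatsink_power_1k_per_s = power_for_rate(heatsink_capacity, 1.0, 1.0)

if __name__ == "__main__":
    print(f"die: {die_capacity:.3f} J/K, {die_power_10k_per_s:.2f} W for 10 K in 1 s")
    print(f"heatsink: {heatsink_capacity:.1f} J/K, {heatsink_power_1k_per_s:.1f} W for 1 K in 1 s")
```

Because the code keeps the die mass at 0.46 g instead of rounding up to 0.5 g, it gives about 0.32 J per degree and about 3.2 W for a 10-degree jump in one second, slightly below the rounded 0.35 J and 3.5 W quoted above.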

In the real world of finite heat transfer rates, the heat generated in the processor will not move out to the heatsink instantly, so processor die temperature variations will take time to be damped by the thermal inertia of the heatsink, and even more time to be damped by transfer into the thermal inertia of the full cooling system.
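To illustrate the effect of finite heat transfer rates, here is a minimal two-node lumped model (die and heatsink) integrated with explicit Euler steps. The heat capacities are the ones estimated above; the two thermal resistances (0.2 K/W each), the 25 degree ambient, and the 50 W to 150 W power step are assumptions chosen purely for illustration, not measured values for any real processor or cooler.

```python
# Two-node lumped thermal model: die -> heatsink -> ambient.
C_DIE = 0.35         # J/K, bare-die estimate from the post
C_SINK = 115.5       # J/K, 0.3 kg copper heatsink from the post
R_DIE_SINK = 0.2     # K/W, ASSUMED die-to-heatsink thermal resistance
R_SINK_AMB = 0.2     # K/W, ASSUMED heatsink-to-ambient thermal resistance
T_AMBIENT = 25.0     # degrees C, ASSUMED


def simulate(power_w, seconds, t_die, t_sink, dt=1e-3):
    """Advance both temperatures with explicit Euler steps of size dt."""
    for _ in range(int(round(seconds / dt))):
        q_die_sink = (t_die - t_sink) / R_DIE_SINK      # W flowing die -> sink
        q_sink_amb = (t_sink - T_AMBIENT) / R_SINK_AMB  # W flowing sink -> ambient
        t_die += dt * (power_w - q_die_sink) / C_DIE
        t_sink += dt * (q_die_sink - q_sink_amb) / C_SINK
    return t_die, t_sink


# Start from the steady state for 50 W, then step the load to 150 W for 1 s.
sink_start = T_AMBIENT + 50.0 * R_SINK_AMB
die_start = sink_start + 50.0 * R_DIE_SINK
die_end, sink_end = simulate(150.0, 1.0, die_start, sink_start)

die_rise = die_end - die_start
sink_rise = sink_end - sink_start

if __name__ == "__main__":
    print(f"die rise after 1 s:      {die_rise:.1f} K")
    print(f"heatsink rise after 1 s: {sink_rise:.2f} K")
```

With these assumed resistances, the die time constant is R_DIE_SINK * C_DIE = 0.07 s, so the die settles to its new offset above the heatsink well within one second, while the heatsink itself moves by less than a degree over the same interval.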

Also note that the "temperature" reported by Intel processors is the *maximum* of the temperatures at all the on-die sensors, not the average temperature of the die. The maximum temperature corresponds to a small area on the die, so it can change even faster than the average temperature for the die.
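For reference, the reported value is an integer number of degrees below the throttling temperature, which is one more reason the plot can look "steppy". The sketch below decodes it under these assumptions: Linux with the `msr` kernel module loaded and root access, the Digital Readout in bits 22:16 and Reading Valid in bit 31 of IA32_THERM_STATUS (0x19C), and the temperature target in bits 23:16 of MSR_TEMPERATURE_TARGET (0x1A2). The layout of the second register is model-specific, so check the SDM for your processor. The helper names are hypothetical.

```python
import os
import struct

IA32_THERM_STATUS = 0x19C
MSR_TEMPERATURE_TARGET = 0x1A2  # layout is model-specific; bits 23:16 ASSUMED here


def read_msr(cpu, reg):
    """Read one 64-bit MSR through /dev/cpu/<cpu>/msr (needs root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def decode_temperature(therm_status, temperature_target):
    """Return degrees C, or None if the Reading Valid bit is clear."""
    if not (therm_status >> 31) & 1:                # bit 31: Reading Valid
        return None
    below_target = (therm_status >> 16) & 0x7F      # bits 22:16: Digital Readout
    target = (temperature_target >> 16) & 0xFF      # bits 23:16: temperature target
    return target - below_target


def core_temperature(cpu=0):
    """Read both MSRs on the given logical CPU and decode the temperature."""
    return decode_temperature(read_msr(cpu, IA32_THERM_STATUS),
                              read_msr(cpu, MSR_TEMPERATURE_TARGET))
```

Calling `core_temperature(0)` in a loop with a one-second sleep would give the kind of time series discussed in this thread; `decode_temperature` can also be applied offline to raw values already collected with a tool such as `rdmsr`.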

