Temperature monitoring

Davide_Basilio_B_1

Hi all,

I'm trying to figure out how to read the temperature sensor(s) on the Xeon Phi from a program running on the card's embedded Linux.

I installed MPSS version 3.4.3 and the board SKU is B1PRQ-5110P/5120D.

The Intel system software developers guide for the PHI mentions, in section 4.2.8.2, that "The processor implements internal MSRs (IA32_THERM_STATUS, IA32_THERM_INTERRUPT, IA32_CLOCK_MODULATION)".

However, these MSRs are not mentioned anywhere else in the documentation (or at least I could not find them). I've tried reading MSR 0x19C (IA32_THERM_STATUS) on each core, which is what I'd do on "normal" Xeon processors, but I just get rubbish.
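For reference, the approach described above (reading MSR 0x19C per core) can be sketched roughly as follows. This is only a sketch under assumptions: it relies on the Linux `msr` driver exposing `/dev/cpu/N/msr`, and the bit layout in `decode_therm_status` is the one documented for mainstream Intel processors (bit 31 = reading valid, bits 22:16 = digital readout in degrees below TjMax). Nothing in this thread confirms that the Xeon Phi follows that layout. The function names are illustrative.

```python
import os
import struct

IA32_THERM_STATUS = 0x19C  # MSR address mentioned in the post


def read_msr(cpu: int, address: int) -> int:
    """Read one 64-bit MSR via the Linux msr driver (needs root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, address)  # the MSR address is used as the file offset
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]


def decode_therm_status(value: int) -> dict:
    """Extract the fields defined for mainstream Intel CPUs (assumed layout)."""
    return {
        "reading_valid": bool((value >> 31) & 0x1),      # bit 31
        "degrees_below_tjmax": (value >> 16) & 0x7F,     # bits 22:16
    }
```

A call such as `decode_therm_status(read_msr(0, IA32_THERM_STATUS))` would then show whether the "reading valid" bit is set. If it is clear, the readout field should not be trusted.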

Is there documentation anywhere on how one can read the on-chip thermal sensors (if possible)?

Many thanks for any hint.

 

8 Replies
McCalpinJohn
Honored Contributor III

There is not a lot of documentation, but when you are running on the Xeon Phi you can try:

cat /sys/class/micras/temp

From the host you can run

micsmc -t

which provides what appears to be the same information, but with labels for each of the lines.
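For a program running on the card, the sysfs read mentioned above could be wrapped along these lines. This is a minimal sketch: the thread does not describe the layout of the file, so the helper (a hypothetical name, `read_micras_temp`) returns the raw lines without interpreting them.

```python
from pathlib import Path
from typing import List

MICRAS_TEMP = "/sys/class/micras/temp"  # path quoted in the reply above


def read_micras_temp(path: str = MICRAS_TEMP) -> List[str]:
    """Return the non-empty lines of the sysfs node, unparsed.

    The field layout is not documented in this thread, so no attempt is
    made to assign meaning to individual values.
    """
    text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Comparing these raw lines against the labelled `micsmc -t` output on the host is probably the easiest way to work out which value is which.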

Davide_Basilio_B_1
Thanks John, this is helpful information. Just one more question. This is what I get from micsmc -t:

mic0 (temp):
   Cpu Temp: ................ 42.00 C
   Memory Temp: ............. 33.00 C
   Fan-In Temp: ............. 28.00 C
   Fan-Out Temp: ............ 34.00 C
   Core Rail Temp: .......... 32.00 C
   Uncore Rail Temp: ........ 31.00 C
   Memory Rail Temp: ........ 31.00 C

From this information, it seems that there are just 7 thermal sensors on the Phi, unlike "standard" Xeon processors, which have one per core. Is that the most I can get, or are you aware of other thermal sensors that might be accessible through some other interface?

Also, it would be quite interesting to know where the CPU sensor is. Since there is a large number of cores and utilization might vary a lot across them, it seems hard for a single sensor to give a reliable reading of the hottest temperature across the chip.

Again, many thanks for your help.
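If the readings need to be consumed by a script on the host, the `micsmc -t` output quoted above could be parsed roughly like this. The sketch assumes the tool prints one "label: dots value C" reading per line, as in the sample; `parse_micsmc_temps` and `SAMPLE` are illustrative names, not part of MPSS.

```python
import re
from typing import Dict

# Matches lines such as "Cpu Temp: ................ 42.00 C".
_TEMP_LINE = re.compile(
    r"^\s*(?P<label>[^:]+?):\s*\.*\s*(?P<value>-?\d+(?:\.\d+)?)\s*C\s*$"
)

# Sample taken from the post above, assumed to be one reading per line.
SAMPLE = """mic0 (temp):
   Cpu Temp: ................ 42.00 C
   Memory Temp: ............. 33.00 C
   Fan-In Temp: ............. 28.00 C
   Fan-Out Temp: ............ 34.00 C
   Core Rail Temp: .......... 32.00 C
   Uncore Rail Temp: ........ 31.00 C
   Memory Rail Temp: ........ 31.00 C
"""


def parse_micsmc_temps(output: str) -> Dict[str, float]:
    """Map each sensor label to its temperature in degrees Celsius."""
    temps = {}
    for line in output.splitlines():
        match = _TEMP_LINE.match(line)
        if match:  # header lines such as "mic0 (temp):" do not match
            temps[match.group("label").strip()] = float(match.group("value"))
    return temps
```

Calling `parse_micsmc_temps(SAMPLE)` yields seven entries keyed by the labels shown in the post. On a live system the text would come from running `micsmc -t` on the host, for example via `subprocess.run`.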
Davide_Basilio_B_1

Quick update, in case anyone is interested.

I had a look at the kernel code that generates /sys/class/micras/temp and made some inferences from the list of SBOX control registers in the system developer's guide (page 146).
I quickly hacked together the attached kernel module, which reads all the available "CURRENT_DIE_TEMP" information.

A sample output follows:

[342630.904960] Module phitemp loaded at 0xffffffffa00c2000
[342630.905566] Die temp0: 46
[342630.905575] Die temp1: 38
[342630.905582] Die temp2: 41
[342630.905589] Die temp3: 44
[342630.905597] Die temp4: 44
[342630.905604] Die temp5: 43
[342630.905611] Die temp6: 41
[342630.905618] Die temp7: 0
[342630.905624] Die temp8: 0

It seems that in this way one can actually read 7 different on-die sensors.
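The kernel-log lines above can be turned into numbers with a short script. The sketch below assumes the "Die tempN: value" format shown in the sample, and it assumes that a reading of 0 marks a sensor slot that is not populated. That second assumption is an interpretation of the output, not something Intel documents in this thread. The names are illustrative.

```python
import re
from typing import Dict

# Matches the "Die tempN: value" fragments printed by the module above.
_DIE_TEMP = re.compile(r"Die temp(\d+):\s*(-?\d+)")

# Sample lines copied from the post (timestamps included).
SAMPLE_LOG = """[342630.905566] Die temp0: 46
[342630.905575] Die temp1: 38
[342630.905582] Die temp2: 41
[342630.905589] Die temp3: 44
[342630.905597] Die temp4: 44
[342630.905604] Die temp5: 43
[342630.905611] Die temp6: 41
[342630.905618] Die temp7: 0
[342630.905624] Die temp8: 0
"""


def parse_die_temps(log_text: str, skip_zero: bool = True) -> Dict[int, int]:
    """Map sensor index to reading; zero readings are dropped by default
    on the assumption that they are unpopulated slots."""
    readings = {}
    for index, value in _DIE_TEMP.findall(log_text):
        if skip_zero and int(value) == 0:
            continue
        readings[int(index)] = int(value)
    return readings
```

With the sample above, `parse_die_temps(SAMPLE_LOG)` keeps sensors 0 through 6, and `max(readings.values())` gives the hottest reading among them.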

Davide_Basilio_B_1

So, given that the chip appears to have seven thermal sensors and 40 cores, the question now is where these sensors are located relative to the cores.

Clearly, this question matters for any thermal management system that wants to use the on-die sensors.

I tried to dig into the Intel documentation, but I found no information about this; any help would be really appreciated.

McCalpinJohn
Honored Contributor III

I don't think that Intel has documented the physical locations of the temperature sensors on the die, but it seems reasonable to assume that they are placed to get adequate coverage of places on the chip where temperature maxima might occur.   The cores on the Xeon Phi use a lot less power than the cores on a mainstream Xeon processor, so single-core "hot spots" seem less likely to be a problem.

Xeon Phi does not have a lot of "domains" in which it can exercise independent control (e.g. of voltage), so taking the maximum temperature is probably adequate for this first-generation product.

Frances_R_Intel
Employee

I think about the best you are going to do with respect to sensor location is table 6-25 "Table of Sensors" and figure 2-2 "Intel® Xeon Phi™ Coprocessor Board Top side (for reference only)" in the Intel® Xeon Phi™ Coprocessor Datasheet. What is probably of most interest to you is pv_vrtemp - Temperature reported from the Core VR. In effect there is only one core temperature measurement, but it measures a value that covers all the cores.

Davide_Basilio_B_1

Thanks for the replies. Table 6-25 is definitely helpful, and it is also good to have confirmation that there is really only one core temperature measurement.

Brown__Elizabeth
Beginner

hello,

can we monitor CPU temperature using thermostats?
