Hi all,
I'm trying to figure out how to read the temperature sensor(s) on the Xeon Phi from a program running on the card's embedded Linux.
I installed MPSS version 3.4.3 and the board SKU is B1PRQ-5110P/5120D.
The Intel System Software Developer's Guide for the Xeon Phi mentions, in section 4.2.8.2, that "The processor implements internal MSRs (IA32_THERM_STATUS, IA32_THERM_INTERRUPT, IA32_CLOCK_MODULATION)".
However, these MSRs are not mentioned anywhere else in the documentation (or at least I could not find them). I've tried reading MSR 0x19C (IA32_THERM_STATUS) from each core, which is what I'd do on "normal" Xeon processors, but I just get rubbish.
Is there documentation anywhere on how one can read the on-chip thermal sensors (if possible)?
Many thanks for any hint.
There is not a lot of documentation, but when you are running on the Xeon Phi you can try:
cat /sys/class/micras/temp
From the host you can run
micsmc -t
which provides what appears to be the same information, but with labels for each of the lines.
Quick update, in case anyone is interested.
I had a look at the kernel code that generates /sys/class/micras/temp and did some inference from the SBOX control registers list in the system developer's guide (page 146).
I quickly hacked together the attached kernel module that reads all the available "CURRENT_DIE_TEMP" information.
A sample output follows:
[342630.904960] Module phitemp loaded at 0xffffffffa00c2000
[342630.905566] Die temp0: 46
[342630.905575] Die temp1: 38
[342630.905582] Die temp2: 41
[342630.905589] Die temp3: 44
[342630.905597] Die temp4: 44
[342630.905604] Die temp5: 43
[342630.905611] Die temp6: 41
[342630.905618] Die temp7: 0
[342630.905624] Die temp8: 0
It seems that this way one can actually read 7 different on-die sensors (temp0 to temp6; temp7 and temp8 report 0).
So, given that the chip appears to have seven thermal sensors and 40 cores, the question now is where these sensors are located relative to the cores.
Clearly, this matters for any thermal management system that wants to use the on-die sensors.
I tried to dig into the Intel documentation, but I found no information about this; any help would be really appreciated.
I don't think that Intel has documented the physical locations of the temperature sensors on the die, but it seems reasonable to assume that they are placed to get adequate coverage of places on the chip where temperature maxima might occur. The cores on the Xeon Phi use a lot less power than the cores on a mainstream Xeon processor, so single-core "hot spots" seem less likely to be a problem.
Xeon Phi does not have a lot of "domains" in which it can exercise independent control (e.g. of voltage), so taking the maximum temperature is probably adequate for this first-generation product.
I think about the best you are going to do with respect to sensor location is table 6-25 "Table of Sensors" and figure 2-2 "Intel® Xeon Phi™ Coprocessor Board Top side (for reference only)" in the Intel® Xeon Phi™ Coprocessor Datasheet. What is probably of most interest to you is pv_vrtemp - Temperature reported from the Core VR. In effect there is only one core temperature measurement, but it measures a value that covers all the cores.
Thanks for the replies. Table 6-25 is definitely helpful, and it is good to have confirmation that there is really only one core temperature measurement.