Intel® ISA Extensions

MSR at 0x19c bit 2: Indicator of Throttling?

ryancox
I would like to know about a specific MSR. I have been looking at 0x19c to see if throttling is occurring. If I use the rdmsr utility on Linux like "rdmsr -c -p0 0x19c", I get a result similar to "0x881f0008". If I bitwise-and the result with 1, I get the thermally throttled state (1=throttled, 0=normal). I looked at the Linux kernel source code and saw that the value is labeled as THERM_STATUS_PROCHOT for bit 0.
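For anyone who wants to script the check above, here is a minimal Python sketch of the same bit test. It only parses the hex string that `rdmsr -c` prints; the helper name `bit_is_set` and the constant names are made up for illustration and are not part of rdmsr or the kernel.

```python
# Bit positions in MSR 0x19c as discussed in this post.
THERMAL_STATUS_BIT = 0  # 1 = thermally throttled (THERM_STATUS_PROCHOT in the kernel source)
BIT_IN_QUESTION = 2     # the bit this thread is asking about


def bit_is_set(rdmsr_output: str, bit: int) -> bool:
    """Return True if `bit` is set in a hex string as printed by `rdmsr -c`."""
    value = int(rdmsr_output, 16)  # accepts the leading "0x"
    return bool((value >> bit) & 1)


sample = "0x881f0008"  # the example value quoted above
print(bit_is_set(sample, THERMAL_STATUS_BIT))  # False
print(bit_is_set(sample, BIT_IN_QUESTION))     # False
```

Note that in the quoted sample value neither bit 0 nor bit 2 is set, so that particular reading was taken while the CPU was not throttled.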

We have had other throttling issues and want to know what bit 2 (i.e. 1<<2) corresponds to. Empirical evidence shows that the processor is indeed throttled and a run of Linpack xhpl will have a slower benchmark number. Am I off-base here in thinking that this bit represents some other form of throttling? This seems to be an extremely reliable indicator of "slower" performing servers.

Also, I know that a change in the thermal throttling state results in an interrupt. Does this kind of throttling (bit 2) also result in an interrupt that we can catch and log?

Based on my observations, it seems it may be triggered by something in the Dell M1000e chassis that we have. We have many of these, and whenever we do a firmware upgrade on the chassis, the CPUs show as throttled for a minute or so. There are also occasions where processors get throttled for no apparent reason, as far as we can tell.

I could describe it in much greater detail here, but I wrote a very lengthy article about it at http://tech.ryancox.net/2010/11/diagnosing-throttled-or-slow-systems.html


1 Reply
ryancox
Okay... so I figured it out myself just now. I had been looking through Intel's manuals for a while but hadn't seen the answer until now. The answer is in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide (http://www.intel.com/products/processor/manuals/), section 14.5.5.2, "Reading the Digital Sensor". Bit 2 indicates throttling requested by "another agent on the platform", meaning something other than the processor itself is asserting the signal. It is either PROCHOT or FORCEPR (I'm guessing it's FORCEPR).
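A rough Python sketch of decoding the register with that in mind. Only bits 0 and 2 are confirmed in this thread; the other fields (bit 3 as a sticky log of the bit 2 event, bits 22:16 as the digital readout, bit 31 as "reading valid") are my reading of the IA32_THERM_STATUS layout in the same manual section and should be checked against the manual for your processor. The function and key names are hypothetical.

```python
def decode_therm_status(value: int) -> dict:
    """Decode selected fields of MSR 0x19c (layout partly assumed, see above)."""
    return {
        # Bit 0: thermally throttled right now (confirmed in this thread).
        "thermal_status": bool(value & (1 << 0)),
        # Bit 2: PROCHOT or FORCEPR asserted by another agent (confirmed in this thread).
        "prochot_or_forcepr_event": bool(value & (1 << 2)),
        # Bit 3: assumed sticky log of the bit 2 event since it was last cleared.
        "prochot_or_forcepr_log": bool(value & (1 << 3)),
        # Bits 22:16: assumed digital readout, in degrees C below the throttle point.
        "digital_readout": (value >> 16) & 0x7F,
        # Bit 31: assumed "reading valid" flag for the digital readout.
        "reading_valid": bool(value & (1 << 31)),
    }


for name, field in decode_therm_status(0x881F0008).items():
    print(f"{name}: {field}")
# thermal_status: False
# prochot_or_forcepr_event: False
# prochot_or_forcepr_log: True
# digital_readout: 31
# reading_valid: True
```

If the bit 3 assumption holds, the sample value from the original question shows an external throttle event that had already ended when the register was read, which would fit the "throttled for a minute or so" behavior.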

Now I just need to figure out what is causing it because it definitely isn't the room temperature.