I'm running an experiment on a server machine with a quad-core Xeon X5355 processor running a linux system.
I am trying to control core voltage and frequency separately by writing to the MSR IA32_PERF_CTL (0x199). I change the value of IA32_PERF_CTL with the "wrmsr" command and verify the change with the "rdmsr" command. However, when I run "rdmsr 0x199" again a few seconds later, the value of IA32_PERF_CTL has been overwritten with its previous value. The value of IA32_PERF_STATUS does not reflect my change either.
I suspect that the operating system is overwriting IA32_PERF_CTL with a fixed value as part of its own power management, but I cannot find out which module is doing it. My questions are:
1. Will the operating system change the value of IA32_PERF_CTL? How can I disable this module?
2. If it is not the OS that writes to IA32_PERF_CTL, why is the value of IA32_PERF_CTL always overwritten with the same value?
My server machine is a Dell PowerEdge 2900 running openSUSE 12.2. I have already enabled power management in the BIOS so that I can write to MSR 0x199. I have also enabled EIST by setting MSR 0x1aa MSR_MISC_PWR_MGMT = 0 and MSR 0x1a0 IA32_MISC_ENABLE = 1.
If you have any suggestions, or if you find any error in my setup, please don't hesitate to reply! I would really appreciate it!
Great great great thanks.
(I'm not sure if I should post this in the "Intel ISA Extensions" section, or the "Software Tuning, Performance Optimization & Platform Monitoring " section. If this post is more suitable for the other, I'll move it there.)
Frequency control is typically handled by a kernel module called "acpi-cpufreq".
$ /sbin/lsmod | grep acpi
acpi_cpufreq 7763 1
freq_table 4936 2 cpufreq_ondemand,acpi_cpufreq
mperf 1557 1 acpi_cpufreq
If this is the module controlling frequency on your system, you can control the requested frequencies using the interfaces in the /sys/devices/system/cpu/cpu*/cpufreq directories:
$ ls -l /sys/devices/system/cpu/cpu0/cpufreq
-r--r--r-- 1 root root 4096 Dec 2 12:53 affected_cpus
-r-------- 1 root root 4096 Dec 2 12:53 cpuinfo_cur_freq
-r--r--r-- 1 root root 4096 Dec 2 12:53 cpuinfo_max_freq
-r--r--r-- 1 root root 4096 Dec 2 12:53 cpuinfo_min_freq
-r--r--r-- 1 root root 4096 Dec 2 12:53 cpuinfo_transition_latency
-r--r--r-- 1 root root 4096 Dec 2 12:53 related_cpus
-r--r--r-- 1 root root 4096 Dec 2 12:53 scaling_available_frequencies
-r--r--r-- 1 root root 4096 Dec 2 12:53 scaling_available_governors
-r--r--r-- 1 root root 4096 Dec 2 12:53 scaling_cur_freq
-r--r--r-- 1 root root 4096 Dec 2 12:53 scaling_driver
-rw-r--r-- 1 root root 4096 Dec 2 12:28 scaling_governor
-rw-r--r-- 1 root root 4096 Nov 25 12:46 scaling_max_freq
-rw-r--r-- 1 root root 4096 Nov 25 12:46 scaling_min_freq
-rw-r--r-- 1 root root 4096 Dec 2 12:53 scaling_setspeed
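As a rough illustration of how those sysfs files can be used from a script, here is a minimal Python sketch. The helper names (read_cpufreq, request_frequency) and the `root` parameter are hypothetical and only there for illustration. It assumes the "userspace" governor is listed in scaling_available_governors on your system and that the script runs as root when writing.

```python
from pathlib import Path

# Files from the cpufreq directory listing that are most useful to inspect.
CPUFREQ_FILES = (
    "scaling_driver",
    "scaling_governor",
    "scaling_cur_freq",
    "scaling_min_freq",
    "scaling_max_freq",
)


def read_cpufreq(cpu=0, root="/sys/devices/system/cpu"):
    """Return the contents of selected cpufreq files for one CPU.

    Files that are missing or unreadable are reported as None.
    """
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    info = {}
    for name in CPUFREQ_FILES:
        try:
            info[name] = (base / name).read_text().strip()
        except OSError:
            info[name] = None
    return info


def request_frequency(khz, cpu=0, root="/sys/devices/system/cpu"):
    """Request a fixed frequency (in kHz) through the userspace governor.

    Assumption: the "userspace" governor is available on this kernel.
    This only *requests* a frequency; the driver picks the matching
    performance state, so the voltage still follows automatically.
    """
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    (base / "scaling_governor").write_text("userspace\n")
    (base / "scaling_setspeed").write_text(f"{khz}\n")
```

For example, print(read_cpufreq(0)) shows which driver and governor are active on core 0, which is a quick way to confirm whether acpi-cpufreq is the one rewriting the requested performance state.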
Hi Mr. McCalpin,
Thank you for your suggestion!
I do have the acpi_cpufreq module loaded on my system. I tried using it to change the CPU frequency and it works. However, acpi_cpufreq does not allow me to scale CPU frequency and voltage independently: when I set the CPU frequency to some value, the corresponding voltage also changes automatically.
I tried writing to IA32_PERF_CTL to set the CPU voltage and used "rdmsr" to verify that its value was updated. But the value in IA32_PERF_STATUS stays unchanged. Does this mean that acpi_cpufreq does not allow a user to directly change the CPU voltage?
The user can never *directly* change either voltage or frequency using these mechanisms. The acpi_cpufreq driver uses the IA32_PERF_CTL MSR to request a target "performance state" value. Intel warns that the meaning of the 16-bit encoding is model-specific, but on the machines I have looked at, IA32_PERF_CTL bits 15:8 are the requested CPU frequency multiplier. The documentation describes no means to directly request a voltage.
IA32_PERF_STATUS (bits 15:0) reports the *current* performance state value. Again, Intel makes no guarantees about the meaning of these bits, but on my systems bits 15:8 contain the current CPU frequency multiplier.
Some systems also report the voltage in IA32_PERF_STATUS bits 47:32, but I have only seen this documented for the Sandy Bridge processors (described in Table 35-15 of Volume 3 of the Intel SW Developers Guide, document 325384).
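To make the bit fields above concrete, here is a small Python sketch that splits a 64-bit IA32_PERF_CTL or IA32_PERF_STATUS value into the fields discussed. The function names are hypothetical. The read_msr helper assumes the Linux "msr" kernel module is loaded (it provides /dev/cpu/N/msr) and that you are root. Treating bits 15:8 as the multiplier and bits 47:32 as a voltage field is model-specific, as noted above, so take the decoded names as a guess for your particular processor.

```python
import os
import struct

IA32_PERF_STATUS = 0x198
IA32_PERF_CTL = 0x199


def read_msr(address, cpu=0):
    """Read one 64-bit MSR through the Linux msr driver.

    Assumption: /dev/cpu/<cpu>/msr exists (msr module loaded) and the
    caller has permission to read it. The file offset is the MSR address.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, address)
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]


def decode_perf_value(value):
    """Split a raw PERF_CTL/PERF_STATUS value into its bit fields.

    The field meanings are model-specific; the names below follow the
    observations in this thread, not a guarantee from Intel.
    """
    return {
        "pstate": value & 0xFFFF,                # bits 15:0, performance state
        "multiplier": (value >> 8) & 0xFF,       # bits 15:8, frequency multiplier
        "voltage_field": (value >> 32) & 0xFFFF, # bits 47:32, raw, if reported
    }
```

Calling decode_perf_value(read_msr(IA32_PERF_STATUS)) before and after a write to 0x199 shows whether the current performance state actually followed the request, and a voltage_field of 0 suggests the processor does not report voltage in those bits.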
It really does not make sense to allow a user to set voltage and frequency independently, since the processor will start giving the wrong answers if the voltage is too low for a particular frequency. The vendors test this all the time, of course, but once the parts are manufactured there is no reason to allow a user to request a voltage that is lower than the minimum voltage needed to operate at the requested frequency.
I see... I'm still curious whether I can change the processor voltage by writing to MSR 0x199. I did read some online posts whose authors claimed to have changed the processor voltage.
I read the Intel software developer guide, but it does not have a detailed explanation of IA32_PERF_CTL for my processor. It only says that IA32_PERF_CTL[15:0] are the target performance state bits. I think mine is the same as yours and uses bits 15:8 for the frequency value, as these bits change when I use cpufreq-set to change the CPU frequency. Bits 47:32 are all 0s for me.
I want to do this experiment because I want to reach some "corner cases" of the processor operation. I agree that this will not be useful in normal operation.