Is there a way to prevent the core from going to C6/C7 C-state? I looked into my BIOS and it only provides support for disabling package C-states but not core c-states.
In Linux systems the core C-states can be disabled by modifying a boot-time option and then using a bizarre interface.
First, you need to disable the "intel_idle" C-state driver by adding the boot option "intel_idle.max_cstate=0" and rebooting the system.
Second, you need to write a program that opens the file "/dev/cpu_dma_latency", writes a 32-bit binary value to the file, and then keeps the file open. If you close the file, the system reverts to the default behavior (which will allow all C-states). Sample routines for this purpose are provided at https://access.redhat.com/articles/65410
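For anyone who prefers not to start from the C routines, here is a minimal Python sketch of the same idea. It is not the Red Hat sample; the function name hold_latency_limit is made up for illustration, and I am assuming the value is written in native byte order. The path argument exists only so the function can be tried against an ordinary file.

```python
import os
import struct


def hold_latency_limit(max_latency_us, path="/dev/cpu_dma_latency"):
    """Write the limit as a 32-bit binary integer and return the open descriptor.

    The limit applies only while the descriptor stays open; closing it
    lets the system revert to its default behavior.
    (Hypothetical helper name, assumed native byte order.)
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        # "=i" packs a 4-byte signed integer in native byte order.
        os.write(fd, struct.pack("=i", max_latency_us))
    except Exception:
        os.close(fd)
        raise
    return fd
```

The caller has to keep the returned descriptor open for as long as the limit should apply (for example, by sleeping until interrupted) and call os.close() on it to restore the default. Writing to the real device normally requires root.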
The value that you write is the maximum number of microseconds that the OS will allow for a processor to recover from a C-state. For the Xeon E5-2680 (Sandy Bridge) processors in my systems, the recovery times (also in microseconds) can be seen by
$ cat /sys/devices/system/cpu/cpu0/cpuidle/state/latency
C0 has a "recovery time" of 0 microseconds, since it is the active state. C1 has a recovery time of ~1 microsecond, but still saves a fair amount of power. We typically use 75 microseconds to prevent C-states with higher numbers than C1.
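To see which states a given limit would leave available, something like the following sketch may help. It assumes the usual sysfs layout, where each stateN directory under cpuidle holds "name" and "latency" files, and it assumes a state stays usable when its latency does not exceed the limit. The function names are hypothetical.

```python
import os
import re


def read_idle_states(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return [(index, name, latency_us), ...] sorted by state index."""
    states = []
    for entry in os.listdir(cpuidle_dir):
        match = re.fullmatch(r"state(\d+)", entry)
        if not match:
            continue  # skip anything that is not a stateN directory
        state_dir = os.path.join(cpuidle_dir, entry)
        with open(os.path.join(state_dir, "name")) as f:
            name = f.read().strip()
        with open(os.path.join(state_dir, "latency")) as f:
            latency_us = int(f.read().strip())
        states.append((int(match.group(1)), name, latency_us))
    return sorted(states)


def states_within_limit(states, max_latency_us):
    """Names of states whose latency fits the limit (assumed rule: latency <= limit)."""
    return [name for _, name, latency_us in states if latency_us <= max_latency_us]
```

Calling states_within_limit(read_idle_states(), 75) on a real system lists the states that a 75-microsecond limit should still permit, under the assumption stated above.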
Although I don't understand the details, there is not a 1:1 correspondence between the ACPI C-states used by the "cpuidle" facility and the hardware C-states as actually implemented by the processor. You can certainly block all states with numbers higher than C1, but it is not clear whether this interface will allow control at finer granularity.
Thanks, John, as always. It worked great. But I am not sure I understand why I need the boot option "intel_idle.max_cstate=0". When I use this option, I see cpuidle states state0 through state2 in /sys/devices/system/cpu/cpu0/cpuidle/. Without it, state0 through state5 are visible. In both cases, I am able to control which C-states the processor is allowed to enter by writing the appropriate value to the cpu_dma_latency file.
My understanding is that the Intel idle driver can ignore BIOS and/or OS requests to not use the higher-numbered C-states. If you add the boot option "intel_idle.max_cstate=0" it should cause the system to use the acpi idle driver instead of the Intel idle driver.
It is not surprising that the two idle drivers show different states -- the Intel idle driver probably understands the hardware better than the ACPI idle driver.
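One way to confirm which idle driver is actually in use after a reboot is to read the cpuidle "current_driver" file in sysfs. This is only a small sketch: the helper name is made up, and I am assuming the file is present on the kernel in question.

```python
def current_idle_driver(path="/sys/devices/system/cpu/cpuidle/current_driver"):
    """Return the name of the active cpuidle driver, e.g. "intel_idle" or "acpi_idle"."""
    with open(path) as f:
        return f.read().strip()
```

If the boot option took effect as described, this would be expected to report "acpi_idle" rather than "intel_idle".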
You might want to try the same tests without the special boot option and see if the system still obeys the /dev/cpu_dma_latency control. If it works, then the whole process becomes much easier!
So I tried writing latency values to the cpu_dma_latency file without the special boot option, and it works fine. So I guess it is not needed, at least on my machine! Thanks for your help.
So I found that I can control the c-state individually for each core by just writing '0' or '1' to file /sys/devices/system/cpu/cpu/cpuidle/state/disable. I am wondering if there are any downsides to controlling c-states this way!
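For reference, the per-core approach could be scripted roughly as below. This is a sketch rather than a tested tool: the function name and the sysfs_root parameter are hypothetical, and the cpuN/cpuidle/stateM/disable layout is the usual sysfs arrangement, which I am assuming applies here.

```python
import os


def set_state_disabled(cpu, state, disabled, sysfs_root="/sys/devices/system/cpu"):
    """Write '1' (disable) or '0' (enable) for one idle state of one logical CPU."""
    path = os.path.join(
        sysfs_root, f"cpu{cpu}", "cpuidle", f"state{state}", "disable"
    )
    with open(path, "w") as f:
        f.write("1" if disabled else "0")
    return path
```

For example, set_state_disabled(0, 3, True) would disable state3 on cpu0 only, which needs root on a real system.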