Is there an interaction between P-states and C1 state?

McCalpinJohn · ‎03-19-2014

I have found lots of references that say that there is no interaction between P-states and C-states, but I am curious about one specific case that might have an interaction.

In the C1 state, the core clocks are stopped, but the private caches remain active.

If a processor is in a low-frequency P-state (e.g., 1.2 GHz on a 2.7 GHz Sandy Bridge core) when it enters C1 (via MONITOR/MWAIT), do the caches continue to operate at the low frequency? Or do the caches not pay attention to the core multiplier ratios?

If the caches do run slower at the higher-numbered P-states, then I have questions about how the L2-to-System-Agent interface manages the (presumably) asynchronous clock boundary with the Rings, since the Rings appear to run at the speed of the fastest core, but that is probably too much detail for a public forum.

Actually, I have no experimental evidence that the cores in my Xeon E5-2680 (Sandy Bridge EP) are capable of running at different frequencies at the same time. I can statically set them to different frequencies, and they report that they are running at different frequencies, but when I run a benchmark they all seem to speed up to match the frequency of the fastest core. I have not yet checked whether that is due to the uncore Power Control Unit (PCU) overriding my frequency selections for one reason or another.
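For anyone who wants to repeat the "do the cores really report different frequencies?" check, here is a minimal sketch. It assumes a Linux system that exposes the cpufreq sysfs files (`scaling_cur_freq`, in kHz); the function names and the `sysfs_root` parameter are mine, not part of any tool mentioned above. What the kernel reports in that file depends on the kernel version and cpufreq driver, so treat it as the reported frequency, not proof of the actual core clock.

```python
from pathlib import Path


def read_core_frequencies(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_number: reported_frequency_in_MHz}.

    Assumes the Linux cpufreq layout cpuN/cpufreq/scaling_cur_freq,
    where the file holds a single integer in kHz.
    """
    freqs = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not cpufreq/ or cpuidle/.
    for freq_file in Path(sysfs_root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq"):
        cpu = int(freq_file.parts[-3][3:])  # directory name "cpu12" -> 12
        khz = int(freq_file.read_text().strip())
        freqs[cpu] = khz / 1000.0
    return freqs


def distinct_frequencies(freqs):
    """Sorted list of the distinct frequencies (MHz) seen across cores."""
    return sorted(set(freqs.values()))
```

Calling `distinct_frequencies(read_core_frequencies())` once while idle and again while a benchmark is running would show whether the reported per-core values converge under load. If more than one value comes back under load, the cores at least claim to be at different frequencies.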

McCalpinJohn · ‎07-15-2014

Just to show that I don't forget about these nagging questions, it turns out that there are interactions between P-states and the C1 Halt state, at least in the Xeon E5-26xx (Sandy Bridge EP) platform:

The "package C1E" state drops all cores to their minimum frequency (and voltage) if all cores are in the C1 Halt state, even if the OS has requested that the cores stay at their maximum frequency. Since the uncore runs at the frequency of the fastest core, the C1E state also reduces the uncore frequency.
This suggests that the caches (which remain active in C1 state) retain the frequency of their prior P-state when entering C1, which should have an impact on snoop response times (e.g., for data that is dirty in the L1 or L2 of a core that has dropped into C1 state).
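To make the rule concrete, here is a toy model of the behavior described above. It is only a sketch of my reading of it, not the PCU's actual algorithm. The function name and parameters are made up for illustration, and it assumes that a halted core still counts toward "the fastest core" as long as package C1E has not been entered.

```python
def effective_frequencies(requested_mhz, in_c1, min_mhz):
    """Toy model of package C1E (illustrative only, not Intel's algorithm).

    requested_mhz: per-core frequency requested by the OS (P-state), in MHz
    in_c1:         per-core flag, True if that core is in the C1 Halt state
    min_mhz:       minimum core frequency of the part, in MHz
    Returns (per_core_mhz, uncore_mhz).
    """
    if not requested_mhz or len(requested_mhz) != len(in_c1):
        raise ValueError("need one in_c1 flag per core, and at least one core")
    if all(in_c1):
        # Package C1E: every core is halted, so all drop to minimum,
        # regardless of what the OS requested.
        core = [min_mhz] * len(requested_mhz)
    else:
        # Otherwise each core (and its private caches) keeps its P-state
        # frequency, even if that particular core is halted in C1.
        core = list(requested_mhz)
    # Assumption: the uncore tracks the fastest core frequency.
    uncore = max(core)
    return core, uncore
```

With two cores both requested at 2700 MHz and both halted, the model returns 1200 MHz everywhere (for `min_mhz=1200`). If only one of them is halted, everything stays at the requested frequencies and the uncore stays at 2700 MHz.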

See the discussion of the C1E state in another forum thread: https://software.intel.com/en-us/forums/topic/517571