Hi
I have been trying to develop a runtime energy management library for Intel Xeon Phi using idle state control (C-states). I have read through a few blogs but I could not find answers to the following:
1. How can I implement control of C-states from userspace? Do I need to rebuild the MPSS service with userspace?
2. I currently have all power features enabled (cpufreq, corec6, pc3 and pc6). I can see the usage of the different idle states through
cat /sys/devices/system/cpu/cpu0/cpuidle/state*/usage (is this the number of times each cpuidle state was entered?)
or
cat /sys/devices/system/cpu/cpu243/cpuidle/state*/time (is this the number of clock ticks spent in each cpuidle state?)
As far as I can see, the deeper C-states are more active than the C0 state. However, when I measure the power, I only see an overall reduction from 100W (measured at the highest CPU frequency with no power features enabled) to 82W (measured at the lowest CPU frequency with all power reduction features enabled). Considering that more time is now being spent in the deeper sleep states, is this reduction reasonable? Or am I reading something wrong?
Any help, advice or recommendation would be much appreciated.
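For reference, the counters in the question can be turned into per-state residency fractions by taking two snapshots and dividing the difference by the elapsed time. The sketch below is only an illustration: it assumes the card's kernel follows the mainline Linux cpuidle sysfs layout, where `usage` is the number of times a state was entered and `time` is the cumulative time in that state in microseconds (not clock ticks), and it assumes a Python 3 interpreter is available where it runs. The function names `read_cpuidle` and `residency` are made up for this example. Note that cpuidle does not list C0 directly, so the non-idle share is inferred as whatever is left over.

```python
import glob
import os


def read_cpuidle(cpu, root="/sys/devices/system/cpu"):
    """Return {state_name: (usage, time_us)} for one logical CPU.

    Assumes the mainline cpuidle sysfs layout: each stateN directory
    holds 'name', 'usage' (entry count) and 'time' (microseconds).
    """
    stats = {}
    pattern = os.path.join(root, f"cpu{cpu}", "cpuidle", "state*")
    for state_dir in sorted(glob.glob(pattern)):
        def read(field):
            with open(os.path.join(state_dir, field)) as f:
                return f.read().strip()
        stats[read("name")] = (int(read("usage")), int(read("time")))
    return stats


def residency(before, after, interval_us):
    """Fraction of the interval spent in each idle state.

    'before' and 'after' are snapshots from read_cpuidle() taken
    interval_us microseconds apart. The remainder is reported as
    non-idle time, since C0 has no counter of its own.
    """
    fractions = {}
    for name in after:
        fractions[name] = (after[name][1] - before[name][1]) / interval_us
    fractions["not idle (C0)"] = max(0.0, 1.0 - sum(fractions.values()))
    return fractions


# Example use (reads the real sysfs tree, so run it on the target system):
#   import time
#   t0 = time.monotonic(); before = read_cpuidle(0)
#   time.sleep(10)
#   after = read_cpuidle(0); elapsed_us = (time.monotonic() - t0) * 1e6
#   print(residency(before, after, elapsed_us))
```

A long interval between the two snapshots keeps the sampling process itself from waking the core too often, which matters because any monitoring code running on the card counts as activity.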
Hi Rishad,
My apologies for the lateness of this reply. As occasionally happens, your issue fell through the cracks in our coverage.
You need kernel space access to control C-state transitions. This is due to the MWAIT instruction being privileged. This may change in the future but is the case in current machines.
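Although user code cannot issue MWAIT itself, mainline Linux does let a user-space process constrain which idle states the kernel's governor may pick, through the PM QoS device `/dev/cpu_dma_latency`. The sketch below shows that mechanism only as a possible starting point. Whether the MPSS kernel on the coprocessor exposes this device and whether its idle governor honors the request are assumptions you would need to verify. The helper name `cpu_latency_limit` is hypothetical.

```python
import os
import struct
from contextlib import contextmanager


@contextmanager
def cpu_latency_limit(max_latency_us, path="/dev/cpu_dma_latency"):
    """Ask the kernel to avoid idle states with a wake-up latency above
    max_latency_us for as long as the 'with' block is running.

    The PM QoS request is tied to the open file descriptor: it takes
    effect on the write and is dropped when the descriptor is closed.
    Opening the device normally requires root.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        # The device expects a native 32-bit signed integer (microseconds).
        os.write(fd, struct.pack("i", max_latency_us))
        yield
    finally:
        os.close(fd)


# Example use:
#   with cpu_latency_limit(20):
#       run_latency_sensitive_phase()   # hypothetical workload function
```

This only restricts how deep the kernel is allowed to go; it cannot force a core into a particular C-state, which still requires kernel-space code as described above.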
Your 100W seems low if your processors are all active. My measurements are closer to 190W when running NPB EP with >240 threads.
The Xeon Phi Datasheet shows in Table 5-1 that the power usage should be <115W when all processors are in C1, <50W in PC3, and <30W in PC6. I've found this approximately correct for C1.
Measuring PC3 and PC6 is difficult because it requires special equipment, basically a measuring device between the PCIe socket and the coprocessor. PC3 and PC6 are package C-states, which require the whole package to be idle for an extended duration. Your monitoring program, if native, will keep at least one core active, preventing the coprocessor from entering the package C-states. If you are monitoring from the host, the same applies, because the monitoring daemon on the coprocessor will keep at least one core active.
I suspect the 82W you are measuring is due to some cores being in a deeper C-state than C1.
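To make the comparison with the datasheet explicit, a measurement can be checked against the three upper bounds quoted above. This is just arithmetic on those quoted numbers, not a power model, and the names `DATASHEET_BOUNDS_W` and `consistent_states` are invented for the illustration.

```python
# Upper bounds quoted above from Table 5-1 of the datasheet, in watts.
DATASHEET_BOUNDS_W = [("PC6", 30), ("PC3", 50), ("C1", 115)]


def consistent_states(measured_w):
    """Return the states whose quoted upper bound the measurement stays under."""
    return [name for name, bound in DATASHEET_BOUNDS_W if measured_w < bound]


# The reading from the question versus a fully loaded card:
print(consistent_states(82))    # ['C1']
print(consistent_states(190))   # []
```

A reading of 82W falls under the all-cores-in-C1 bound but well above the PC3 bound, which fits the explanation that the package states are never reached while something is monitoring the card.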
Does this answer your questions?
Regards
--
Taylor