Xeon Phi idle power consumption

Olli-Pekka_L_ · 06-01-2013

I'm investigating the power consumption of our Xeon Phi 5110P cards on single-socket, single-Phi nodes. It seems that the MICs have a total idle power consumption of 90 W, which is frankly more than I would expect from an idle state.

I've experimented with turning the various power-related kernel parameters on and off (corec6_on etc.), but it doesn't seem to make any difference. Even shutting down MPSS completely does not seem to change the consumption.

I'm running the latest version of MPSS and firmware and otherwise things work nicely.

I have a couple of questions:

Is this an expected idle power consumption result?
Are there other parameters I might need to tweak? Perhaps on the host side?
Can I manually force a C or P state?
Where can I check the current power state without logging onto the card?

TaylorIoTKidd · 06-01-2013

Take a look at the data sheet. You can find it at http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/xeon-phi-datasheet.pdf. I will find more concrete info for you next week.

TaylorIoTKidd · 06-11-2013

From the datasheet, you will see that package Auto-PC3 consumes about 105 W and Deep-PC3 consumes ~40 W. I expect that you are averaging over some interval and catching the card in several power states, mainly Auto-PC3 and Deep-PC3. Table 5-1 in the datasheet lists the different idle states and their power usage. I suspect that the coprocessor occasionally drops down into a Deep-PC3 state (~40 W) but floats back up to an Auto-PC3 state due to some type of interrupt or memory processing: Deep-PC3==(interrupt)=>C0=>Auto-PC3=>Deep-PC3==(interrupt)=>C0... etc.
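To see what a 90 W average would imply under this explanation, here is a back-of-the-envelope calculation. It assumes, as a simplification, that the card only alternates between Auto-PC3 (~105 W) and Deep-PC3 (~40 W), ignoring time spent in C0 and in transitions, and solves the weighted average for the fraction of time spent in Auto-PC3.

```python
def auto_pc3_fraction(p_avg: float, p_auto: float = 105.0, p_deep: float = 40.0) -> float:
    """Fraction of time in Auto-PC3, assuming only Auto-PC3 and Deep-PC3 occur.

    Solves p_avg = f * p_auto + (1 - f) * p_deep for f.
    """
    if not p_deep <= p_avg <= p_auto:
        raise ValueError("average power must lie between the two state powers")
    return (p_avg - p_deep) / (p_auto - p_deep)
```

For a 90 W average this gives 50/65, i.e. roughly 77% of the time in Auto-PC3; a ~100 W reading would correspond to about 92%.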

I recommend you check for interrupts on the PCIe bus that cause the host to wake up the coprocessor. In addition to processing explicit interrupts, the power management SW attempts to predict the next interrupt, waking up the coprocessor before the interrupt occurs to minimize latency.
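One generic way to look for interrupt sources on Linux is to snapshot /proc/interrupts twice and see which counters grew. The sketch below assumes the standard /proc/interrupts layout (a CPU header row, then one row per IRQ); which IRQ lines correspond to host wake-ups on a Xeon Phi is not established here and would have to be worked out separately. Also keep in mind that logging onto the card to run this is itself activity that can wake it.

```python
import time
from typing import Dict, List, Tuple


def parse_interrupts(text: str) -> Dict[str, int]:
    """Return {irq_label: count summed over all CPUs} from /proc/interrupts text."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals: Dict[str, int] = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        total = 0
        for tok in rest.split()[:ncpu]:
            if not tok.isdigit():
                break  # reached the controller/description columns (e.g. ERR: has one value)
            total += int(tok)
        totals[label.strip()] = total
    return totals


def interrupt_deltas(before: Dict[str, int], after: Dict[str, int]) -> List[Tuple[str, int]]:
    """IRQ lines whose count increased between two snapshots, largest increase first."""
    deltas = {k: after[k] - before.get(k, 0) for k in after}
    return sorted(((k, d) for k, d in deltas.items() if d > 0), key=lambda kv: -kv[1])


def sample(path: str = "/proc/interrupts", seconds: float = 10.0) -> List[Tuple[str, int]]:
    """Take two snapshots `seconds` apart and return the increases."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return interrupt_deltas(before, after)
```

Calling `sample(seconds=10)` on an otherwise idle system lists the busiest interrupt sources over that window; anything that keeps firing while the card should be idle is a candidate for what is keeping it out of the deeper states.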

I will publish the next blog in my power management series in about a week. In it, I discuss package C-states and power management.

Holger_A_ · 01-17-2014

I have the same problem with our 5110P MIC cards. The idle power consumption is about 100 W according to micsmc-gui. When I turn off power saving (PC3, C6, ...) there, the power consumption does not change.

Is the frequency supposed to remain constant at 1.05 GHz, as I see it, or can it be decreased similarly to a CPU?

How can I check for interrupts on the PCIe bus?

Olli-Pekka, did you make any progress?

Regards,

Holger

TaylorIoTKidd · 01-17-2014

Researching the datasheet a little, I discovered that the link has changed. To find the current link, look at my article List of Useful Power and Power Management Articles, Blogs and References (http://software.intel.com/en-us/articles/list-of-useful-power-and-power-management-articles-blogs-and-references). I do my best to keep it up to date.

TaylorIoTKidd · 01-17-2014

Holger,

I am doing some research and will get back to you next week.

Regards
--
Taylor

TaylorIoTKidd · 01-22-2014

Hi Holger,

Here are a couple of brief notes before I and my colleagues dig more deeply into the issue.

5110 B-stepping coprocessors don't have PC6 capability. This includes most of the earlier purchases. You'll need to contact your seller for any additional information.

P-states inherently involve scaling frequency and voltage as a pair, to minimize leakage current. Also, since performance is the focus of most card users, P-states are handled and used differently than on Xeons. I will write a blog about this soon.
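The reason for scaling voltage together with frequency can be seen from the textbook first-order CMOS power model below. This is a generic approximation, not a Xeon Phi-specific formula: $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V$ the supply voltage, $f$ the clock frequency, and $I_{\text{leak}}$ the leakage current, which itself rises with voltage and temperature.

```latex
P \;\approx\; \underbrace{\alpha\, C\, V^{2} f}_{\text{dynamic}}
\;+\; \underbrace{V\, I_{\text{leak}}(V, T)}_{\text{leakage}}
```

Lowering $f$ at a fixed $V$ only shrinks the dynamic term, and only linearly, while the leakage term is untouched. Lowering $V$ as well (which the lower frequency permits) reduces the dynamic term quadratically in $V$ and also cuts leakage.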

You need to be careful about how you monitor C-state and PC-state transitions. Monitoring tools need processing time, which means they interrupt the processor and cause it to transition back to C0. If there are enough of these interrupts, the processor will stay in a shallower (higher-power) idle state to minimize latency.
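To make the latency trade-off concrete, here is a toy idle-state selection rule, loosely in the spirit of Linux cpuidle governors: predict the next idle interval from recent wake-up spacing, then pick the deepest state whose break-even residency fits. The state list and millisecond thresholds are made-up placeholders, not values from the Xeon Phi datasheet or MPSS, and the real coprocessor power management is certainly more involved.

```python
from typing import List, Tuple

# (state name, minimum idle time in ms for entering it to pay off).
# HYPOTHETICAL placeholder thresholds, listed shallowest to deepest.
IDLE_STATES: List[Tuple[str, float]] = [
    ("C1", 0.0),
    ("C6", 1.0),
    ("PC3", 50.0),
    ("PC6", 500.0),
]


def predicted_idle_ms(wakeup_times_ms: List[float]) -> float:
    """Naive, conservative predictor: the shortest recent gap between wake-ups."""
    gaps = [b - a for a, b in zip(wakeup_times_ms, wakeup_times_ms[1:])]
    return min(gaps) if gaps else float("inf")


def deepest_state(predicted_ms: float) -> str:
    """Deepest state whose break-even residency fits within the predicted idle time."""
    choice = IDLE_STATES[0][0]
    for name, min_residency_ms in IDLE_STATES:
        if predicted_ms >= min_residency_ms:
            choice = name
    return choice
```

With a monitoring tool polling every 100 ms (wake-ups at 0, 100, 200, 300 ms), `deepest_state(predicted_idle_ms([0, 100, 200, 300]))` settles on the placeholder "PC3" rather than "PC6": the polling itself caps how deep the package can go.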

--
Taylor

Holger_A_ · 01-23-2014

Hi Taylor,

Thank you for your quick and helpful reply. I will try to find out which stepping we have and whether the monitoring tools have an impact on the power consumption.

Regards,

Holger

Holger_A_ · 02-07-2014

I have had a closer look at the log files and saw the following behavior. When I start micsmc on the host, the following appears almost immediately in /var/log/messages on the MIC:

Feb 7 10:21:09 phi2-mic0 user.warn kernel: [318479.842467] Sent PC3 ready message
Feb 7 10:21:10 phi2-mic0 user.warn kernel: [318479.971201] cpu 181 set to enter package state
Feb 7 10:21:10 phi2-mic0 user.warn kernel: [318479.971212] Suspending timekeeping
Feb 7 10:21:10 phi2-mic0 user.warn kernel: [318479.971226] Changing maxcore freq to 842104
Feb 7 10:21:10 phi2-mic0 user.warn kernel: [318480.589893] Wake up interrupt from host
Feb 7 10:21:10 phi2-mic0 user.warn kernel: [318480.589952] PC3 exit: reverting core freq to 1052630
Feb 7 10:21:10 phi2-mic0 user.warn kernel: [318480.590040] Resuming timekeeping
Feb 7 10:21:10 phi2-mic0 user.warn kernel: [318480.590056] PC3 average residency = 26518363531

I am not surprised that there is a wake-up call from the host, but why is the PC3 message only sent when the tool is started? After closing it, no further messages appear indicating that the card enters PC3 or anything similar.

TaylorIoTKidd · 02-07-2014

Looking at the messages (from the host, I assume), I'm guessing the following.

The first 4 messages were in the queue on the coprocessor (but not delivered to the host) from when it entered the PC3 state. They were only delivered after the coprocessor "woke up". The last line, the PC3 average residency, may be the length of time the coprocessor was in PC3 (shutdown) before you started interrogating it with micsmc, i.e. >26 sec assuming the residency is in cycles.
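The conversion from cycles to seconds depends on which clock the residency counter uses, which the log does not state. The sketch below simply divides the logged value by a few candidate clock rates; every rate is an assumption. The >26 s figure corresponds to a 1 GHz count, while the two core frequencies that appear in the log give roughly 25 s and roughly 31 s. The kernel timestamps cannot easily cross-check this, since the log shows "Suspending timekeeping" during PC3, so those timestamps presumably do not advance while the package sleeps.

```python
def cycles_to_seconds(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to seconds at an assumed clock rate."""
    return cycles / clock_hz


residency_cycles = 26_518_363_531  # "PC3 average residency" from the Feb 7 log

# Each clock rate is an ASSUMPTION; the log does not say which clock is counted.
candidate_clocks_hz = {
    "1 GHz (round number)": 1.0e9,
    "1052630 kHz (normal max core freq in the log)": 1_052_630e3,
    "842104 kHz (reduced max core freq in the log)": 842_104e3,
}

for label, hz in candidate_clocks_hz.items():
    print(f"{label}: {cycles_to_seconds(residency_cycles, hz):.1f} s")
```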

As a test, start up and then close micsmc several times, with a fairly lengthy pause between start-ups. I wonder if we will see the above pattern repeated in the log file several times.
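To check the log for a repeated pattern without reading it by eye, a small scanner can group the kernel messages into PC3 episodes. This is a minimal sketch that relies only on the message texts visible in this thread ("Sent PC3 ready message", "Wake up interrupt from host", "PC3 average residency = N"); other MPSS versions may word them differently. The residency is kept as a raw number because its unit is not known.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

KTIME = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)$")  # "[574283.282688] message"
RESIDENCY = re.compile(r"PC3 average residency = (\d+)")


@dataclass
class PC3Episode:
    ready_at: float                   # kernel timestamp of "Sent PC3 ready message"
    woken_at: Optional[float] = None  # kernel timestamp of "Wake up interrupt from host"
    residency: Optional[int] = None   # raw "PC3 average residency" value (unit unknown)


def scan_pc3(lines: Iterable[str]) -> List[PC3Episode]:
    episodes: List[PC3Episode] = []
    for line in lines:
        m = KTIME.search(line)
        if not m:
            continue  # e.g. getty lines carry no kernel timestamp
        ts, msg = float(m.group(1)), m.group(2)
        if msg.startswith("Sent PC3 ready message"):
            episodes.append(PC3Episode(ready_at=ts))
            continue
        if not episodes:
            continue  # messages before the first PC3 attempt
        current = episodes[-1]
        if msg.startswith("Wake up interrupt from host"):
            current.woken_at = ts
        else:
            r = RESIDENCY.search(msg)
            if r:
                current.residency = int(r.group(1))
    return episodes
```

For example, `with open("/var/log/messages") as f: episodes = scan_pc3(f)` on the card; one episode per micsmc start-up would support the idea that micsmc is what wakes the coprocessor.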

Regards
--
Taylor

Holger_A_ · 02-10-2014

I started micsmc several times and got the following.

On the mic:
[root@phi2-mic0 log]# tail -f messages

Feb 10 09:24:33 phi2-mic0 user.warn kernel: [574283.282688] Sent PC3 ready message
Feb 10 09:24:33 phi2-mic0 user.warn kernel: [574283.411526] cpu 188 set to enter package state
Feb 10 09:24:33 phi2-mic0 user.warn kernel: [574283.411537] Suspending timekeeping
Feb 10 09:24:33 phi2-mic0 user.warn kernel: [574283.411550] Changing maxcore freq to 842104
Feb 10 09:24:33 phi2-mic0 user.warn kernel: [574283.852542] Wake up interrupt from host
Feb 10 09:24:33 phi2-mic0 user.warn kernel: [574283.852597] PC3 exit: reverting core freq to 1052630
Feb 10 09:24:33 phi2-mic0 user.warn kernel: [574283.852685] Resuming timekeeping
Feb 10 09:24:33 phi2-mic0 user.warn kernel: [574283.852700] PC3 average residency = 8554846889
Feb 10 09:24:37 phi2-mic0 auth.err getty: /dev/ttyS0: No such file or directory
Feb 10 09:24:47 phi2-mic0 auth.err getty: /dev/ttyS0: No such file or directory
Feb 10 09:24:49 phi2-mic0 user.warn kernel: [574299.092892] Sent PC3 ready message
Feb 10 09:24:49 phi2-mic0 user.warn kernel: [574299.221843] cpu 0 set to enter package state
Feb 10 09:24:49 phi2-mic0 user.warn kernel: [574299.221856] Suspending timekeeping
Feb 10 09:24:49 phi2-mic0 user.warn kernel: [574299.221870] Changing maxcore freq to 842104
Feb 10 09:24:49 phi2-mic0 user.warn kernel: [574299.660249] Wake up interrupt from host
Feb 10 09:24:49 phi2-mic0 user.warn kernel: [574299.660293] PC3 exit: reverting core freq to 1052630
Feb 10 09:24:49 phi2-mic0 user.warn kernel: [574299.660377] Resuming timekeeping
Feb 10 09:24:49 phi2-mic0 user.warn kernel: [574299.660389] PC3 average residency = 8429048378
Feb 10 09:24:57 phi2-mic0 auth.err getty: /dev/ttyS0: No such file or directory

When I keep micsmc open, the messages tend to repeat after a while, as you can see from the messages timestamped Feb 10 09:24.

The host only tells me that it cannot find the LDAP server:

Feb 10 09:24:02 phi2 nslcd[2070]: [ebfce7] ldap_result() failed: Can't contact LDAP server
Feb 10 09:24:37 phi2 nslcd[2070]: [e0966e] ldap_result() failed: Can't contact LDAP server
Feb 10 09:25:13 phi2 nslcd[2070]: [22f25f] ldap_result() failed: Can't contact LDAP server
Feb 10 09:25:49 phi2 nslcd[2070]: [d3ba56] ldap_result() failed: Can't contact LDAP server

but LDAP is in fact working, and I doubt that this message is related to the interrupts in question.

What supports the hypothesis that micsmc itself is sending the interrupts is that the temperature of the cards rises from approx. 57°C to 62°C when it is started. I will check the power supplies to see whether I can find anything related there.

Does this help at all, or should I run another test?

Regards,

Holger

TaylorIoTKidd · 02-10-2014

The data supports my earlier hypothesis, i.e. that micsmc is waking up the coprocessor.

When all the cores are in C1 (halted state), the processor drops down into a package sleep state where the core and uncore clocks are gated.

Feb 10 09:24:33 phi2-mic0 user.warn kernel: [574283.282688] Sent PC3 ready message
        coprocessor tells the host that it’s ready for PC3 (all cores in C1); the host then
        signals the coprocessor (via CPU 188) to initiate a PC3 shutdown.
Feb 10 09:24:33 phi2-mic0 user.warn kernel: [574283.411526] cpu 188 set to enter package state
Feb 10 09:24:33 phi2-mic0 user.warn kernel: [574283.411537] Suspending timekeeping
Feb 10 09:24:33 phi2-mic0 user.warn kernel: [574283.411550] Changing maxcore freq to 842104
Feb 10 09:24:33 phi2-mic0 user.warn kernel: [574283.852542] Wake up interrupt from host
        host receives a message (say, PCIe activity) requiring processing by the coprocessor
        host sends an interrupt to wake up the coprocessor (exit PC3), restoring freq, etc.
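The handshake annotated above can be summarized as a tiny state machine. This is a simplified, hypothetical model: the state and event names are invented for illustration, it covers only the transitions described in this post, and any event it does not model simply leaves the state unchanged.

```python
from enum import Enum, auto
from typing import Iterable, List


class Pkg(Enum):
    C0 = auto()         # at least one core running
    PC3_READY = auto()  # all cores in C1; coprocessor has sent "PC3 ready" to the host
    PC3 = auto()        # package sleep: clocks gated, timekeeping suspended, max core freq lowered


# (current state, event) -> next state, following the annotated log above.
TRANSITIONS = {
    (Pkg.C0, "all_cores_in_c1"): Pkg.PC3_READY,
    (Pkg.PC3_READY, "host_signals_entry"): Pkg.PC3,
    (Pkg.PC3, "host_wake_interrupt"): Pkg.C0,
}


def step(state: Pkg, event: str) -> Pkg:
    """Apply one event; events not modelled here leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events: Iterable[str], state: Pkg = Pkg.C0) -> List[Pkg]:
    """Return the sequence of states visited, starting with the initial state."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace
```

If a tool like micsmc keeps generating host-side activity, the model just cycles C0, PC3_READY, PC3, C0 over and over, which is the repeating pattern seen in the log.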

Regarding “rises from 57°C to 62°C when it is started”: I assume you mean when a PC3 exit occurs?

See http://software.intel.com/en-us/blogs/2013/06/18/title-intel-xeon-phi-coprocessor-power-management-part-2b-package-c-states-the for a high-level description of Package C-states. I believe that in the latest MPSS version, Auto-PC3 doesn’t exist.

Olli-Pekka_L_ · 05-14-2014

We finally had time to really look into this with Andrey Semin.

The instrumentation in the chassis we use (Bull B715) is out-of-band. It has a sensor that measures the power going to the PCIe slot and the auxiliary 2x3 and 2x4 plugs (which is a pretty sweet feature IMO). Thus micsmc is not involved in the measurements.

It seems that the culprit is the network bridge connection. It keeps the Phi awake, even with PC6 enabled, because some network "noise" is coming in. Shutting down both the ofed-mic service and the mic0/mic1 interfaces from the host side seems to be required to get out of C0. Once enabled, the PC6 state is very efficient, but it would be useful if we could use it without compromising the networking capabilities.
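A minimal host-side sketch of that workaround, assuming root on the host. The ofed-mic service name comes from this post; using iproute2's `ip link set dev <if> down` to take mic0/mic1 down is an assumption (ifdown or an MPSS-specific tool may be what was actually used). By default it only prints the commands. Note that this disables IB and host-to-card networking until the service and interfaces are brought back up.

```python
import subprocess
from typing import List, Sequence


def quiesce_commands(interfaces: Sequence[str] = ("mic0", "mic1")) -> List[List[str]]:
    """Commands for the workaround described above (command forms are assumptions)."""
    cmds = [["service", "ofed-mic", "stop"]]  # service name as given in the thread
    # ASSUMPTION: the host-side virtual interfaces can be taken down with iproute2.
    cmds += [["ip", "link", "set", "dev", name, "down"] for name in interfaces]
    return cmds


def quiesce(interfaces: Sequence[str] = ("mic0", "mic1"), dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute the commands; executing requires root on the host."""
    cmds = quiesce_commands(interfaces)
    for cmd in cmds:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmds
```

Undoing it would be the mirror image (starting the service again and bringing the interfaces back up), which is exactly the networking trade-off described above.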

One interesting detail: if I do a 'service mpss stop' or 'micctrl --shutdown', I would expect the card to go into a very low-power state. However, the power measurement results indicate that it actually goes to C0.

TaylorIoTKidd · 05-14-2014

Hi,

Re: shutdown: I believe neither 'service mpss stop' nor 'micctrl --shutdown' powers down the coprocessor; they only place it in a ready-to-boot state. In this state it executes a primitive boot loader / control program that waits for incoming commands. It doesn't have any power management and so is in a constant runtime state (C0).

Re: network noise: I'll try to find out more. Network noise not directed at the coprocessor shouldn't keep it out of the deeper C-states. And if the coprocessor is in C6, the host should not wake up the coprocessor unless there's something directed at it.

Regards
--
Taylor

TaylorIoTKidd · 05-27-2014

Re: network noise

Thanks to you and your staff for your help. You are absolutely correct. I believe that Andrey (Intel representative) has been working with one of your team on this issue. He filed it as an MPSS bug. You can use #4859164 in your communications with Intel as a reference to the issue if you need further information.

As I read the issue, it isn't network traffic but a specific driver. When the driver is not loaded, the coprocessor correctly drops into C6. Unfortunately, IB communication won't work without the driver.

Regards
--
Taylor