Hi - I just bought an H2312JFQJR quad-node system with eight E5-2670 processors for scientific computing. The systems were initially very easy to set up, but now when I run a simulation across all 4 nodes (all 64 cores) the following things happen:
- The systems slow down considerably. I notice that in /proc/cpuinfo the clock frequency is only 1200 MHz. With three nodes loaded, it's 2600 MHz.
- The amber warning lights become lit on each server.
- None of this happens if I only load 3 of the 4 nodes.
- In my kern.log I see many messages like this:
Oct 23 19:36:51 localhost kernel: [ 2506.830104] CPU27: Core power limit notification (total events = 196)
Oct 23 19:36:51 localhost kernel: [ 2506.830112] CPU15: Package power limit notification (total events = 196)
Oct 23 19:36:51 localhost kernel: [ 2506.830122] CPU11: Package power limit notification (total events = 196)
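To see which CPUs are reporting these events and how often, the kern.log lines can be tallied with a short script. This is only a sketch: it assumes the message format matches the three lines quoted above, and the function name `summarize_power_events` is made up for illustration.

```python
import re

# Matches kernel messages of the form quoted above, e.g.
# "CPU27: Core power limit notification (total events = 196)"
PATTERN = re.compile(
    r"CPU(?P<cpu>\d+): (?P<scope>Core|Package) power limit notification "
    r"\(total events = (?P<events>\d+)\)"
)

def summarize_power_events(lines):
    """Return {(cpu, scope): highest 'total events' value seen in the log}."""
    latest = {}
    for line in lines:
        m = PATTERN.search(line)
        if m:
            key = (int(m["cpu"]), m["scope"])
            latest[key] = max(latest.get(key, 0), int(m["events"]))
    return latest

# The three lines from the post, used here as sample input.
sample = [
    "Oct 23 19:36:51 localhost kernel: [ 2506.830104] CPU27: Core power limit notification (total events = 196)",
    "Oct 23 19:36:51 localhost kernel: [ 2506.830112] CPU15: Package power limit notification (total events = 196)",
    "Oct 23 19:36:51 localhost kernel: [ 2506.830122] CPU11: Package power limit notification (total events = 196)",
]

summary = summarize_power_events(sample)
for (cpu, scope), events in sorted(summary.items()):
    print(f"CPU{cpu:<3} {scope:<8} total events = {events}")
```

To run it against the real log, pass an open file instead of `sample`, for example `summarize_power_events(open("/var/log/kern.log"))` (the path is an assumption and varies by distribution).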
Plugging the whole system through a load meter, I see that it is pulling about 1000W right before these things happen, and then its power usage suddenly drops dramatically (to maybe about 500W). The chassis is powered through two 1000W power supplies, which are supposed to be redundant and shared across the four nodes. Questions:
1) What's going on? The system is supposed to be compatible with these CPUs.
2) The TDP of the E5-2670 is 115W. 115W x 8 is 920W, leaving a scant 80W available for everything else assuming the power supplies are really redundant. This can't be right.
3) Would it help if I switched to 208V input power? I remember seeing somewhere (perhaps on the PSUs themselves) that their capacity was 1200W with 200V input power.
4) Are there BIOS settings that I should look out for?
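The arithmetic behind questions 2 and 3 can be written out as a quick check. This is a rough sketch using only the figures from this post (115W TDP, 8 CPUs, 1000W per PSU, and the half-remembered 1200W rating at 200V input); it counts CPU TDP only and ignores memory, disks, fans, and PSU efficiency.

```python
CPU_TDP_W = 115   # E5-2670 TDP, as stated in the post
CPU_COUNT = 8     # eight CPUs across the four nodes

def headroom_w(psu_capacity_w, cpu_count=CPU_COUNT, cpu_tdp_w=CPU_TDP_W):
    """Watts left for everything else after budgeting full TDP for every CPU."""
    return psu_capacity_w - cpu_count * cpu_tdp_w

# Single-PSU budget, i.e. what is available if the supplies are truly redundant.
print(headroom_w(1000))  # 80
# Same budget if the 1200W rating at ~200V input is correct (unverified figure).
print(headroom_w(1200))  # 280
```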
Are you running dual power supplies or a single one?
If dual (or even if in doubt), run the SDR update to make sure all the sensor devices are correctly loaded.
The SDR tool tells the BMC, which in turn tells the ME, the type, number, and size of the PSUs installed.
If you have not loaded the SDR, the master node treats the PSU as a non-PMBus smart power supply and will put all nodes into 100% throttling (1.2 GHz) if the PSU asserts an ALERT signal due to a power load or temperature issue. With the SDR loaded, the ALERT can be handled by the power management system, and the system will attempt to run at the best possible speed by moving in and out of throttling to keep running even on a single PSU.
The SDR sets up both of these sensors based on how your system is configured.
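To confirm whether a node has dropped into the 1.2 GHz throttled state described above, the per-core clock can be read from /proc/cpuinfo. The sketch below assumes the usual `processor` and `cpu MHz` fields; the function name and the 1300 MHz threshold are illustrative choices, and the sample text is made up. Note that idle cores may also report a low clock when frequency scaling is active, so the check is only meaningful while the simulation is running.

```python
def throttled_cores(cpuinfo_text, threshold_mhz=1300.0):
    """Return [(processor_index, mhz)] for cores reporting below threshold_mhz."""
    result = []
    current = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            current = int(value)
        elif key == "cpu MHz" and current is not None:
            mhz = float(value)
            if mhz < threshold_mhz:
                result.append((current, mhz))
    return result

# Hypothetical excerpt in /proc/cpuinfo layout: core 0 throttled, core 1 at full clock.
sample = """processor\t: 0
cpu MHz\t\t: 1200.000
processor\t: 1
cpu MHz\t\t: 2600.000
"""

print(throttled_cores(sample))  # [(0, 1200.0)]

# On a live node (Linux only):
# with open("/proc/cpuinfo") as f:
#     print(throttled_cores(f.read()))
```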
In a dual-PSU configuration, you should have more than enough power to handle 1000W, even running on 110V lowline input. I am going from memory here, but you should be able to sustain about 2000W+ with dual PSUs.
In normal idle operational mode, the PS1 LED should be solid green and the PS2 LED should be blinking once per second. As you start increasing the workload, both PSU LEDs will go to solid green. Any amber LED indicates either no input power (plug it in) or a faulty PSU.
One possibility is that one of the PSUs is not pushed fully into the chassis. In this case, the PSU LED also blinks green once per second, but because the PSU is not fully inserted, it never goes to solid green to start sharing the load.
And yes, 208V/220V AC (called Highline in some of the documentation) will give you a significant power boost (over 2400W in dual-PSU mode).