Our company supports a number of customers who use Intel MFSYS25 servers, and that number is expected to grow.
As the leader of our admin team, I find the remote administration capabilities of MFSYS inferior to those of other servers in terms of stability, admin efficiency, and automation. I would like to ask the community whether anyone else has the same problems, how they work around them, and whether Intel is willing to fix these inconveniences in newer firmware versions.
Namely, here are some of the gripes we have (with firmware versions up to 6.6 - not sure whether they are still relevant in the most recent 6.8):
* SSH access to the server serial console often disconnects, and sometimes characters are echoed back to the input console (i.e. doubled characters are printed in the SSH terminal, while just one reaches the server console). Using Linux/Solaris programs such as "vi", "mc" and others that expect a not-very-dumb terminal is often a recipe for disaster, with endless loops of session disconnections and ignored input (e.g. I can't exit "vi" because of this).
Perhaps the TERM variable needs to be set to something specific, but we haven't yet figured out what...
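For reference, here is what we currently experiment with after attaching to the serial console over SSH - a minimal sketch assuming a vt100-capable console; the "right" TERM value for MFSYS is exactly what we haven't found yet:

```shell
# Fall back to a conservative terminal type before starting curses
# applications like "vi" or "mc"; vt100 is a safe lowest common
# denominator for serial consoles.
export TERM=vt100

# Serial consoles cannot report the window size, so tell the tty layer
# explicitly - curses applications misbehave with the wrong geometry.
# (Guarded so the snippet also runs without a controlling terminal.)
if [ -t 0 ]; then
    stty rows 24 columns 80
fi

echo "TERM=$TERM"
```

If doubled characters appear, toggling local echo with "stty -echo" / "stty echo" is also worth a try.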
* The graphical KVM often generates spurious repeated keypresses while we are typing, and entering a command or password on the graphical console is tricky. This is especially a problem with passwords when lock-out policies are in place.
I can say that this is a common problem with many jKVMs, including the VMware console, Sun ILOM and others. But it is nasty behavior anyway.
* The administrative web interface is rather slow, and many things could be trimmed down to save on traffic and rendering time. This is especially noticeable when we reach the internal admin network over a slow link, perhaps through an RDP session to a local admin's machine, or from admin Sun Ray terminals on our LAN - the login window with its nice landscape photo background takes ages to render.
* It would be nice to have a CLI implementing at least the most useful features of the SCM: state checks, power resets, storage and especially switch administration.
** One of the primary uses for a CLI, besides daily administration tasks, would be initial setup of the boxes - many of our customer sites use similar networking layouts, and it currently takes ages to click through two switch-module GUIs to set up all the VLANs, STP settings and so on, or a bit faster - to lay out the disks...
It SHOULD really take a couple of seconds to copy-paste from an admin documentation page.
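To illustrate what we mean: with even a minimal CLI, the repetitive per-site setup could be generated and pasted in seconds. The sketch below only builds such a command batch from a site template - the "vlan create"/"switchport" syntax is hypothetical, since MFSYS offers no such CLI today, which is the point:

```shell
#!/bin/sh
# Generate a paste-able configuration batch for both switch modules from
# one site-wide VLAN template. The command syntax is hypothetical - it
# just shows how a scripted batch could replace hours of GUI clicking.
VLANS="10:mgmt 20:storage 30:customers"   # id:name pairs for this site
PORTS="1 2 3 4 5 6"                       # server-facing switch ports

batch=$(
    for sw in switch1 switch2; do
        echo "# --- $sw ---"
        for v in $VLANS; do
            echo "vlan create ${v%%:*} name ${v#*:}"
        done
        for p in $PORTS; do
            # Tag every site VLAN on every server-facing port; trunk
            # and per-port details would vary per site.
            for v in $VLANS; do
                echo "switchport $p tagged-vlan ${v%%:*}"
            done
        done
    done
)
echo "$batch"
```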
* The switch-module administration tools (in the main GUI and in the "Advanced" pages) are inconsistent, and occasionally we have problems logging into one or both switch modules' advanced GUIs - we only get Guest/ReadOnly access.
The inconsistency is that whenever VLAN options are edited in the "simple" interface, which only cares about a primary VLAN per switch port, any additional (tagged) VLAN settings are reset. Quite often the VLAN settings on the link between the two switch modules are forgotten. Jumbo frames seem to be enabled on only one module at a time, and the "Enable after reset" checkmark is often ignored.
* There are also a couple of related hardware gripes: we've had several server blades that function properly (as servers) but seem "lost" to the administration GUI.
** It is not too funny to see the "pull out and reinsert the module" suggestion as the only option, especially when you're physically half the globe away from the server (yes, Russia is big) and a technically competent person can only come by and look at it "sometime next week".
After the server modules are reinserted, they are usually seen by the chassis management - for a while at least. So I think this is a "soldering/contacts" problem, and thus an oversight by Intel's craftsmen...
** I understand when there is a problem connecting to the BMCs, perhaps a physical contact problem. But the inability to power the server module on/off or reset it in that case? That just doesn't make sense - the power source is on the chassis side, so (in future HW revisions?) it should be controllable by the chassis regardless of the server module's BMC state!
** At some sites we have remotely controllable UPSes and/or rack PDUs, so we can power the whole chassis off and on remotely. Quite often this repairs the connection from the admin GUI to the server modules - for a while. I think there should be an option to try to reinitialise the link to the BMC in software - just in case that would work and help...
** Maybe, with all the resilience features built into the server chassis, it would make sense to install a secondary/failover BMC in each server module, soldered onto the motherboard or added as a mezzanine card?
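For completeness, the PDU power-cycle workaround mentioned above can itself be scripted. Here is a sketch assuming an SNMP-manageable APC-style rack PDU - the hostname, community and outlet number are made up, and the OID (APC's sPDUOutletCtl, where value 3 requests an outlet reboot) must be verified against your own PDU's MIB. The snippet only prints the command (dry run) rather than executing it:

```shell
#!/bin/sh
# Dry-run sketch of power-cycling the chassis via a managed rack PDU.
# All values below are placeholders; verify the OID against your PDU's MIB.
PDU_HOST=pdu.example.net   # hypothetical PDU hostname
COMMUNITY=private          # SNMP write community
OUTLET=4                   # outlet feeding the MFSYS chassis
OID=".1.3.6.1.4.1.318.1.1.4.4.2.1.3.$OUTLET"   # APC sPDUOutletCtl (assumed)

# Print the command instead of running it, so nothing is cycled by accident.
cmd="snmpset -v1 -c $COMMUNITY $PDU_HOST $OID i 3"
echo "$cmd"
```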
Now, we like these boxes - despite the terrible basic administration options (which do get better and fancier over time, but the low-level basics still suck). I really wish they would also become easier and more efficient to administer - we are likely to use them in future projects anyway.
I think this would only take some Intel QA people sitting down with the GUI, trying to do something real with the SSH and KVM consoles, rolling out a dozen identically configured boxes within an hour, and so on. Once they get as annoyed as we are, it would only take a few developers to fix the firmware and make everybody happy!
Thanks to everyone who has read this far!
If you've had problems like those outlined above and have found workarounds (or know they're fixed in newer firmware versions), please post a reply.
If these problems are still present, please help by filing an RFE or otherwise bringing this to Intel's attention.
Finally, thanks in advance to the Intel engineers who will improve these servers. You've done a great job so far - there's just a little left to add on.
Just remembered: there is one more inconsistency in the networking modules' VLAN setup: it is possible to use several different settings (PVID, native VLAN, untagged VLAN, simple/advanced GUI configurations) to effectively have several untagged VLANs assigned to a single switch port. You can imagine the mayhem this brings at times...