I am brand new to this forum and am looking for some help, hopefully quickly. We recently acquired an Intel Modular Chassis V2 with 3 drives in a RAID 5, utilizing 2 nodes and external iSCSI storage. The nodes themselves are powered by PROXMOX.
We have updated the firmware to the latest version as of April 2012, with 22.214.171.12420307.34736
Everything runs fine for a couple of weeks, but after that we start to notice the fans on the server getting louder and louder over the course of a few days, and then the chassis admin interface over Ethernet goes deaf. No response to pings, no response to HTTPS requests, nothing.
After we bring down the nodes gracefully, we force down the chassis. Once we bring it back up and online and look through the events for any issues, we don't see anything that stands out.
Has anyone seen this before? If so, do you have a work around for it or is this a known issue? I would think that a server of this nature should be designed for 100% uptime outside of firmware / hardware updates or issues.
I am looking for a solution as we are looking to purchase more nodes shortly, but with this type of (lack of) reliability right now, we don't want to make any more purchases until it's either straightened out, or we look at a different hardware vendor for our solution.
Thanks for any help you can provide me.
It seems the CMM hung up for some unknown reason... I'm not aware of any known issue like this. Can't tell more without reading the log files.
Just to let you know that the absence of CMM doesn't impact operation of other components like compute module, switch, and storage module. To ensure proper cooling, all fans will be running at full speed if the CMM is down. To reset the CMM, you can simply unplug and plug it while the system is running. There is no need to bring down the compute modules (nodes).
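Since the event log shows nothing obvious, it may help to record exactly when the CMM stops answering, so the timestamp can be compared against the log files and the fan behaviour. The following is only a rough sketch, not an Intel tool: it assumes the CMM's HTTPS interface listens on TCP port 443, and the function names (probe_tcp, record_transition, watch) are made up for illustration.

```python
import socket
import time
from datetime import datetime, timezone


def probe_tcp(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def record_transition(previous, current, log, now=None):
    """Append a timestamped entry to log when reachability changes.

    previous is None on the first call, so the initial state is always logged.
    Returns the current state so the caller can pass it back in next time.
    """
    if previous is None or previous != current:
        stamp = (now or datetime.now(timezone.utc)).isoformat()
        log.append((stamp, "reachable" if current else "unreachable"))
    return current


def watch(host, port=443, interval=60.0, cycles=None, log=None):
    """Probe host:port every interval seconds and log only state changes.

    cycles=None runs until interrupted; a number limits the probe count.
    """
    log = [] if log is None else log
    state = None
    done = 0
    while cycles is None or done < cycles:
        state = record_transition(state, probe_tcp(host, port), log)
        done += 1
        if cycles is None or done < cycles:
            time.sleep(interval)
    return log
```

Run from a machine on the management network, something like watch("192.0.2.10", cycles=3, interval=5.0) would return a short list of (timestamp, state) entries. The address here is a placeholder, so substitute the CMM's own management IP.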
I am seeing a very similar problem. I can connect to the CMM through the GUI with no problem all day. Then when I come in the next day I cannot connect. I cannot ping it or connect through a browser. There is no link light on the back of it.
To get it working again, I have to connect a crossover cable between it and my laptop. Then the lights start blinking and I can plug it back into the switch and connect to it with no problem.
It will work all day until I come in the following morning and the same thing happens. This has happened three days in a row.
Any help would be appreciated.
I've seen excessive network traffic cause this.
If you have dual switches, disable the Admin link between them:
From the CMM GUI, choose System | Switches | Advanced Configuration
Mouse over any port and choose Port Configuration
Select Port | 10G.XC | Admin Status | Down
and Port | 10G.SC | Admin Status | Down
In my case, I don't have dual switches, only the original supplied switch on the chassis. Should this still be done? If so, I'm more than happy to and I can reply with the results in a week or so once the chassis would normally go deaf.
In our case as well, I would not say there is excessive traffic at this point: only 3 VMs running Windows 2008 for about 6 users, and 3 virtual containers running Debian Linux.