We recently had an issue where our Intel Modular Server went down due to a long power loss. The unit itself comes up, and the interface shows that the drives and virtual drives are "OK", but I cannot get any of the server modules to boot. I also get inconsistent results when trying to reset the SCM (with all server modules powered off). I suspect there is a problem with the SCM, but I do not know enough about the machine to be able to tell; the interface says that it is "OK". I am also finding it very difficult to find any support for this server, which we purchased only last summer (2010).
Any help would be greatly appreciated.
Update: I have now been able to power on the third server module and boot the VMware ESXi server, but I cannot start any of the VMs on it. To be clear, the entire unit (server modules, drives, and virtual drives) shows as working correctly in the web interface. My XenServers on the first two server modules will not boot. The first server module sometimes says it is loading but then stalls; other times it says it cannot find the boot drive.
How long was the power outage?
You could try unplugging and reseating the SCM with all compute modules powered off. Monitor the SCM's status in the web GUI until it shows OK. If the compute modules still fail to boot after that, go to Diagnostics => Service Data => Complete System Diagnostics and attach the resulting log file here.
I attached the diagnostic zip file.
I am not sure of the total time the power was out, but from my understanding the power fluctuated several times over the weekend.
I did try to reset SCM1 (there is no SCM2) and got an error telling me to pull the module and reinsert it. All server modules were powered off during this time. After I reinserted the module, the status came back as "OK". My storage pool and the virtual drives within it all show "OK".
Not sure... Anyway, I ran a new one and attached it. I also added a "1" to the end of the file name in case the forum doesn't allow posting a file with the ".zip" extension.
Thanks for the help!
I took a brief look at the log file and noticed that it starts at 2011-07-28 16:02:04, so I can't see what happened during the power outage. Did you clear the log? Also, the SCM is now in SCM slot 2. Is that because it doesn't work in slot 1? Or could you try moving it back to slot 1 to see if it works there?
I also noticed a media error reported on physical drive 5. I recommend replacing it immediately.