I have a few servers that are rebooting unexpectedly (ungracefully).
There are no errors in the logs.
$ last
shows me:
reboot system boot 2.6.32-5-amd64 Sun Aug 12 20:53 - 12:25 (18+15:32)
It looks as if they were legitimate reboots: the server reboots and comes back online.
Any idea why this is happening?
Running Debian Squeeze on SR2612UR with SAS drives.
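When comparing several affected servers, it can help to pull the uptime out of each reboot entry. In `last` output, the value in parentheses at the end of a reboot line is how long that boot lasted, written as days+hours:minutes, so (18+15:32) is 18 days, 15 hours and 32 minutes. Below is a minimal Python sketch, assuming the standard `last` line format shown above; `boot_uptime` is a hypothetical helper name, not part of any tool.

```python
import re
from datetime import timedelta
from typing import Optional

# Matches the trailing "(D+HH:MM)" or "(HH:MM)" duration that `last` prints.
# For "reboot" pseudo-entries this is the uptime of that boot session.
_DURATION = re.compile(r"\((?:(\d+)\+)?(\d{1,2}):(\d{2})\)\s*$")


def boot_uptime(line: str) -> Optional[timedelta]:
    """Return the uptime of a `last` reboot entry, or None if there is none."""
    if not line.startswith("reboot"):
        return None  # user sessions and other entries are ignored
    m = _DURATION.search(line)
    if m is None:
        return None  # e.g. the current boot, shown as "still running"
    days = int(m.group(1) or 0)
    return timedelta(days=days, hours=int(m.group(2)), minutes=int(m.group(3)))


if __name__ == "__main__":
    sample = "reboot system boot 2.6.32-5-amd64 Sun Aug 12 20:53 - 12:25 (18+15:32)"
    print(boot_uptime(sample))
```

Feeding it every reboot line from each server would show whether the failures happen after a similar uptime or at random intervals.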
Lukasz,
A few things to check:
- The processor may be overheating.
  - Make sure your vents are not blocked by dust; dust can accumulate over time.
- A faulty power supply unit.
- Operating system corruption.
- A memory error.
  - If it is due to faulty memory, the modules may just need to be reseated and cleaned, or they may need to be replaced.
- A faulty motherboard.
I'm not familiar enough with Debian to say what more you can do there to detect software events.
You could use the SEL Viewer for UEFI/Windows*/Linux* for S5500 and S5520 boards (http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=17933&lang=eng) to check whether any hardware errors were logged.
Can you boot from a DOS USB stick, or into UEFI, and let it sit for a while to see whether it still reboots there?
Regards,
John
- Vents not blocked.
  - Shouldn't this show up in the RMM3?
- Power Supply Unit
  - Possibly.
- Operating system
  - I tried reinstalling it.
  - This same OS works on other nodes (exact same server type and config).
- Memory error
  - Shouldn't this show up in some logs?
- Faulty motherboard
  - Shouldn't this indicate some type of error?
I have bought about 50 of these servers so far, and about 10 of them have had this problem.
Since I ship them across the world, it's not very convenient to 'plug in a USB key' or 'reinsert memory sticks'.
I don't understand why the quality of these is so low.
I will also try the SEL Viewer to see whether it shows anything.
The RMM3 should show me every hardware problem with the system, but it doesn't.
I also test them in my lab for about a week to make sure they're fine. Then I ship them on-site, and they're faulty.
I did see the following in an SSH session to the RMM3:
ufip=/system1/sp1/logs1/record121
Properties:
LogCreationClassName=CIM_LogRecord
LogName=IPMI SEL
CreationClassName=CIM_LogRecord
RecordID=121
MessageTimeStamp=13:56:12,January 15,1970
RecordData=System Event - OEM System Boot Event - Asserted
identity=SEL ENTRY
ufip=/system1/sp1/logs1/record123
Properties:
LogCreationClassName=CIM_LogRecord
LogName=IPMI SEL
CreationClassName=CIM_LogRecord
RecordID=123
MessageTimeStamp=13:57:59,January 15,1970
RecordData=Power Unit - Power Unit Failure detected - Asserted
identity=SEL ENTRY
ufip=/system1/sp1/logs1/record124
Properties:
LogCreationClassName=CIM_LogRecord
LogName=IPMI SEL
CreationClassName=CIM_LogRecord
RecordID=124
MessageTimeStamp=13:57:59,January 15,1970
RecordData=Power Unit - Power Off / Power Down - Deasserted
identity=SEL ENTRY
ufip=/system1/sp1/logs1/record125
Properties:
LogCreationClassName=CIM_LogRecord
LogName=IPMI SEL
CreationClassName=CIM_LogRecord
RecordID=125
MessageTimeStamp=13:57:59,January 15,1970
RecordData=Power Unit - Power Unit Failure detected - Deasserted
identity=SEL ENTRY
ufip=/system1/sp1/logs1/record126
Properties:
LogCreationClassName=CIM_LogRecord
LogName=IPMI SEL
CreationClassName=CIM_LogRecord
RecordID=126
MessageTimeStamp=13:57:59,January 15,1970
RecordData=OEM - Asserted
identity=SEL ENTRY
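With 10 affected servers, reading these dumps by hand gets tedious. Below is a minimal Python sketch that parses the RMM3 SSH output into records and keeps only the Power Unit entries (in the dump above, these include a "Power Unit Failure detected - Asserted" event). It assumes the exact layout shown in this post (a ufip= line starting each record, then key=value properties); SelRecord, parse_sel_dump and power_unit_events are hypothetical names. Records are sorted by RecordID rather than timestamp because the January 1970 dates suggest the BMC clock was never set.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SelRecord:
    """One IPMI SEL entry from the RMM3 SSH output (layout assumed from this post)."""
    record_id: int
    timestamp: str
    data: str

    @property
    def sensor_type(self) -> str:
        # First " - "-separated field, e.g. "Power Unit" or "System Event".
        return self.data.split(" - ")[0]

    @property
    def state(self) -> str:
        # Last field, e.g. "Asserted" or "Deasserted".
        return self.data.split(" - ")[-1]


def _to_record(props: Dict[str, str]) -> SelRecord:
    return SelRecord(
        record_id=int(props["RecordID"]),
        timestamp=props.get("MessageTimeStamp", ""),
        data=props.get("RecordData", ""),
    )


def parse_sel_dump(text: str) -> List[SelRecord]:
    """Split the dump on 'ufip=' lines and collect each record's key=value pairs."""
    records: List[SelRecord] = []
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("ufip="):
            # A new record starts: flush the previous one if it had an ID.
            if "RecordID" in props:
                records.append(_to_record(props))
            props = {}
        elif "=" in line:
            key, _, value = line.partition("=")  # split on the first "=" only
            props[key] = value
    if "RecordID" in props:
        records.append(_to_record(props))
    # Order by RecordID, since the 1970 timestamps are probably not meaningful.
    return sorted(records, key=lambda r: r.record_id)


def power_unit_events(records: List[SelRecord]) -> List[SelRecord]:
    return [r for r in records if r.sensor_type == "Power Unit"]


if __name__ == "__main__":
    dump = """\
ufip=/system1/sp1/logs1/record123
Properties:
LogName=IPMI SEL
RecordID=123
MessageTimeStamp=13:57:59,January 15,1970
RecordData=Power Unit - Power Unit Failure detected - Asserted
identity=SEL ENTRY
"""
    for rec in power_unit_events(parse_sel_dump(dump)):
        print(rec.record_id, rec.state, rec.data)
```

Running this against the dump from each affected server would show whether the same Power Unit sequence precedes every unexpected reboot.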