idata
Community Manager
1,259 Views

SR2612UR with Debian Linux resets about once a day. Why?

I have a few servers that are rebooting unexpectedly and ungracefully.

There are no errors in the logs.

$ last

shows me:

reboot system boot 2.6.32-5-amd64 Sun Aug 12 20:53 - 12:25 (18+15:32)

As if they were legitimate reboots. The server reboots, and comes back online.

Any idea why this is happening?

Running Debian Squeeze on SR2612UR with SAS drives.
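
One way to check whether these really were clean reboots is `last -x`, which also lists shutdown records. Below is a minimal Python sketch, assuming the usual newest-first output where each record of interest starts with `reboot` or `shutdown`. The function name `classify_reboots` is made up for illustration, and the labels are only a heuristic, not proof of a hardware fault.

```python
def classify_reboots(last_x_output):
    """Label each 'reboot' record from `last -x` as clean, unclean, or unknown.

    `last -x` prints the newest record first, so the reboot/shutdown record
    directly below a 'reboot' line describes what happened just before it.
    """
    # Keep only reboot/shutdown pseudo-user records; drop logins, runlevel
    # changes, blank lines and the trailing "wtmp begins ..." line.
    records = [
        line for line in last_x_output.splitlines()
        if line.split()[:1] in (["reboot"], ["shutdown"])
    ]

    results = []
    for i, line in enumerate(records):
        if not line.startswith("reboot"):
            continue
        if i + 1 >= len(records):
            verdict = "unknown"   # oldest record: nothing earlier to compare
        elif records[i + 1].startswith("shutdown"):
            verdict = "clean"     # a shutdown was logged before this boot
        else:
            verdict = "unclean"   # two boots in a row: reset or power loss
        results.append((verdict, line))
    return results
```

Feed it the text printed by `last -x`. A boot with no shutdown record right before it comes back as `unclean`, which is what a hard reset or power loss would look like.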

idata
Community Manager
47 Views

Lukasz,

A few possible causes:

  • The processor may be overheating
    • Make sure your vents are not blocked by dust. Dust can accumulate over time
  • A faulty Power Supply Unit
  • It could be because of operating system corruption
  • It could be a memory error
    • If faulty memory is the cause, the modules may just need to be reseated or cleaned, or they may need to be replaced
  • A faulty motherboard

I'm not sure about the Debian operating system and what more you can do to detect software events.

You could use the SEL Viewer for UEFI/Windows*/Linux* for S5500 and S5520 boards (http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=17933&lang=eng) to see whether any hardware errors are logged there.

Can you boot from a DOS USB stick, or into UEFI, and let it sit for a while to see whether it still reboots?

Regards,

John

idata
Community Manager
47 Views

- Processor overheating.
    - Vents not blocked.
    - Shouldn't this show up in the RMM3?
- Power Supply Unit.
    - Possibly.
- Operating system.
    - I tried reinstalling it.
    - This same OS works on other nodes (exact same server type and config).
- Memory error.
    - Shouldn't this show up in some logs?
- Faulty motherboard.
    - Shouldn't this indicate some type of error?

I have bought about 50 of these servers so far, and about 10 of them have had this problem.

Since I ship them across the world, it's not very convenient to 'plug in a USB key' or 'reinsert memory sticks'.

I don't understand why the quality of these is so low.

I will also try the SEL Viewer to see whether it shows anything.

RMM3 should show me every hardware problem with the system, but it doesn't.

I also test them in my lab for about a week to make sure they are fine. Then I ship them on-site, and they turn out to be faulty.

I did see this in the RMM3 SSH session:

ufip=/system1/sp1/logs1/record121

Properties:

LogCreationClassName=CIM_LogRecord

LogName=IPMI SEL

CreationClassName=CIM_LogRecord

RecordID=121

MessageTimeStamp=13:56:12,January 15,1970

RecordData=System Event - OEM System Boot Event - Asserted

identity=SEL ENTRY

ufip=/system1/sp1/logs1/record123

Properties:

LogCreationClassName=CIM_LogRecord

LogName=IPMI SEL

CreationClassName=CIM_LogRecord

RecordID=123

MessageTimeStamp=13:57:59,January 15,1970

RecordData=Power Unit - Power Unit Failure detected - Asserted

identity=SEL ENTRY

ufip=/system1/sp1/logs1/record124

Properties:

LogCreationClassName=CIM_LogRecord

LogName=IPMI SEL

CreationClassName=CIM_LogRecord

RecordID=124

MessageTimeStamp=13:57:59,January 15,1970

RecordData=Power Unit - Power Off / Power Down - Deasserted

identity=SEL ENTRY

ufip=/system1/sp1/logs1/record125

Properties:

LogCreationClassName=CIM_LogRecord

LogName=IPMI SEL

CreationClassName=CIM_LogRecord

RecordID=125

MessageTimeStamp=13:57:59,January 15,1970

RecordData=Power Unit - Power Unit Failure detected - Deasserted

identity=SEL ENTRY

ufip=/system1/sp1/logs1/record126

Properties:

LogCreationClassName=CIM_LogRecord

LogName=IPMI SEL

CreationClassName=CIM_LogRecord

RecordID=126

MessageTimeStamp=13:57:59,January 15,1970

RecordData=OEM - Asserted

identity=SEL ENTRY
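
The records above are easier to scan once parsed. Here is a rough Python sketch, assuming the RMM3 output keeps the `key=value` layout shown above, with each record starting at a `ufip=` line. `parse_sel_records` and `power_unit_events` are hypothetical helper names, not part of any Intel tool.

```python
def parse_sel_records(text):
    """Turn pasted RMM3 SEL output into a list of dicts, one per record."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skips blank lines and the bare "Properties:" header
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        if key == "ufip":
            # Assumption: a "ufip=" line marks the start of a new record.
            current = {"ufip": value}
            records.append(current)
        elif current is not None:
            current[key] = value
    return records


def power_unit_events(records):
    """Keep only the records whose RecordData starts with 'Power Unit'."""
    return [
        r for r in records
        if r.get("RecordData", "").startswith("Power Unit")
    ]
```

On the five records pasted above, `power_unit_events(parse_sel_records(text))` would pick out records 123, 124 and 125, which are the ones that mention the power unit.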
