I have a few servers that are rebooting unexpectedly (and ungracefully).
There are no errors in the logs.
$ last
shows me:
reboot system boot 2.6.32-5-amd64 Sun Aug 12 20:53 - 12:25 (18+15:32)
As if they were legitimate reboots. The server reboots, and comes back online.
Any idea why this is happening?
Running Debian Squeeze on SR2612UR with SAS drives.
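One way to tell a clean reboot from a hard reset is to look at `last -x`, which also lists `shutdown` records. The following is a minimal sketch, assuming that a graceful reboot writes a `shutdown` record before the next `reboot` record and that `last -x` prints the newest entries first. The function name `find_unclean_boots` and the extra sample lines are made up for illustration; only the first sample line comes from the output above.

```python
def find_unclean_boots(last_x_lines):
    """Return 'reboot' lines that are not preceded by a 'shutdown' record.

    Expects lines from `last -x`, newest first. The oldest boot in the
    listing is skipped because nothing older is available to compare.
    """
    # Keep only the pseudo-entries we care about, in their original order.
    entries = []
    for line in last_x_lines:
        fields = line.split()
        if fields and fields[0] in ("reboot", "shutdown"):
            entries.append((fields[0], line))

    unclean = []
    for i, (kind, line) in enumerate(entries):
        if kind != "reboot" or i + 1 >= len(entries):
            continue
        # The entry just below a boot is what happened right before it.
        # Another boot there means no shutdown record was written.
        if entries[i + 1][0] == "reboot":
            unclean.append(line)
    return unclean


if __name__ == "__main__":
    # Hypothetical listing; only the first line is taken from the post.
    sample = [
        "reboot   system boot  2.6.32-5-amd64 Sun Aug 12 20:53 - 12:25 (18+15:32)",
        "reboot   system boot  2.6.32-5-amd64 Fri Aug 10 09:00 - 20:53 (2+11:53)",
        "shutdown system down  2.6.32-5-amd64 Fri Aug 10 08:58 - 09:00  (00:02)",
        "reboot   system boot  2.6.32-5-amd64 Wed Aug  1 10:00 - 08:58 (8+22:58)",
    ]
    for line in find_unclean_boots(sample):
        print("no shutdown record before:", line)
```

In practice the input would be the captured text of `last -x` split into lines. A boot flagged here only suggests that the OS never got the chance to shut down; it does not say why.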
Lukasz,
A couple of things to check:
- The processor may be overheating
  - Make sure your vents are not blocked by dust. Dust can accumulate over time
- A faulty Power Supply Unit
- It could be because of operating system corruption
- It could be a memory error
  - If it is due to faulty memory, it may need to be reseated or cleaned, or even replaced
- A faulty motherboard
I'm not sure about the Debian operating system and what more you can do to detect software events.
You could use the SEL Viewer for UEFI/Windows*/Linux* for S5500 and S5520 boards (http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=17933&lang=eng) to see whether hardware errors are logged there.
Can you boot from a DOS USB stick, or into UEFI, and let it sit for a while to see whether it still reboots?
Regards,
John
- Vents not blocked.
  - Shouldn't this show up in the RMM3?
- Power Supply Unit
  - Possibly.
- Operating system.
  - I tried reinstalling it.
  - This same OS works on other nodes (exact same server type and config)
- Memory error.
  - Shouldn't this show up in some logs?
- Faulty motherboard
  - Shouldn't this indicate some type of error?
I have bought about 50 of these servers so far, and about 10 of them have had this problem.
Since I ship them across the world, it's not very convenient to 'plug in a USB key' or 'reinsert memory sticks'.
I don't understand why the quality of these is so low.
I will also try the SEL Viewer to see whether it shows anything.
RMM3 should show me every hardware problem with the system, but it doesn't.
I also test each one in my lab for about a week to make sure it's fine. Then I ship it on-site, and it's faulty.
I did see this in the RMM3 SSH session:
ufip=/system1/sp1/logs1/record121
Properties:
LogCreationClassName=CIM_LogRecord
LogName=IPMI SEL
CreationClassName=CIM_LogRecord
RecordID=121
MessageTimeStamp=13:56:12,January 15,1970
RecordData=System Event - OEM System Boot Event - Asserted
identity=SEL ENTRY
ufip=/system1/sp1/logs1/record123
Properties:
LogCreationClassName=CIM_LogRecord
LogName=IPMI SEL
CreationClassName=CIM_LogRecord
RecordID=123
MessageTimeStamp=13:57:59,January 15,1970
RecordData=Power Unit - Power Unit Failure detected - Asserted
identity=SEL ENTRY
ufip=/system1/sp1/logs1/record124
Properties:
LogCreationClassName=CIM_LogRecord
LogName=IPMI SEL
CreationClassName=CIM_LogRecord
RecordID=124
MessageTimeStamp=13:57:59,January 15,1970
RecordData=Power Unit - Power Off / Power Down - Deasserted
identity=SEL ENTRY
ufip=/system1/sp1/logs1/record125
Properties:
LogCreationClassName=CIM_LogRecord
LogName=IPMI SEL
CreationClassName=CIM_LogRecord
RecordID=125
MessageTimeStamp=13:57:59,January 15,1970
RecordData=Power Unit - Power Unit Failure detected - Deasserted
identity=SEL ENTRY
ufip=/system1/sp1/logs1/record126
Properties:
LogCreationClassName=CIM_LogRecord
LogName=IPMI SEL
CreationClassName=CIM_LogRecord
RecordID=126
MessageTimeStamp=13:57:59,January 15,1970
RecordData=OEM - Asserted
identity=SEL ENTRY
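
For going through a longer dump of these records, here is a rough sketch that parses the text into dictionaries and picks out the asserted "Power Unit Failure detected" events. It assumes the layout shown above (a `ufip=` line starting each record, followed by `key=value` lines); the function names `parse_sel_records` and `power_failures` are made up for this example and are not part of any Intel tool.

```python
def parse_sel_records(text):
    """Split an RMM3-style SEL dump into a list of dicts, one per record."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("ufip="):
            # A ufip= line opens a new record.
            current = {"ufip": line.split("=", 1)[1]}
            records.append(current)
        elif current is not None and "=" in line:
            # Plain key=value property; lines such as "Properties:" are skipped.
            key, value = line.split("=", 1)
            current[key] = value
    return records


def power_failures(records):
    """Return records whose RecordData is an asserted power unit failure."""
    hits = []
    for rec in records:
        data = rec.get("RecordData", "")
        if "Power Unit Failure detected" in data and data.endswith("- Asserted"):
            hits.append(rec)
    return hits


if __name__ == "__main__":
    dump = """ufip=/system1/sp1/logs1/record123
Properties:
RecordID=123
MessageTimeStamp=13:57:59,January 15,1970
RecordData=Power Unit - Power Unit Failure detected - Asserted
identity=SEL ENTRY
ufip=/system1/sp1/logs1/record125
Properties:
RecordID=125
MessageTimeStamp=13:57:59,January 15,1970
RecordData=Power Unit - Power Unit Failure detected - Deasserted
identity=SEL ENTRY
"""
    for rec in power_failures(parse_sel_records(dump)):
        print(rec["RecordID"], rec["MessageTimeStamp"], rec["RecordData"])
```

Run against the records above, this would single out record 123, the only asserted power unit failure in the dump.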
