
P4304BTLSFCN reboots randomly...

CCosp
Beginner

I have been battling a crazy problem for several months now. A brand new P4304BTLSFCN server running an E3-1240 processor and 16GB of Crucial memory, with just Windows Server 2008 R2 installed and updated, reboots randomly.

I have updated all firmware via EFI (BIOS, FRU/SDR, BMC) and updated all drivers to the latest versions, and the problem keeps occurring. I've tried various BIOS settings, such as enabling the NIC ROM, which I read about in another thread, but that didn't seem to have any effect. I replaced the processor, motherboard and PSU and disconnected the Intel RMM4, and still the problem persists (these changes were made over a one-month period and tested one at a time).

Tonight I discovered the SEL viewer. I opened it and matched up an event to the reboot events in the Windows Event Viewer. The SEL viewer shows "OEM Reserved,SMI TimeOut (# 0x06)" "CRITICAL event: SMI TimeOut reports it has been asserted." "Integrated BMC - LUN# 0 (Channel# 0)". I am unsure what this means, but it's in red in the SEL viewer and happens right before the server logs an unclean shutdown. If anyone could help me with this error I would greatly appreciate it. I would attach the SEL log file for reference but I don't see an option to do that here.
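For anyone who wants to do the same matching without eyeballing two logs, here is a minimal Python sketch. It assumes you have already pulled the timestamps out of the SEL viewer and the Windows Event Viewer yourself; `match_events` and the five-minute window are made-up names and values for illustration, not part of any Intel or Microsoft tool.

```python
from datetime import datetime, timedelta

def match_events(sel_times, reboot_times, window=timedelta(minutes=5)):
    """Pair each reboot with the SEL events logged shortly before it.

    sel_times and reboot_times are lists of datetime objects that the
    caller has already extracted from the two logs (hypothetical input;
    no particular log file format is assumed here).
    """
    matches = {}
    for reboot in reboot_times:
        # Keep SEL events that occurred at or before the reboot,
        # but no earlier than `window` ahead of it.
        matches[reboot] = [
            t for t in sel_times
            if timedelta(0) <= reboot - t <= window
        ]
    return matches
```

A reboot that comes back with an empty list had no SEL event in the window, which would suggest widening the window or checking that both clocks agree.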

2 Replies
Daniel_O_Intel
Employee

I don't see anything immediately obvious. After the firmware update, have you done a BIOS reset to factory defaults, and pulled the AC for twenty seconds? That can clear up some of the weirder problems.

You can also try dropping down to just one DIMM, and run that as a test.

How often are you getting the reboot? Once a day, or once a week, on average?

CCosp
Beginner

Yes, I have reset the BIOS to factory defaults a few times and removed power. The reboots occur at various times, which makes the problem hard to figure out.

I did a test a few weeks back, right after swapping the motherboard. I created a script to copy an 800GB file to the RAID 1 array, delete it and then start over, while at the same time running prime95 and loading the CPU to 100%. I ran this script for over a week without a reboot. I thought that perhaps the motherboard swap had fixed the problem, but after I disabled the script and stopped the CPU load on a Friday afternoon, the server rebooted twice over the weekend.

I started thinking about it and checked the properties of the onboard NICs. I disabled all the power efficiency, low power, reduce link speed in standby, etc. options, and since then the server has not rebooted. What got me to look in this direction was the fact that the server did not reboot for over a week while it was continuously copying files across the network. Anyway, it's been 9 days since I disabled those options and so far no reboot. I'm really hoping this solves the issue, but time will tell.

BTW, I am using Intel PROSet 17.4.95.0 in case anyone else has this problem. I will post back here after more testing to confirm whether this solved the problem. If anyone else has seen this or has any more ideas, please post and let me know.
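In case anyone wants to run a similar load test, the copy/delete loop described above could look roughly like this in Python. This is only a sketch, not the script that was actually used: the function name `copy_delete_loop` and its arguments are hypothetical, and the source file and destination folder are whatever you choose to pass in.

```python
import shutil
from pathlib import Path

def copy_delete_loop(source, dest_dir, iterations):
    """Copy `source` into `dest_dir`, delete the copy, and repeat.

    Returns the number of completed copy/delete cycles.
    dest_dir must be a different folder from the one holding `source`,
    otherwise shutil.copyfile raises SameFileError.
    """
    source = Path(source)
    target = Path(dest_dir) / source.name
    completed = 0
    for _ in range(iterations):
        shutil.copyfile(source, target)   # write the copy to the array
        # Basic sanity check that the whole file was written.
        if target.stat().st_size != source.stat().st_size:
            raise RuntimeError("copy size mismatch")
        target.unlink()                   # delete the copy and start over
        completed += 1
    return completed
```

This only generates the disk and file-copy load. CPU load from prime95 would still have to be started separately.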
