NMI S2600CP - find Device

CWoll1
Beginner

Hi all,

My VMware host halted because of an NMI. The following events appear in the event log:

PCIe Fat Sensor: Surprise Link Down Error on bus: 0, device: 2, function: 2. - Asserted

PCIe Cor Sensor: Receiver Error on bus: 0, device: 2, function: 2. - Asserted:

How can I find out which device is causing the interrupts? I suspect the RAID controller (LSI 9361-8i), but when I boot the system, the RAID controller's BIOS prints "Bus 6 Device 0" on the screen.

System: http://www.also.com/ec/cms2/1010/ProductDetailData.do?prodId=1753179&context=ordertracking INTEL Server System P4308CP4MHGC

Mainboard: S2600CP4

Thank you!

Edit:

~ # lspci

0000:00:00.0 Bridge: Intel Corporation Ivytown DMI2 [PCIe RP[0000:00:00.0]]

0000:00:01.0 Bridge: Intel Corporation Ivytown PCI Express Root Port 1a [PCIe RP[0000:00:01.0]]

0000:00:01.1 Bridge: Intel Corporation Ivytown PCI Express Root Port 1b [PCIe RP[0000:00:01.1]]

0000:00:02.0 Bridge: Intel Corporation Ivytown PCI Express Root Port 2a [PCIe RP[0000:00:02.0]]

0000:00:02.2 Bridge: Intel Corporation Ivytown PCI Express Root Port 2c [PCIe RP[0000:00:02.2]]

0000:00:03.0 Bridge: Intel Corporation Ivytown PCI Express Root Port 3a [PCIe RP[0000:00:03.0]]

0000:00:03.2 Bridge: Intel Corporation Ivytown PCI Express Root Port 3c [PCIe RP[0000:00:03.2]]

...

0000:06:00.0 Mass storage controller: LSI MegaRAID SAS Invader Controller [vmhba0]

3 Replies
Daniel_O_Intel
Employee

Bus 0, device 2, function 2 normally routes to slot 4 (the fourth PCIe slot, counting from the edge of the board inward).

Was any device previously installed there?

If the reported bus number ever changes after POST, you can look up the lanes on page 47 (63 of 231) of the TPS (Intel® Server Board S2600CP Family Technical Product Specification) at http://www.intel.com/support/motherboards/server/s2600cp/sb/CS-033128.htm
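
As a rough illustration of how the event-log address lines up with the lspci listing from the first post, here is a minimal Python sketch. The helper names `sel_to_bdf` and `find_function` are hypothetical, and the sketch assumes the event log prints bus, device and function in decimal and that the PCI domain is 0000; neither is confirmed in this thread. It only finds the function the event names, which in this case is a root port; the card itself sits on the bus behind that port (bus 6 here), and that slot mapping still has to come from the TPS.

```python
import re

# Event-log line and lspci lines copied from the first post in this thread.
SEL_LINE = ("PCIe Fat Sensor: Surprise Link Down Error on "
            "bus: 0, device: 2, function: 2. - Asserted")
LSPCI = """\
0000:00:02.0 Bridge: Intel Corporation Ivytown PCI Express Root Port 2a [PCIe RP[0000:00:02.0]]
0000:00:02.2 Bridge: Intel Corporation Ivytown PCI Express Root Port 2c [PCIe RP[0000:00:02.2]]
0000:06:00.0 Mass storage controller: LSI MegaRAID SAS Invader Controller [vmhba0]
"""


def sel_to_bdf(line):
    """Turn 'bus: B, device: D, function: F' into an lspci-style address.

    Assumption: the event log uses decimal numbers and the domain is 0000.
    """
    m = re.search(r"bus:\s*(\d+),\s*device:\s*(\d+),\s*function:\s*(\d+)", line)
    if m is None:
        return None
    bus, dev, fn = (int(x) for x in m.groups())
    # lspci prints bus and device as two hex digits, function as one digit.
    return f"0000:{bus:02x}:{dev:02x}.{fn:x}"


def find_function(bdf, lspci_text):
    """Return the lspci line whose address equals bdf, or None if absent."""
    for line in lspci_text.splitlines():
        if line.startswith(bdf + " "):
            return line
    return None


if __name__ == "__main__":
    bdf = sel_to_bdf(SEL_LINE)
    print(bdf)
    print(find_function(bdf, LSPCI))
```

With the data from this thread, the matched line is the "Root Port 2c" bridge rather than the RAID controller, which fits the controller's BIOS reporting itself on bus 6.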

CWoll1
Beginner

Thank you for the link, Dan_O. The RAID controller was installed in slot 4.

In the meantime I have replaced the RAID controller with a new one (this time I put it in slot 6). Then I updated the mainboard firmware. During the ME update the system stopped again due to an NMI (same NMI message, but this time for slot 6, where the RAID controller is now installed).

What I noticed: each interrupt appeared as the system fans ramped up (for example during the ME update, or when I changed the system acoustic setting in the BIOS to "Performance").

What can I do?

Thanks all!

Daniel_O_Intel
Employee

So, two things. One, it is normal for the fans to ramp up when there is an error. Two, if you pull the RAID card, can you update the firmware (including ME) with no errors or interrupts? If you can, do that first. Once it's done, shut down, pull AC power for 20 seconds, then boot into the BIOS and press F9 to restore defaults. After that, shut down and pull AC again, put the RAID card back in, boot up, and update the RAID card firmware by itself.
