My VMware host halted because of an NMI. The following events appear in the event log:
PCIe Fat Sensor: Surprise Link Down Error on bus: 0, device: 2, function: 2. - Asserted
PCIe Cor Sensor: Receiver Error on bus: 0, device: 2, function: 2. - Asserted:
How can I find out which device is causing the interrupts? I suspect the RAID controller (LSI 9361-8i). But when I boot the system, the RAID controller's BIOS prints "Bus 6 Device 0" on the screen?!?
System: http://www.also.com/ec/cms2/1010/ProductDetailData.do?prodId=1753179&context=ordertracking INTEL Server System P4308CP4MHGC
~ # lspci
0000:00:00.0 Bridge: Intel Corporation Ivytown DMI2 [PCIe RP[0000:00:00.0]]
0000:00:01.0 Bridge: Intel Corporation Ivytown PCI Express Root Port 1a [PCIe RP[0000:00:01.0]]
0000:00:01.1 Bridge: Intel Corporation Ivytown PCI Express Root Port 1b [PCIe RP[0000:00:01.1]]
0000:00:02.0 Bridge: Intel Corporation Ivytown PCI Express Root Port 2a [PCIe RP[0000:00:02.0]]
0000:00:02.2 Bridge: Intel Corporation Ivytown PCI Express Root Port 2c [PCIe RP[0000:00:02.2]]
0000:00:03.0 Bridge: Intel Corporation Ivytown PCI Express Root Port 3a [PCIe RP[0000:00:03.0]]
0000:00:03.2 Bridge: Intel Corporation Ivytown PCI Express Root Port 3c [PCIe RP[0000:00:03.2]]
0000:06:00.0 Mass storage controller: LSI MegaRAID SAS Invader Controller [vmhba0]
Bus 0, device 2, function 2 normally routes to slot 4 (the fourth PCIe slot, counting from the edge of the board inward).
Was there any device populated there previously?
You can look up the lanes on page 47 (63 of 231) of the TPS (Intel® Server Board S2600CP Family Technical Product Specification, at http://www.intel.com/support/motherboards/server/s2600cp/sb/CS-033128.htm), if the reported bus number ever changes after POST.
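If you want to cross-check the address from the event log against the lspci listing, here is a minimal Python sketch. The function names (parse_lspci, parse_event) are made up for illustration, the input is copied from the post above, and it assumes the event log prints bus/device/function in decimal while lspci prints them in hex (for values this small the two agree).

```python
import re

# lspci listing and event line copied from the post above.
LSPCI_OUTPUT = """\
0000:00:00.0 Bridge: Intel Corporation Ivytown DMI2 [PCIe RP[0000:00:00.0]]
0000:00:01.0 Bridge: Intel Corporation Ivytown PCI Express Root Port 1a [PCIe RP[0000:00:01.0]]
0000:00:01.1 Bridge: Intel Corporation Ivytown PCI Express Root Port 1b [PCIe RP[0000:00:01.1]]
0000:00:02.0 Bridge: Intel Corporation Ivytown PCI Express Root Port 2a [PCIe RP[0000:00:02.0]]
0000:00:02.2 Bridge: Intel Corporation Ivytown PCI Express Root Port 2c [PCIe RP[0000:00:02.2]]
0000:00:03.0 Bridge: Intel Corporation Ivytown PCI Express Root Port 3a [PCIe RP[0000:00:03.0]]
0000:00:03.2 Bridge: Intel Corporation Ivytown PCI Express Root Port 3c [PCIe RP[0000:00:03.2]]
0000:06:00.0 Mass storage controller: LSI MegaRAID SAS Invader Controller [vmhba0]
"""

EVENT = ("PCIe Fat Sensor: Surprise Link Down Error on "
         "bus: 0, device: 2, function: 2. - Asserted")

# domain:bus:device.function, all hexadecimal in lspci output
ADDR_RE = re.compile(
    r"^([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])\s+(.*)$"
)
EVENT_RE = re.compile(r"bus:\s*(\d+),\s*device:\s*(\d+),\s*function:\s*(\d+)")


def parse_lspci(text):
    """Map (bus, device, function) to the description on each lspci line."""
    devices = {}
    for line in text.splitlines():
        match = ADDR_RE.match(line.strip())
        if match:
            _domain, bus, dev, fn = (int(x, 16) for x in match.groups()[:4])
            devices[(bus, dev, fn)] = match.group(5)
    return devices


def parse_event(line):
    """Extract (bus, device, function) from an event log line, or None."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    # Assumption: the event log prints these numbers in decimal.
    return tuple(int(x) for x in match.groups())


devices = parse_lspci(LSPCI_OUTPUT)
address = parse_event(EVENT)
print(address, "->", devices.get(address, "not in lspci listing"))
```

With the data from this thread, the lookup lands on the Root Port 2c bridge, not on the RAID controller itself. A root port is a bridge, and a card plugged into its slot shows up on a secondary bus behind it, which would be consistent with the controller reporting itself as bus 6 device 0. The listing alone does not show which root port leads to bus 6, so the slot table in the TPS is still the thing to check.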
Thank you for the link, Dan_O. The RAID controller was installed in slot 4.
In the meantime I have replaced the RAID controller with a new one (this time I put it in slot 6). Then I updated the mainboard firmware. During the ME update the system stopped again due to an NMI (same NMI message, but this time for slot 6, where the RAID controller is now installed).
What I noticed: each interrupt occurred as the system fans ramped up (for example during the ME update, or when I changed the system acoustic setting in the BIOS to "Performance")?!?
What can I do?
So, two things. One, it is normal that the fans ramp up when there is an error. Two, if you pull the RAID card, can you update the firmware (including ME) with no errors or interrupts? If you can, do that first. After it's done, shut down, pull AC power for 20 seconds, then boot into the BIOS and press F9 to restore defaults. After that, shut down and pull AC again, put the RAID card back in, boot up, and update the RAID card firmware by itself.