Hello,
We're trying to install the BD-NVV-N3000-3 card on our Dell R720 machine.
We installed the card in one of the x16 PCIe slots and connected it with a 6-pin power connector.
The system boots with the card installed and the card shows up in the lspci output, but after a short time (usually a matter of minutes) the system crashes with this log:
h11 login: [ 174.287195] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 32992
[ 174.296815] {1}[Hardware Error]: event severity: fatal
[ 174.302550] {1}[Hardware Error]: Error 0, type: fatal
[ 174.308285] {1}[Hardware Error]: section_type: PCIe error
[ 174.314504] {1}[Hardware Error]: port_type: 4, root port
[ 174.320625] {1}[Hardware Error]: version: 1.0
[ 174.325682] {1}[Hardware Error]: command: 0x0547, status: 0x4010
[ 174.332579] {1}[Hardware Error]: device_id: 0000:40:02.0
[ 174.338698] {1}[Hardware Error]: slot: 0
[ 174.343266] {1}[Hardware Error]: secondary_bus: 0x42
[ 174.348999] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x0e04
[ 174.356381] {1}[Hardware Error]: class_code: 000406
[ 174.362018] {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
[ 174.370662] Kernel panic - not syncing: Fatal hardware error!
[ 174.377153] Kernel Offset: 0x3d600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 174.391984] Rebooting in 30 seconds..
We noticed that this can happen even when the kernel is not running, for example while sitting at a GRUB prompt or even in the UEFI pre-boot environment (see attached log).
We tried the card in each of the x16 slots and also tried removing the other adapters, but to no avail.
Does anyone know why this happens?
Please let me know if you need any further information.
Thanks,
Daehyeok
When the issue happens, check the LEDs on the back panel. If they are blinking at 1-second intervals, the card is most likely not getting enough airflow to cool it. Try setting the server fans to maximum.
Usually, the card runs fine at first, but as its temperature rises, the board shuts itself down to prevent overheating. You can watch the sensor readings with "watch -d -n 1 fpgainfo bmc".
Monitor the FPGA Die Temperature reading. If it keeps increasing and exceeds 100 °C, the card will shut down.
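If you would rather log or alert than watch the screen, the same check can be scripted. Below is a minimal Python sketch, assuming `fpgainfo bmc` is on the PATH and prints the die temperature on a line such as "(12) FPGA Die Temperature : 67.50 Celsius". That line format is an assumption, so verify it against your OPAE version. The 100 °C shutdown point comes from the reply; the 90 °C warning level and the function names (`parse_die_temp`, `classify`, `monitor`) are hypothetical choices for illustration.

```python
import re
import subprocess
import time

# Shutdown point quoted in the reply; WARN_C is a hypothetical, arbitrary margin.
SHUTDOWN_C = 100.0
WARN_C = 90.0

# ASSUMED line shape, e.g. "(12) FPGA Die Temperature : 67.50 Celsius".
# The real fpgainfo output format may differ between OPAE releases.
_DIE_TEMP = re.compile(r"FPGA Die Temperature\s*:\s*([0-9]+(?:\.[0-9]+)?)")


def parse_die_temp(text):
    """Return the FPGA die temperature in Celsius from fpgainfo output, or None."""
    match = _DIE_TEMP.search(text)
    return float(match.group(1)) if match else None


def classify(temp_c):
    """Map a temperature reading to 'ok', 'warn', or 'critical'."""
    if temp_c >= SHUTDOWN_C:
        return "critical"
    if temp_c >= WARN_C:
        return "warn"
    return "ok"


def monitor(interval_s=1.0):
    """Poll `fpgainfo bmc` once per interval, similar to `watch -d -n 1 fpgainfo bmc`."""
    while True:
        result = subprocess.run(["fpgainfo", "bmc"], capture_output=True, text=True)
        temp = parse_die_temp(result.stdout)
        if temp is None:
            print("FPGA Die Temperature not found in fpgainfo output")
        else:
            print(f"{temp:.1f} C -> {classify(temp)}")
        time.sleep(interval_s)
```

On the host with the card installed, call `monitor()` and stop it with Ctrl+C. It only reports readings; it does not replace the graceful-shutdown daemon.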
If you do not want the shutdown to cause a kernel panic, you can enable the daemon that performs a graceful shutdown. The steps are described here: https://www.intel.com/content/www/us/en/programmable/documentation/xgz1560360700260.html#zqb1564607955079
Even so, you still need to power-cycle the server to bring the card back from the shutdown state.
Thanks for your reply.
As you said, the high temperature was the issue, and we were able to resolve it.
Thanks,
Daehyeok