After installing a new FPGA card in the PCIe x16 slot, the server halts when we try to enter the BIOS, with the error "Nmi activated - system halted".
Note: The server boots to the OS without any issue.
Server Product Code : LWF2208IR848505
Server S/N : BQF900200624
FPGA Model : BD-NVV-N3000-2
FPGA S/N : 644C36123B68
Intel BIOS : 02.01.0012
OS : CentOS 7.7
Please find attached lspci output.
We need your help to resolve this issue. Can you please escalate it to the Intel core development team? We assume this is not a hardware failure; it could be a compatibility issue, or a new modified/beta BIOS might resolve it.
1) If you suspect it is a card issue, can you try a different card?
2) If the behavior is the same on all cards, can you let me know whether lspci fails to detect the card immediately after boot-up, or whether the card was detected at first and failed later on?
Also, is the LED blinking at a 1-second interval? If yes, does that happen immediately after boot-up, or only later on?
I have checked the Intel FPGA card again in a Supermicro SYS-7049GP-TRT server. After powering on the server, none of the FPGA LEDs light up. After booting to the OS, LEDs 1 and 3 blink green and LEDs 2 and 4 blink yellow.
Immediately after booting to the OS, and also later on, we ran the "lspci" command, and the card now shows in the output list.
Please find attached output for your reference.
Kindly check the attachment and let us know whether the card is being detected properly.
Please suggest what could be the issue with FPGA card detection on the Intel server.
Based on the Supermicro lspci output, the card is detected properly.
60:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
60:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
61:00.0 Processing accelerators: Intel Corporation Device 0b30
62:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
62:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
63:00.0 Processing accelerators: Intel Corporation Device 0b32
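For anyone who wants to repeat this detection check right after boot and again later, here is a minimal Python sketch. It assumes the plain `lspci` text has been saved to a string. The function name `find_n3000_functions` and the matching rules (an Ethernet controller whose description contains "N3000", or any "Processing accelerators" device) are illustrative assumptions based only on the output quoted above, not an Intel tool.

```python
import re

# Matches lines such as "60:00.0 Ethernet controller: Intel Corporation ..."
# Assumes the short bus:device.function form (no PCI domain prefix).
LSPCI_LINE = re.compile(
    r"^(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(?P<cls>[^:]+):\s+(?P<desc>.+)$"
)

def find_n3000_functions(lspci_text):
    """Return (ethernet_bdfs, accelerator_bdfs) found in lspci output."""
    ethernet, accel = [], []
    for line in lspci_text.splitlines():
        m = LSPCI_LINE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognized lines
        if m["cls"] == "Ethernet controller" and "N3000" in m["desc"]:
            ethernet.append(m["bdf"])
        elif m["cls"] == "Processing accelerators":
            accel.append(m["bdf"])
    return ethernet, accel

if __name__ == "__main__":
    # Two lines copied from the Supermicro output in this thread.
    sample = (
        "60:00.0 Ethernet controller: Intel Corporation Ethernet Controller "
        "XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for "
        "Networking (rev 02)\n"
        "61:00.0 Processing accelerators: Intel Corporation Device 0b30\n"
    )
    eth, acc = find_n3000_functions(sample)
    print("Ethernet functions:", eth)
    print("Accelerator functions:", acc)
```

If both lists come back empty on a later run while they were populated right after boot, that would match the "detected at first but failed later on" case asked about earlier.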
Given that the Supermicro server works and the Wolfpass server does not, you most probably need the air duct recommended for the server, which should be used with the card: https://ark.intel.com/content/www/us/en/ark/products/125929/passive-airduct-kit-awfcoproductad.html
The accessories listed below are already installed in the Intel LWF2208IR848505 server.
Accessories: AWFCOPRODUCTBKT (High Air flow Air Duct Bracket Kit) and A2UL16RISER2 (2-slot PCIe* Riser card).
Please suggest what else could be the issue in the Intel server.
1. Confirm that you are running at maximum fan speed.
2. Does the card shut down even when idle? If not, you may want to check with your workload developer on how to reduce the workload.
I think this further confirms that you don't have enough airflow.
When you first boot into the OS, the card is still working but is very near its shutdown temperature.
(12) FPGA Die Temperature : 96.50 Celsius
After a while, when the temperature exceeds 100C, the card will shut down and you won't get any reading.
The card is fine and working as expected. The problem is that you don't have enough airflow to the card. You will need to check with the server vendor on how to improve airflow to the PCIe slot devices.
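To make the margin concrete, here is a small Python sketch that pulls the die temperature out of a sensor line like the one quoted above and compares it with the 100C shutdown point mentioned in this thread. The function names and the exact line format are assumptions taken from that single quoted reading; adjust the pattern if your tool prints something different.

```python
import re

SHUTDOWN_C = 100.0  # shutdown point quoted in this thread

# Matches e.g. "(12) FPGA Die Temperature : 96.50 Celsius"
TEMP_LINE = re.compile(
    r"FPGA Die Temperature\s*:\s*(-?\d+(?:\.\d+)?)\s*Celsius"
)

def die_temperature(sensor_text):
    """Return the FPGA die temperature in Celsius, or None if absent."""
    m = TEMP_LINE.search(sensor_text)
    return float(m.group(1)) if m else None

def margin_to_shutdown(sensor_text, limit_c=SHUTDOWN_C):
    """Return degrees of headroom below the shutdown limit, or None.

    None means no reading was found, which in this thread is what
    happens once the card has already shut down.
    """
    temp = die_temperature(sensor_text)
    return None if temp is None else limit_c - temp

if __name__ == "__main__":
    reading = "(12) FPGA Die Temperature : 96.50 Celsius"
    print("Headroom in C:", margin_to_shutdown(reading))
```

With the reading from this thread, the headroom is only a few degrees, which is why the card stops responding shortly after boot.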
1. How do we confirm that we are running at maximum fan speed, and how do we modify it?
2. Is the fan referred to here the fan on the PAC board or the fans of the server machine?
Thanks very much.
There is no fan on the PAC card. It is passively cooled.
On some servers the fan speed is changed in the BIOS, and on others it is changed via the server BMC (there may be other ways as well). You will need to check with the server vendor to find out what speed you are running at and how to change it.
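If the server BMC is reachable with `ipmitool`, one possible way to read the current fan speeds is sketched below in Python. This is only an assumption about your setup: it relies on `ipmitool sdr type Fan` being available, and on rows shaped like `name | id | status | entity | 5400 RPM`, which can differ between BMCs. The sample rows and sensor names in the code are made up for illustration, and reading the speed this way does not tell you how to change it; that still has to come from the server vendor.

```python
import subprocess

def parse_fan_rpms(sdr_text):
    """Map fan sensor name -> RPM from pipe-separated SDR rows.

    Rows whose last field is not "<number> RPM" are skipped.
    """
    fans = {}
    for line in sdr_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2:
            continue
        reading = fields[-1].split()
        if len(reading) == 2 and reading[1] == "RPM":
            try:
                fans[fields[0]] = float(reading[0])
            except ValueError:
                continue  # not a numeric reading
    return fans

def read_fan_rpms():
    """Run ipmitool locally and parse its fan sensor rows.

    Assumes ipmitool is installed and the caller has BMC access.
    """
    result = subprocess.run(
        ["ipmitool", "sdr", "type", "Fan"],
        capture_output=True, text=True, check=True,
    )
    return parse_fan_rpms(result.stdout)

if __name__ == "__main__":
    # Hypothetical sample rows, not taken from a real server.
    sample = (
        "System Fan 1 | 30h | ok | 29.1 | 5400 RPM\n"
        "System Fan 2 | 31h | ns | 29.2 | No Reading\n"
    )
    print(parse_fan_rpms(sample))
```

Comparing these readings against the maximum RPM in the server's documentation is one way to tell whether the fans are actually at full speed.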