samir_bhansali
Beginner
1,013 Views

Intel BD-NVV-N3000-2 issue

After installing a new FPGA card in a PCIe x16 slot, the server halts when we try to enter the BIOS, with the error "Nmi activated - system halted".

Note: The server boots to the OS without any issue.

Server Product Code : LWF2208IR848505
Server S/N : BQF900200624
FPGA Model : BD-NVV-N3000-2
FPGA S/N : 644C36123B68
Intel BIOS : 02.01.0012

OS : CentOS 7.7

Please find attached lspci output.

We need your help to resolve this issue. Could you please escalate it to the Intel core development team? We assume this is not a hardware failure; it could be a compatibility issue, or a new modified/beta BIOS might resolve it.

0 Kudos
33 Replies
JonWay_C_Intel
Employee
315 Views

Hi, I have sent you a private message.

samir_bhansali
Beginner
304 Views

Hi Jonway,

I tried the same FPGA card in a Supermicro 7049GP-TRT server and observed that the FPGA is not detected. Please find attached the "lspci" command output.

samir_bhansali
Beginner
294 Views

Hi Jonway,

Do you have any solution for this?

JonWay_C_Intel
Employee
289 Views

Hi @samir_bhansali 

1) If you suspect it is a card issue, can you try with a different card?

2) If it is the same on all cards, can you let me know whether lspci fails to detect the card immediately after boot-up, or whether the card was detected at first but failed later on?

Also, is the LED blinking at a 1-second interval? If yes, does this happen immediately after boot-up or only later on?

samir_bhansali
Beginner
277 Views

Hi Jonway,

I checked the Intel FPGA card again in the Supermicro SYS-7049GP-TRT server and found that after powering on the server, none of the FPGA LEDs are lit. After booting to the OS, LEDs 1 and 3 blink green and LEDs 2 and 4 blink yellow.

Immediately after booting to the OS, and again later on, we ran the "lspci" command, and the card now appears in the output.

Please find attached output for your reference.

Kindly check the attachment and let us know whether the card is detected properly.

Please suggest what could be causing the FPGA card detection issue on the Intel server.

JonWay_C_Intel
Employee
252 Views

Hi @samir_bhansali 

Based on the Supermicro lspci output, the card is detected properly.

60:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
60:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
61:00.0 Processing accelerators: Intel Corporation Device 0b30
62:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
62:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
63:00.0 Processing accelerators: Intel Corporation Device 0b32
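
For anyone who wants to run the same check against their own output, here is a minimal Python sketch that parses lspci lines like the ones above and counts the functions the card exposes. The helper names (`parse_lspci`, `summarize_n3000`) are hypothetical and made up for illustration. The sketch assumes the short `bus:device.function` address form shown in this thread, and the counts are simply what this particular listing contains, not an official specification of the card.

```python
import re

# Sample lines copied from the lspci output quoted in this thread.
SAMPLE = """\
60:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
60:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
61:00.0 Processing accelerators: Intel Corporation Device 0b30
62:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
62:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
63:00.0 Processing accelerators: Intel Corporation Device 0b32
"""

# Assumption: short "bus:device.function" addresses, as in the output above.
LINE_RE = re.compile(
    r"^(?P<addr>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(?P<cls>[^:]+):\s+(?P<desc>.+)$"
)


def parse_lspci(text):
    """Return a list of (address, device class, description) tuples."""
    devices = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            devices.append((match["addr"], match["cls"], match["desc"]))
    return devices


def summarize_n3000(devices):
    """Count Ethernet functions labelled N3000 and processing accelerators."""
    nics = [d for d in devices if "N3000" in d[2]]
    accels = [d for d in devices if d[1] == "Processing accelerators"]
    return {"ethernet_functions": len(nics), "accelerators": len(accels)}


if __name__ == "__main__":
    print(summarize_n3000(parse_lspci(SAMPLE)))
```

To use it with live data, the saved output of `lspci` can be passed to `parse_lspci` in place of `SAMPLE`. If the summary comes back with zero entries, the card is not being enumerated at that moment.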

JonWay_C_Intel
Employee
249 Views

Given that the Supermicro works and the Wolfpass server does not, you most probably need the air duct recommended for the server, which should be used with the card: https://ark.intel.com/content/www/us/en/ark/products/125929/passive-airduct-kit-awfcoproductad.html

samir_bhansali
Beginner
232 Views

Hi Jonway,

The accessories listed below are already installed in the Intel LWF2208IR848505 server.

Accessories: AWFCOPRODUCTBKT (High Air flow Air Duct Bracket Kit) and A2UL16RISER2 (2-slot PCIe* Riser card).

Please suggest what else could be the issue in the Intel server.

JonWay_C_Intel
Employee
225 Views

1. Confirm that you are running at maximum fan speed.

2. Does the card shut down even when idle? If not, you may want to check with your workload developer on how to reduce the workload.

samir_bhansali
Beginner
214 Views

Hi Jonway,

The fans are running in normal mode.

Yes, the server is idle.

samir_bhansali
Beginner
199 Views

Hi Jonway,

Attached for your reference is the output from the Supermicro server when the command is run immediately after booting to the OS.

However, the output is different when the command is run later, after the OS has been up for a while.

JonWay_C_Intel
Employee
192 Views

I think this further confirms that you don't have enough airflow.

When you first boot into the OS, the card is still working but very near its shutdown temperature.

(12) FPGA Die Temperature : 96.50 Celsius

After a while, when the temperature exceeds 100C, the card will shut down, and you won't get any reading.

The card is fine and working as expected. The problem is that you don't have enough airflow to the card. You will need to check with the server vendor on how to improve airflow to the PCIe slot devices.
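
The reasoning above (a reading close to the shutdown point, then no reading at all) can be sketched as a small Python check. This is only an illustration: the 100 C limit is the figure quoted in this reply rather than a value taken from a datasheet, the 5 C warning margin is an arbitrary assumption, and the helper names `die_temperature` and `classify` are hypothetical.

```python
import re

SHUTDOWN_C = 100.0  # figure quoted in the reply above, not from a datasheet
MARGIN_C = 5.0      # hypothetical warning margin, chosen for illustration

# Matches sensor lines of the form shown in this thread, e.g.
# "(12) FPGA Die Temperature : 96.50 Celsius"
TEMP_RE = re.compile(
    r"FPGA Die Temperature\s*:\s*([0-9]+(?:\.[0-9]+)?)\s*Celsius"
)


def die_temperature(sensor_text):
    """Return the FPGA die temperature in Celsius, or None if absent."""
    match = TEMP_RE.search(sensor_text)
    return float(match.group(1)) if match else None


def classify(temp_c):
    """Map a reading (or a missing reading) to a coarse status string."""
    if temp_c is None:
        return "no reading"  # consistent with a card that has shut down
    if temp_c > SHUTDOWN_C:
        return "over shutdown limit"
    if temp_c >= SHUTDOWN_C - MARGIN_C:
        return "near shutdown"
    return "ok"


if __name__ == "__main__":
    line = "(12) FPGA Die Temperature : 96.50 Celsius"
    print(classify(die_temperature(line)))
```

Running the check once right after boot and again a few minutes later would show the pattern described here: a "near shutdown" result first, followed by "no reading" once the card has powered down.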

ahaa
Beginner
119 Views

Some questions:

1. How can I confirm that the fans are running at maximum speed, or change the speed?

2. Is the fan referred to here the fan on the PAC board or the fan of the server machine?

Thanks very much.

JonWay_C_Intel
Employee
108 Views

Hi @ahaa 

There is no fan on the PAC card. It is passively cooled.

Some server vendors change the fan speed in the BIOS and others via the server BMC (there may be other ways as well). You will need to check with the server vendor to find out what speed you are running at and how to change it.