Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, FPGA AI Suite, Software Stack, and Reference Designs

Intel BD-NVV-N3000-2 issue

samir_bhansali
Beginner
10,927 Views

After installing a new FPGA card in a PCIe x16 slot, the server halts when we try to enter the BIOS, with the error "Nmi activated - system halted".

Note: The server boots to the OS without any issue.

Server Product Code : LWF2208IR848505
Server S/N : BQF900200624
FPGA Model : BD-NVV-N3000-2
FPGA S/N : 644C36123B68
Intel BIOS : 02.01.0012

OS : CentOS 7.7

Please find the lspci output attached.

We need your help to resolve this issue. Could you please escalate it to the Intel core development team? We assume this is not a hardware failure; it could be a compatibility issue, or a new modified/beta BIOS might resolve it.

33 Replies
samir_bhansali
Beginner
3,425 Views

Hi Jonway,

I tried the same FPGA card in a Supermicro 7049GP-TRT server and observed that the FPGA is not detected. Please find the "lspci" command output attached.

samir_bhansali
Beginner
3,415 Views

Hi Jonway,

Do you have any solution for this?

JonWay_C_Intel
Employee
3,410 Views

Hi @samir_bhansali 

1) If you suspect it is a card issue, can you try with a different card?

2) If it is the same on all cards, can you let me know whether lspci fails to detect the card immediately after boot-up, or whether the card was detected at first but failed later on?

Also, is the LED blinking at a 1-second interval? If yes, does that happen immediately after boot-up or only later on?

samir_bhansali
Beginner
3,398 Views

Hi Jonway,

I checked the Intel FPGA card again in the Supermicro SYS-7049GP-TRT server and found that no LED on the FPGA card lights up after the server is powered on. After booting to the OS, LEDs 1 and 3 blink green and LEDs 2 and 4 blink yellow.

We ran the "lspci" command immediately after booting to the OS and again later on, and the card now shows up in the output list.

Please find the output attached for your reference.

Kindly check the attachment and let us know whether the card is detected properly.

Please suggest what could be causing the FPGA card detection issue on the Intel server.

JonWay_C_Intel
Employee
3,373 Views

Hi @samir_bhansali

The Supermicro lspci output shows that the card is detected properly:

60:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
60:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
61:00.0 Processing accelerators: Intel Corporation Device 0b30
62:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
62:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
63:00.0 Processing accelerators: Intel Corporation Device 0b32
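
For anyone who wants to repeat this check from a script, here is a minimal sketch that scans lspci text for the same entries as the listing above. The function name `summarize_lspci` is hypothetical, and treating device IDs 0b30 and 0b32 as the card's FPGA functions is an assumption based only on this listing, not on Intel documentation.

```python
import re

# Device IDs copied from the lspci listing in this thread. Treating them as
# "the FPGA functions of the N3000" is an assumption made for illustration.
FPGA_DEVICE_IDS = {"0b30", "0b32"}

# The six lines quoted in the reply above, used here as sample input.
SAMPLE = """\
60:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
60:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
61:00.0 Processing accelerators: Intel Corporation Device 0b30
62:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
62:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 Intel(R) FPGA Programmable Acceleration Card N3000 for Networking (rev 02)
63:00.0 Processing accelerators: Intel Corporation Device 0b32
"""


def summarize_lspci(text):
    """Count XXV710 Ethernet functions and collect FPGA accelerator device IDs."""
    ethernet = 0
    fpga = []
    for line in text.splitlines():
        line = line.strip()
        if "Ethernet controller" in line and "XXV710" in line:
            ethernet += 1
        match = re.search(r"Processing accelerators: .*Device ([0-9a-fA-F]{4})", line)
        if match and match.group(1).lower() in FPGA_DEVICE_IDS:
            fpga.append(match.group(1).lower())
    return {"ethernet_functions": ethernet, "fpga_devices": fpga}


if __name__ == "__main__":
    print(summarize_lspci(SAMPLE))
```

On a live system, the text could come from running `lspci` and capturing its output. An empty `fpga_devices` list would correspond to the "card not detected" case discussed earlier in the thread.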

JonWay_C_Intel
Employee
3,370 Views

Given that the Supermicro server works and the Wolfpass server does not, you most probably need the air duct recommended for the server, which should be used with the card: https://ark.intel.com/content/www/us/en/ark/products/125929/passive-airduct-kit-awfcoproductad.html

samir_bhansali
Beginner
3,353 Views

Hi Jonway,

The accessories listed below are already installed in the Intel LWF2208IR848505 server.

Accessories: AWFCOPRODUCTBKT (High Airflow Air Duct Bracket Kit) and A2UL16RISER2 (2-slot PCIe* riser card).

Please suggest what else could be the issue with the Intel server.

JonWay_C_Intel
Employee
3,346 Views

1. Confirm that the fans are running at maximum speed.

2. Does the card shut down even when idle? If not, you may want to check with your workload developer on how to reduce the workload.

samir_bhansali
Beginner
3,335 Views

Hi Jonway,

The fans are running in normal mode.

Yes, the server is in idle mode.

samir_bhansali
Beginner
3,320 Views

Hi Jonway,

For your reference, attached is the output from the Supermicro server when the command is run immediately after booting to the OS.

Later on, after the OS has booted, the output is different.

JonWay_C_Intel
Employee
3,313 Views

I think this further confirms that you don't have enough airflow.

When you first booted into the OS, the card was still working but very close to its shutdown temperature.

(12) FPGA Die Temperature : 96.50 Celsius

After a while, when the temperature exceeds 100C, the card shuts down and you won't get any reading.

The card is fine and working as expected. The problem is that you don't have enough airflow to the card. You will need to check with the server vendor on how to improve airflow to the devices in the PCIe slots.
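
To keep an eye on this from a script, the reading can be pulled out of the sensor text and compared with the 100C limit mentioned above. This is only a sketch: it assumes the line format matches the one quoted in this reply, and the 5-degree warning margin and the names `fpga_die_temperature` and `classify` are hypothetical choices, not part of any Intel tool.

```python
import re

SHUTDOWN_C = 100.0    # shutdown threshold quoted in the reply above
WARN_MARGIN_C = 5.0   # hypothetical warning margin, chosen for illustration


def fpga_die_temperature(text):
    """Return the FPGA die temperature in Celsius, or None if no reading is found."""
    match = re.search(
        r"FPGA Die Temperature\s*:\s*([0-9]+(?:\.[0-9]+)?)\s*Celsius", text
    )
    return float(match.group(1)) if match else None


def classify(temp_c):
    """Map a temperature reading to a coarse status string."""
    if temp_c is None:
        return "no reading"      # matches the state after the card has shut down
    if temp_c >= SHUTDOWN_C:
        return "over limit"
    if temp_c >= SHUTDOWN_C - WARN_MARGIN_C:
        return "near shutdown"
    return "ok"


if __name__ == "__main__":
    sample = "(12) FPGA Die Temperature : 96.50 Celsius"
    print(classify(fpga_die_temperature(sample)))
```

With the 96.50 Celsius reading quoted above, the sketch reports "near shutdown"; once the card has shut down and the line is missing, it reports "no reading".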

ahaa
Beginner
3,240 Views

Some questions:

1. How can I confirm that the fans are running at maximum speed, or change the speed?

2. Is the fan referred to here the fan on the PAC board or the fan in the server?

Thanks very much.

JonWay_C_Intel
Employee
3,229 Views

Hi @ahaa

There is no fan on the PAC card. It is passively cooled.

On some servers the fan speed is changed in the BIOS, and on others via the server BMC (there may be other ways as well). You will need to check with the server vendor to find out what speed the fans are running at and how to change it.
