Hello,
I’ve installed an Intel N6000/1-PL SmartNIC on a Lenovo SR650v2 server with the following stack:
- N6000 SKU1
- CentOS Stream release 8
- OPAE v2.1.1
- kernel 5.15.92-dfl
Server BIOS settings: card tested on two slots (1 and 7) with PCIe bifurcation set to x8x8. Fan speed set to maximum.
The server BIOS reports the following warning:
PCIe error recovery has occurred in slot number 1. The adapter may not work correctly.
And dmesg contains:
[22638.864360] intel-m10bmc-sec-update n6000bmc-sec-update.3.auto: SDM trigger failure: 4
[22638.877250] dfl-pci 0000:c5:00.1: enabling device (0140 -> 0142)
[22638.877568] dfl-pci 0000:c5:00.1: PCIE AER unavailable -5.
[22638.890287] dfl-pci 0000:c5:00.2: enabling device (0140 -> 0142)
[22638.890607] dfl-pci 0000:c5:00.2: PCIE AER unavailable -5.
[22638.904091] dfl-pci 0000:c5:00.3: enabling device (0140 -> 0142)
[22638.904377] dfl-pci 0000:c5:00.3: PCIE AER unavailable -5.
[22638.916944] dfl-pci 0000:c5:00.4: enabling device (0140 -> 0142)
[22638.917231] dfl-pci 0000:c5:00.4: PCIE AER unavailable -5.
Trying to deploy an image results in the error included below.
Otherwise, the PCIe inventory and the fpgainfo command seem to work, as shown below.
Any help would be appreciated. Is this a hardware problem, an on-card BMC problem, or a software problem?
fpgasupdate --log-level debug ofs_top_page1_pacsign_user1.bin 0000:C5:00.0
[2024-01-29 05:07:27.46] [DEBUG ] fw file: ofs_top_page1_pacsign_user1.bin
[2024-01-29 05:07:27.46] [DEBUG ] addr: 0000:C5:00.0
[2024-01-29 05:07:27.46] [DEBUG ] hash256: b'e026976389252b8a746943f351e8f149e5f0415f620cd1e0618229eb79e01bb8'
[2024-01-29 05:07:27.46] [DEBUG ] hash384: b'bb04ea12557ce23f2cb75685669d794fb6a06bf7b590430aa8bfdb4c765c6e15ecdb38200e1599aa8a7e52a2958e20db'
[2024-01-29 05:07:27.46] [DEBUG ] file type: Static Region (Update)
[2024-01-29 05:07:27.47] [DEBUG ] found device at 0000:c5:00.3 -tree is
[pci_address(0000:c2:04.0), pci_id(0x8086, 0x347c)] (pcieport)
[pci_address(0000:c5:00.3), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.1), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.4), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.2), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.0), pci_id(0x8086, 0xbcce)] (dfl-pci)
[2024-01-29 05:07:27.47] [DEBUG ] found device at 0000:c5:00.1 -tree is
[pci_address(0000:c2:04.0), pci_id(0x8086, 0x347c)] (pcieport)
[pci_address(0000:c5:00.3), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.1), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.4), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.2), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.0), pci_id(0x8086, 0xbcce)] (dfl-pci)
[2024-01-29 05:07:27.47] [DEBUG ] found device at 0000:c5:00.0 -tree is
[pci_address(0000:c2:04.0), pci_id(0x8086, 0x347c)] (pcieport)
[pci_address(0000:c5:00.3), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.1), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.4), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.2), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.0), pci_id(0x8086, 0xbcce)] (dfl-pci)
[2024-01-29 05:07:27.47] [DEBUG ] found device at 0000:c5:00.4 -tree is
[pci_address(0000:c2:04.0), pci_id(0x8086, 0x347c)] (pcieport)
[pci_address(0000:c5:00.3), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.1), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.4), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.2), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.0), pci_id(0x8086, 0xbcce)] (dfl-pci)
[2024-01-29 05:07:27.47] [DEBUG ] found device at 0000:c5:00.2 -tree is
[pci_address(0000:c2:04.0), pci_id(0x8086, 0x347c)] (pcieport)
[pci_address(0000:c5:00.3), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.1), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.4), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.2), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.0), pci_id(0x8086, 0xbcce)] (dfl-pci)
[2024-01-29 05:07:27.47] [DEBUG ] found device at 0000:c5:00.0 -tree is
[pci_address(0000:c2:04.0), pci_id(0x8086, 0x347c)] (pcieport)
[pci_address(0000:c5:00.3), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.1), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.4), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.2), pci_id(0x8086, 0xbcce)] (dfl-pci)
[pci_address(0000:c5:00.0), pci_id(0x8086, 0xbcce)] (dfl-pci)
[2024-01-29 05:07:27.48] [DEBUG ] could not find: "/sys/class/fpga_region/region0/dfl-fme.0/dfl*.*/*spi*/spi_master/spi*/spi*"
[2024-01-29 05:07:27.48] [DEBUG ] could not find: "/sys/class/fpga_region/region0/dfl-fme.0/dfl*.*/spi_master/spi*/spi*"
[2024-01-29 05:07:27.48] [DEBUG ] could not find: "/sys/class/fpga_region/region0/dfl-fme.0/spi*/spi_master/spi*/spi*"
[2024-01-29 05:07:27.48] [DEBUG ] could not find: "/sys/class/fpga_region/region0/dfl-fme.0/dfl_dev.4/n6000bmc-sec-update.3.auto/*fpga_sec_mgr*/*fpga_sec*"
[2024-01-29 05:07:27.48] [DEBUG ] could not find: "/sys/class/fpga_region/region0/dfl-fme.0/dfl_dev.4/n6000bmc-sec-update.3.auto/fpga_image_load/fpga_image*"
Traceback (most recent call last):
File "/usr/bin/fpgasupdate", line 33, in <module>
sys.exit(load_entry_point('opae.admin===1.4.1-', 'console_scripts', 'fpgasupdate')())
File "/usr/lib/python3.6/site-packages/opae/admin/tools/fpgasupdate.py", line 789, in main
if pac.upload_dev.find_one(os.path.join('update', 'filename')):
AttributeError: 'NoneType' object has no attribute 'find_one'
lspci -vt
| +-02.0-[c3-c4]--+-00.0 Intel Corporation Ethernet Controller E810-C for backplane
| | +-00.1 Intel Corporation Ethernet Controller E810-C for backplane
| | +-00.2 Intel Corporation Ethernet Controller E810-C for backplane
| | +-00.3 Intel Corporation Ethernet Controller E810-C for backplane
| | +-00.4 Intel Corporation Ethernet Controller E810-C for backplane
| | +-00.5 Intel Corporation Ethernet Controller E810-C for backplane
| | +-00.6 Intel Corporation Ethernet Controller E810-C for backplane
| | \-00.7 Intel Corporation Ethernet Controller E810-C for backplane
| \-04.0-[c5]--+-00.0 Intel Corporation Device bcce
| +-00.1 Intel Corporation Device bcce
| +-00.2 Intel Corporation Device bcce
| +-00.3 Intel Corporation Device bcce
| \-00.4 Intel Corporation Device bcce
fpgainfo fme
Intel Acceleration Development Platform N6001
Board Management Controller NIOS FW version: 3.14.0
Board Management Controller Build version: 3.14.0
//****** FME ******//
Object Id : 0xEF00000
PCIe s:b:d.f : 0000:C5:00.0
Vendor Id : 0x8086
Device Id : 0xBCCE
SubVendor Id : 0x8086
SubDevice Id : 0x1771
Socket Id : 0x00
Ports Num : 01
Bitstream Id : 0x5010202FAB46E6A
Bitstream Version : 5.0.1
Pr Interface Id : 00bc56cf-9e1f-5bf0-8011-48736ec862c9
Boot Page : user1
Factory Image Info : 801148736ec862c900bc56cf9e1f5bf0
User1 Image Info : 801148736ec862c900bc56cf9e1f5bf0
User2 Image Info : 801148736ec862c900bc56cf9e1f5bf0
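The debug log above shows fpgasupdate trying several sysfs glob patterns, finding none, and then failing with an AttributeError on a None object. A minimal sketch of that kind of lookup, assuming the patterns are simply tried in order under the FME directory, is shown here. The function name `find_upload_dev` and the constant `UPLOAD_PATTERNS` are hypothetical, and this is not the actual OPAE implementation; the patterns themselves are copied from the log.

```python
import glob
import os

# Patterns copied from the fpgasupdate debug log, relative to the FME
# directory (/sys/class/fpga_region/region0/dfl-fme.0 in the log).
# Assumption: they are tried in this order; the real tool may differ.
UPLOAD_PATTERNS = [
    "dfl*.*/*spi*/spi_master/spi*/spi*",
    "dfl*.*/spi_master/spi*/spi*",
    "spi*/spi_master/spi*/spi*",
    "dfl_dev.4/n6000bmc-sec-update.3.auto/*fpga_sec_mgr*/*fpga_sec*",
    "dfl_dev.4/n6000bmc-sec-update.3.auto/fpga_image_load/fpga_image*",
]


def find_upload_dev(fme_dir, patterns=UPLOAD_PATTERNS):
    """Return the first sysfs path matching any pattern, or None."""
    for pattern in patterns:
        matches = sorted(glob.glob(os.path.join(fme_dir, pattern)))
        if matches:
            return matches[0]
    return None


if __name__ == "__main__":
    fme = "/sys/class/fpga_region/region0/dfl-fme.0"
    dev = find_upload_dev(fme)
    if dev is None:
        # This is the situation in the log: no node matched, so any
        # later attribute access on the result would raise AttributeError.
        print("no secure-update node found under", fme)
    else:
        print("secure-update node:", dev)
```

Running a check like this before fpgasupdate would show whether the running kernel exposes any of the expected nodes, which is the symptom that distinguishes the failing kernel from the working one in this thread.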
Hi Frederic,
Sorry for the delay in replying to your post. Just a few questions:
1) Does the FPGA card work (e.g. running an AFU test or your own program) after you see the error and have run all the commands you listed (lspci, fpgainfo fme)?
2) Did you do any flashing on the FPGA card before rebooting?
3) Does the issue happen only once or on every reboot, and do you also see the message "PCIe error recovery has occurred in slot number 1. The adapter may not work correctly." each time?
It might be due to this: intel-m10bmc-sec-update n6000bmc-sec-update.3.auto: SDM trigger failure: 4
If you are flashing SDM firmware, what I saw in our engineering database is this:
Downloading the SDM provisioning firmware requires a power cycle (this is an SDM requirement).
Once the SDM provisioning firmware download and key provisioning are done, a power cycle is needed.
Thanks
Regards
Kian
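To answer question 3 consistently across reboots, the two kernel messages from the original post can be counted automatically. The following is a rough sketch, assuming dmesg output has been captured as a list of text lines; the names `PATTERNS` and `scan_dmesg` are hypothetical, and the regular expressions are derived only from the messages quoted in this thread.

```python
import re

# Message fragments taken from the dmesg excerpt in the original post.
PATTERNS = {
    "sdm_trigger_failure": re.compile(r"SDM trigger failure: (\d+)"),
    "aer_unavailable": re.compile(
        r"dfl-pci (\S+): PCIE AER unavailable (-?\d+)"
    ),
}


def scan_dmesg(lines):
    """Collect the captured fields of every matching line, per pattern."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, rx in PATTERNS.items():
            m = rx.search(line)
            if m:
                hits[name].append(m.groups())
    return hits


if __name__ == "__main__":
    sample = [
        "[22638.864360] intel-m10bmc-sec-update n6000bmc-sec-update.3.auto: "
        "SDM trigger failure: 4",
        "[22638.877568] dfl-pci 0000:c5:00.1: PCIE AER unavailable -5.",
    ]
    for name, found in scan_dmesg(sample).items():
        print(name, found)
```

Feeding it the saved dmesg output after each cold boot would show whether the SDM trigger failure appears every time or only occasionally.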
Hi Kian,
1) Chicken-and-egg problem: since I cannot deploy any image on the board, I haven't been able to test it with any program (my end goal is to use the Intel P4 SDK with this card).
2) The only thing I flashed on the card is a more recent BMC firmware (using a USB/JTAG cable). The initial version was 3.1. I upgraded it to 3.14, but to no avail:
Board Management Controller NIOS FW version: 3.14.0
Board Management Controller Build version: 3.14.0
3) The problem occurs systematically, after any number of (cold) reboots.
I'm not aware of the SDM firmware. Is it distinct from the BMC firmware?
Regards
Frederic
Hi Frederic,
Thanks for the reply. So basically the FPGA board does not have any image on it yet, other than the BMC firmware on the MAX 10.
I was trying to find which version is associated with Pr Interface Id : 00bc56cf-9e1f-5bf0-8011-48736ec862c9
Anyway, I discussed this issue with my colleague here, and we should focus on why fpgasupdate fails with missing files. My thinking is that because the card is non-functional without a valid image, it triggers the SDM (Secure Device Manager) to try to reconfigure the FPGA, which fails. The SDM firmware is separate from the BMC firmware but has some interface with it.
Do you know the OFS version installed on your system? I only saw that OPAE is 2.1.1; the DFL version is unknown, except that you are running kernel 5.15.92. Also, which Quartus version is installed on your system?
Could you try using Quartus to program/flash the FPGA and see whether the FPGA works?
Thanks
Regards
Kian
Hi Kian,
Regarding OFS version, I did not use an OFS installer script. I compiled the kernel using this branch of the linux-dfl project:
git clone https://github.com/OPAE/linux-dfl.git -b fpga-ofs-dev-5.15-lts
The Quartus version is 22.1.0 Build 174 03/30/2022 SC Pro Edition.
My knowledge of Quartus (and low-level FPGA programming) being limited, I'm afraid I won't be able to program the card using Quartus unless a ready-to-use project is available.
Regards,
Frederic
Hi Frederic,
Sorry for the delay in replying; I have been setting up a server on my end to test the configuration.
Would you mind providing the file that you tried to flash with this command: "fpgasupdate --log-level debug ofs_top_page1_pacsign_user1.bin 0000:C5:00.0"?
I will try it on my end and see whether I get the same result.
Thanks
Regards
Kian
Hi Kian,
Please find the image attached.
Regards,
Frederic
Hi Frederic,
I've set up a similar system running the same OPAE and DFL versions as yours, and tried the fpgasupdate command. I see the same error as you, so I will debug this on my end.
The error is not related to the bin file you provided; using the release bin gives a similar result.
Is upgrading the OFS version (both OPAE and DFL) possible for you? If so, let me try out the new version first.
Thanks
Regards
Kian
Hi Kian,
Upgrading the OPAE and DFL versions is not possible because I need to stick to versions compatible with the P4 toolchain.
Actually, the linux-dfl version I've used so far (5.15.92-dfl) has a more recent patch revision (92) than the one I was advised to use.
To rule that out, I downgraded the kernel to 5.15.45-dfl, reinstalled OPAE 2.1.1 and... the problem went away! fpgasupdate completes successfully.
Regards,
Frederic
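Since the fix turned out to be the kernel revision, a small pre-flight check of the running kernel can save time before attempting an update. This is only a sketch built on the two data points reported in this thread (5.15.45-dfl worked with OPAE 2.1.1, 5.15.92-dfl did not); the names `KNOWN_GOOD`, `KNOWN_BAD`, `parse_dfl_release` and `classify` are hypothetical, and no claim is made about any other revision.

```python
import platform
import re

# Per this thread only: with OPAE 2.1.1, 5.15.45-dfl worked and
# 5.15.92-dfl failed. Other revisions are untested here.
KNOWN_GOOD = (5, 15, 45)
KNOWN_BAD = (5, 15, 92)


def parse_dfl_release(release):
    """Parse 'X.Y.Z-dfl...' into a tuple of ints, or return None."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-dfl", release)
    if not m:
        return None
    return tuple(int(part) for part in m.groups())


def classify(release):
    """Describe a kernel release string using the thread's findings."""
    version = parse_dfl_release(release)
    if version is None:
        return "not a linux-dfl kernel"
    if version == KNOWN_GOOD:
        return "reported working in this thread"
    if version == KNOWN_BAD:
        return "reported failing in this thread"
    return "untested in this thread"


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    print(running, "->", classify(running))
```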
Hi Frederic,
That's good to hear. I upgraded the DFL and OPAE versions to OFS 2023.3-2, and fpgasupdate also works. I was trying to find the missing files reported in the logs but couldn't find them; previously I was on 5.15.92-dfl as well. There is probably some issue with that particular version, as I remember testing an OFS 2022 version more than a year ago and it worked.
Anyway, thanks for the update. Does the error still pop up?
Quote:
"The server BIOS reports the following warning:
PCIe error recovery has occurred in slot number 1. The adapter may not work correctly.
And dmesg contains:
[22638.864360] intel-m10bmc-sec-update n6000bmc-sec-update.3.auto: SDM trigger failure: 4"
Thanks
Regards
Kian
Hi Kian,
The BIOS warning is gone, as is the kernel error message related to the N6000.
Regards,
Frederic
Hi Frederic,
Thanks for the info. Is there anything else I can help you with? Otherwise, I would like to close the forum case as resolved and transition it to community support.
Thanks
Regards
Kian
Hi Kian,
As far as I'm concerned, the problem is solved and the case can be closed.
Regards,
Frederic