
Our host PC can't discover the Stratix 10 MX devkit as a PCIe device, even after a successful configuration and a proper installation of the Linux driver.


We are trying to use Intel's DMA IP and PCIe Hard IP+ to communicate with the HBM memory. So far, we've tested the AN881 design, the AVMM PCIe Hard IP+ example design, and the PCIe golden design that comes with the board.

 

None of them was successful, and we can't discover our FPGA board as a PCIe device on the Linux server (host PC) that we are using.

 

The driver was compiled with the same GCC (gcc version 4.8.5 20150623) that was used to compile our Red Hat Linux kernel (3.10.0-1062.9.1.el7.x86_64).

 

We verified that the driver is installed and correctly loaded by running "lsmod":

>> intel_fpga_pcie_drv  22470 0
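
The same check can be scripted. Below is a minimal sketch that parses lsmod-style (or /proc/modules) text and reports whether a module is loaded; the helper name `loaded_modules` is made up for illustration. A loaded module only shows that the driver was inserted into the kernel, not that a matching PCIe device was enumerated.

```python
from pathlib import Path


def loaded_modules(text):
    """Return {module_name: use_count} from lsmod or /proc/modules text."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like "<name> <size> <use count> ..."; the lsmod
        # header ("Module Size Used by") has no numeric fields and is skipped.
        if len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit():
            modules[fields[0]] = int(fields[2])
    return modules


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    if proc_modules.is_file():  # only present on Linux
        mods = loaded_modules(proc_modules.read_text())
        print("intel_fpga_pcie_drv loaded:", "intel_fpga_pcie_drv" in mods)
```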

 

<< Environmental info >>

Quartus version: 19.2

Design used: AN881 design example, PCIe Hard IP+ design example

FPGA power source: 8-Pin from ATX power supply unit

Configuration scheme used: JTAG via FPGA Blaster II (USB)

*The FPGA board was enumerated correctly after we configured it with the bitstream for PCIe communication. We kept the FPGA board powered on so that it retains the configuration across the host PC's reboot.

 

So far, even with the successful configuration and a proper driver installation, we couldn't discover our FPGA board as a PCIe device on the host PC.

(Attachment: pcie slot list.PNG)

The FPGA devkit sits in PCIe slot 3, but it does not appear when we list PCIe devices with the "lspci" command. The slot should be shown as "In use", not "Available" (which means the slot is empty).
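
As an alternative to reading lspci output by eye, the enumerated devices can be listed from sysfs and filtered by vendor ID. This is only a sketch: it assumes the example design keeps Altera's PCI vendor ID (0x1172), which should be checked against the value shown in the IP parameter editor, and `find_pci_devices` is a hypothetical helper name.

```python
from pathlib import Path

# ASSUMPTION: the design reports Altera's vendor ID; adjust if the IP was
# parameterized with a different one.
ALTERA_VENDOR_ID = 0x1172


def find_pci_devices(vendor_id, sysfs_root="/sys/bus/pci/devices"):
    """Return the bus addresses of enumerated PCI devices with this vendor ID."""
    root = Path(sysfs_root)
    matches = []
    if not root.is_dir():  # not Linux, or sysfs not mounted
        return matches
    for dev in sorted(root.iterdir()):
        vendor_file = dev / "vendor"
        if not vendor_file.is_file():
            continue
        try:
            # The file holds a hex string such as "0x1172".
            vendor = int(vendor_file.read_text().strip(), 16)
        except (OSError, ValueError):
            continue
        if vendor == vendor_id:
            matches.append(dev.name)
    return matches


if __name__ == "__main__":
    print(find_pci_devices(ALTERA_VENDOR_ID))
```

An empty list means no device with that vendor ID was enumerated, which is what a failed link training would look like from the host side.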

 

Also, we can see that some devices report a power failure when we boot up the host PC.

(Attachment: power failure on boot up.PNG)

 

Here is the list of devices that report the power failure. We looked them up by bus number.

(Attachment: device info that pwr fails.PNG)

 

We also tested the golden design file that comes with the devkit. The design itself works, and we can verify it through the Board Test System (BTS) utility on Windows. However, we still can't discover the board in the PCIe device list, even with this golden design.

(Attachment: board test tool result.PNG)

 

Is there any specific way of setting up the FPGA board so that it is recognized as a proper PCIe device on the host PC?

 

Any input will be welcomed. Thank you.


Hi,

 

If you haven't done so, I would suggest you refer to AN881, the example design for the S10 MX board. Please refer to the following link:

 

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an881.pdf

 

Regards -SK


Thank you for your answer.

 

I have been struggling with this problem for quite a long time. I am grateful for help from Intel, and I am glad to talk to you.

 

But you should read at least the important points of my post, such as the environmental info. Those five lines contain information that anyone who really wants to help might be interested in.

 

I clearly stated that I have tested the AN881 example design. I followed the instructions there, and it is not working even though I followed them exactly as the manual describes. We've also consulted the "Avalon-MM Intel Stratix 10 Hard IP+ for PCI Express Solutions User Guide" (UG-20170) because it contains the information needed to install a Linux driver for PCIe communication.

 

I know Intel's new "24 hour quick response policy" that kicked in recently, but please, read before you post anything.

 

Best, HSuh.


Hi Hsuh,

 

The initial post mentioned AN811, but it now says AN881, so I was just curious whether you had already tested AN881.

  1. The AN881 example design uses Quartus Prime Pro v19.1. Did you run the post-processing script (section 2.2) after regenerating the Platform Designer system or upgrading the design?
  2. Does your MX board carry the following device: 1SM21BHU2F53E1VG?
  3. Are you using the driver that comes with the AN881 example design?
  4. Could you validate it on CentOS 7.0 to see whether there is any OS dependency?
  5. Could you please capture the ltssmstate[5:0], currentspeed[1:0], lane_act[4:0] and link_up signals in Signal Tap? This can help confirm whether link training completes correctly.

 

 

Regards -SK


Besides, since power may be a concern on the host server, could you please use the external power adapter to power the FPGA MX development kit and see if there is any difference?

 

Regards -SK


Hi, SengKokl.

 

I am sorry about my former post. I didn't know that you read my former post. I just checked my post frequently to make things clear. I thought that way a reader can grasp what's going on on my side easily.

 

Sorry for the confusion and for my attitude.

 

I will try your suggestions. Please allow me some time, because our server is located in the datacenter. To run the tests you mentioned above, we first need to bring the host server back to our office.

 

Meanwhile, could you tell me if there is a way to check that the settings (BIOS, physical slot placement on the motherboard, PCIe generation mode settings, etc.) on the host server are correct?

 

Since we don't know what's causing the problem right now, there is a chance that both the FPGA configuration bitstream (.sof file) and the host server settings are wrong.

Currently, our FPGA sits in an x16 PCIe slot (Gen 3.0), and we verified that the slot has full-speed access to the host.

ASPM (Active State Power Management) is disabled so that we have full compatibility with any PCIe device.

Also, "4G decoding" (memory mapping of I/O above 4 GB) is enabled for the same reason. (It allows 64-bit PCI devices to be decoded above the 4 GB address space.)
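
One way to double-check what a slot advertises and what it actually negotiated is to read the LnkCap and LnkSta lines for the slot's root or downstream port in `lspci -vv` output (run as root). The sketch below only parses such text; `SAMPLE` is made-up illustration text, not output from this server, and `parse_link_info` is a hypothetical helper name.

```python
import re

# Matches "LnkCap:" / "LnkSta:" lines; "LnkCap2:" and "LnkSta2:" do not match
# because the colon must follow the name directly.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")

# Hypothetical excerpt of `lspci -vv` output for a bridge port.
SAMPLE = """\
        LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L1 <16us
        LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive-
"""


def parse_link_info(lspci_vv_text):
    """Return {"LnkCap": (GT/s, lanes), "LnkSta": (GT/s, lanes)} if present."""
    info = {}
    for line in lspci_vv_text.splitlines():
        match = LINK_RE.search(line)
        if match:
            # Keep the first occurrence of each field.
            info.setdefault(match.group(1),
                            (float(match.group(2)), int(match.group(3))))
    return info


if __name__ == "__main__":
    for name, (speed, width) in parse_link_info(SAMPLE).items():
        print("%s: %.1f GT/s, x%d" % (name, speed, width))
```

If LnkCap shows the expected capability but LnkSta reports a lower speed or width, the slot itself is likely fine and the link simply has not trained with the card.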

 

Thank you,

Hsuh.


These are the answers to your questions.

  1. No. Are you referring to the scripts that come as a zip archive? We upgraded the IPs to match Quartus 19.2.
  2. Our device variant is 1SM21BHU2F53E2VGS1, not just E1VG. This is an ES variant.
  3. Yes. It is compiled correctly and loaded before we run any test. (We load the driver again whenever the host is restarted.)
  4. No. I can try, but our IT policy only allows us to use a specific Linux distribution. We are using an NFS system.
  5. I will get back to you once I have the Signal Tap results.

As long as the slot of the host PC supports Gen3 x16, it should at least be able to detect the FPGA's PCIe interface (provided the link comes up correctly). I am not aware of any BIOS settings that would affect this. The same reference design was tested here and works fine in one of our PCs. You may try the same with a different PC, server, or slot to isolate any dependency, and then compare the BIOS settings for differences.

 

Yes, please refer to the script.zip for the post-processing script that comes with the reference design. You need to run it to meet the requirement.

 

Regards -SK 


Hi, SK.

 

I got a result from the AN881 example design. I changed the power supply to the one that comes with the Stratix 10 kit, but it's still not working. I think we need to keep using it, though, because when a host PC detects the FPGA card, it enumerates the device list during POST and power is reset once or twice. This power reset would erase the FPGA configuration if the board were powered from the host PC's PSU.

 

Also, I tried to capture ltssmstate[5:0], currentspeed[1:0], lane_act[4:0] and link_up signals, but it was not successful. Is there any specific way to make it visible in the design? I know the original PCIe Hard IP+ has these signals, and I also turned on "Enable hard IP status bus when using AVMM interface," but the signals are still not visible.

 

So, I tried to turn on the PCIe inspector by enabling the PCIe inspector related options in the "Configuration, Debug and Extension Options" tab of the IP parameter editor. However, this produces errors with the AN881 example design. (Please find attached a screenshot of this.)

 

To find out what's causing the problem, I created an example design of the original PCIe HARD IP+, and I got a PCIe test result from the PCIe inspector (also attached to this post). But I still can't see the ltssm, currentspeed, etc. signals at the top level. I am experiencing these two problems:

  1. The ltssmstate, currentspeed, and lane_act signals are not active and do not exist.
  2. The Signal Tap node finder is not working. Even when the signals exist, I can't find any node. I tested it with signals that I confirmed exist, and it always says "no nodes found." I set "pcie_example_design.qsys" as the top level in Quartus.

 

Is there any specific way to make those debug signals visible? In the meantime, here is a result from the PCIe inspector. THIS RESULT IS FOR PCIE HARD IP+'S EXAMPLE DESIGN, NOT FROM AN881.

 

Thank you.

 

Regards,

Han.

(Attachments: pcie stat org.png, synth error when PCIe inspector is turned on.png)


Hi,

 

The PCIe Inspector result shows that the link is Gen3 x4 and the LTSSM status is L0. The LTSSM signal can be found in Signal Tap with this setting (see attachment).

 

Regards -SK

 


I got the result. It seems we can't set up the link at Gen3 x16. These are the Signal Tap analyzer screenshots for ltssm, link_up, currentspeed, etc.

THIS IS A SIGNALTAP INFO FOR AN881 EXAMPLE.

THE DEVICE IS ON PCIE SLOT4.

(Attachments: stp1.PNG, stp2.PNG, stp3.PNG)

 

 

 

 

The signals keep cycling through values, which doesn't look correct, and nothing meaningful seems to be happening. The signal sequence looks like this.

ltssmstate : 00h 08h 00h 08h 08h 00h 08h 00h 08h 00h 02h

currentspeed : 1h 1h 1h 1h 1h 1h 1h 1h 1h 1h 1h

lane_act : 10h 10h 10h 04h 10h 10h 10h 10h 10h 10h 10h

link_up : 0(always)

In this test, the FPGA board is in the Gen1 x16 state, but the link is not set up.

 

We also tested the FPGA board in other slots, without success. We tested slot1 (directly connected to the CPU) through slot4. We couldn't test slot5 because the host PC's case has no room for the height of the FPGA board.

(slot5 is not a PCI slot. We have slot6, which is a PCI slot.) According to the motherboard manufacturer, we can use either slot2 or slot4 for full-lane PCIe communication (16 lanes).

 

These are the test results from the other slots. Unlike in slot4, the values did not change over time.

slot1 : (can't pass BIOS POST; the host PC is not bootable) Signal Tap shows ltssm(11h), currentspeed(03h), lane_act(04h), link_up(1)

slot2 : (the PC booted successfully) ltssm(11h), currentspeed(03h), lane_act(04h), link_up(1)

slot3 : (bootable) ltssm(03h), currentspeed(01h), lane_act(01h), link_up(0)

 

I have been thinking about it, and I think the FPGA board needs to be reset (without losing its configuration) so that everything starts up when the host PC is powered on. I suspect the PCIe IP only goes through its initial handshake once, right after bitstream (.sof) configuration.

 

Since the FPGA board is now powered by a separate power supply (Intel's), some sort of reset might do this. Do you have any suggestions? Or do you know what's happening on the FPGA?

 

Thank you

 

Regards,

HSuh.

 

P.S. : This is our BIOS screenshot. Our host PC doesn't even see FPGA board as a PCIe device. As I said, it's currently sitting in slot4.

(This screenshot was taken after configuration and after enumeration.)

20200221_203922.jpg

 

 

 

 


Hi ,

 

The PCIe link failed to come up in slot 4. This is why the host can't detect the board.

 

Here is the result that I can see from your post:

 

Slot 1: Link up with Gen3 x 4

Slot 2: Link up with Gen3 x 4

Slot 3: Failed to link up

Slot 4: Failed to link up

 

After you program the .sof file into the FPGA and the host performs a warm reboot, the host triggers PERST, which resets the FPGA and restarts link training. When you capture with Signal Tap, please set the trigger condition to the rising edge of the "pin_perst" signal. This can help confirm whether the FPGA is reset properly during the warm reboot.

 

Regards -SK

 

 


I got the results. But I am worried about these results because it clearly shows that the design is not working properly.

 

When I posted my earlier Signal Tap results for slots 1 to 4, I thought that at least slots 1 and 2 were working, as you noted in your last answer.

However, it turns out that those results were not correct. In those tests, I only rebooted the host PC to enumerate the FPGA board; I did not do a complete power-off and power-on.

Today, while testing, I powered the host PC off completely and turned it on again for enumeration. I found that none of the 4 slots (PCIe slots 1 through 4) works.

Please see the attached screenshots for the results I got.

 

 

1. Right after the PC is powered on, it's in the Gen1 x16 state and ltssm shows 00h.

slot4_after_complete_poweroff_and_poweron.png

 

 

2. After a few seconds, the state changes, but the PCIe link is still not up.

slot4_after_reboot.png

 

 

3. Also, sometimes our PC does not boot up. It just stays on a black screen. Signal Tap shows nothing special or different when this happens (the same as in the two screenshots above).

The weird thing is that when I stop data acquisition in Signal Tap, it shows ltssm = 11h, currentspeed = 03h, lane_act = 10h and link_up = 1, which is exactly what we want to see.

This occurs randomly.

KakaoTalk_20200224_205911578.jpg

 

 

Also, I have a request for you. I have very good reason to believe that PCIe HARD IP+ is not working in our environment (Quartus 19.2 with device variant 1SM21BHU2F53E2VGS1). When I synthesize PCIe HARD IP+, I get multiple timing errors inside the main IP itself, not in the interconnect between modules and IPs.

 

Could you test this design, AN881, on the ES variant of Stratix 10 MX(1SM21BHU2F53E2VGS1)? Please do not test this design on the 1SM21BHU2F53E2VG (which was clearly used in the development stage only for the internal use at Intel).

 

If you can successfully run this design, please let me know which Linux distro, Quartus version, and Linux driver you used. I will try to replicate it on my side. If you can share the bitstream used for FPGA configuration (.sof file), that would be very helpful.

 

 


Hi

 

We tested the design using 1SM21BHU2F53E1VG in the past, and it links up properly. This is a production device, the same as the device at this link: https://www.intel.com/content/www/us/en/programmable/products/boards_and_kits/dev-kits/altera/kit-s1...

 

The 1SM21BHU2F53E2VGS1 is the engineering sample device, and the board is currently not available for testing.

 

When you generated the example design using the PCIe IP AVMM GUI, did you select "Stratix 10 MX H-Tile ES1 FPGA Development Kit"? Were you able to try it with a different server/host?

 

Besides, you may also try using "Avalon ST Intel Stratix 10 hard IP for PCI express" or "Avalon-MM Intel Stratix 10 Hard IP for PCIe Express" from the IP catalog to generate the example design and see whether it links up as Gen3 x8 as expected and is detected by the host. This can help rule out a dependency on the design example.

 

Regards -SK

 


Just to update: we have also tested 1SM21BHU2F53E1VG with the Avalon-MM design generated from the PCIe GUI, and it works as well. Since there has been no recent activity, I will place this case in close-pending status for now. If you have further questions, please do not hesitate to get back to us within the next 20-day close-pending period.

 

Regards -SK
