Re: Arria 10 PCIe Retraining with LMI with Configuration Space Bypass Enabled

lcy2000 · ‎12-05-2024

Hello,

We are working on a PCIe Root Port on an Arria 10 GX1150, using Quartus 22.4. The downstream device is currently an X520 (Intel 82599) 10-gigabit Ethernet adapter, capable of PCIe Gen2 x8.

The link established smoothly at Gen1 x8. After reading the Arria 10 Avalon-ST Interface User Guide, I understand that upgrading the PCIe link speed requires an explicit write to the Link Control register in the PCI Express Capability structure.

Since we are using Configuration Space Bypass, perhaps the only way to do so is to drive the LMI interface. I set up Signal Tap on ltssmstate[4:0] with a trigger on the LTSSM Recovery states, which a retrain should enter.

But after I write bit 5 (Retrain Link) of the Link Control register through LMI, nothing happens: Signal Tap does not trigger, and the retraining bit remains zero. I have also checked the "Target Link Speed" field; its power-on reset value is 3 (Gen3) on the Root Port side and 2 (Gen2) on the downstream EP.
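
For reference, the register fields mentioned above can be sketched in Python. This only illustrates the bit layout defined by the PCI Express Base Specification (Link Control at offset 0x10, Link Status at 0x12 and Link Control 2 at 0x30 from the start of the PCI Express Capability structure). The helper names are hypothetical, and how a value actually reaches the Hard IP (LMI or otherwise) is not shown.

```python
# Bit layout of the PCIe link registers discussed above (offsets are
# relative to the start of the PCI Express Capability structure).
LINK_CONTROL_OFFSET = 0x10    # 16-bit Link Control register
LINK_STATUS_OFFSET = 0x12     # 16-bit Link Status register
LINK_CONTROL2_OFFSET = 0x30   # 16-bit Link Control 2 register

RETRAIN_LINK = 1 << 5         # Link Control bit 5
LINK_TRAINING = 1 << 11       # Link Status bit 11 (read-only)


def with_retrain(link_control: int) -> int:
    """Return the Link Control value with the Retrain Link bit set."""
    return (link_control | RETRAIN_LINK) & 0xFFFF


def target_link_speed(link_control2: int) -> int:
    """Extract Target Link Speed (Link Control 2 bits 3:0)."""
    return link_control2 & 0xF


def decode_link_status(link_status: int) -> dict:
    """Split Link Status into the fields relevant to a speed change."""
    return {
        "current_speed": link_status & 0xF,              # bits 3:0
        "negotiated_width": (link_status >> 4) & 0x3F,   # bits 9:4
        "link_training": bool(link_status & LINK_TRAINING),
    }


def expected_speed(rp_target: int, ep_target: int) -> int:
    """Highest speed both ends allow. ASSUMPTION: each Target Link Speed
    value reflects the highest speed that end is willing to run at."""
    return min(rp_target, ep_target)


# Values reported in this thread: Root Port = 3 (Gen3), EP = 2 (Gen2).
print(expected_speed(3, 2))
print(decode_link_status(0x0081))
```

With the values above, the expected result of a successful speed change would be Gen2. Note that, per the spec, the Retrain Link bit always reads back as 0, so a retrain in progress is normally observed through the Link Training bit in Link Status rather than through Link Control.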

My question is: are there any other pieces I'm missing to upgrade the PCIe link?

Thank you very much for handling this ticket.

Wincent_Altera · ‎12-05-2024

Hi,

The link has established at Gen1 x 8, smoothly.
>> Is the PCIe device able to enumerate on the host? Is the LTSSM in the L0 state?

But after I write bit 5 of Link Control register through LMI, nothing happens. Signal Tap is not triggered, and the retraining bit remains zero.
>> May I know which specific signal you are referring to?
>> Or is the whole STP stuck at "waiting for trigger"? May I know which input clock you are currently using?
>> If it is stuck at "waiting for trigger", perhaps you can try changing the input clock to "pld_clk", as mentioned in the user guide:
>> https://cdrdv2-public.intel.com/705404/ug_a10_pcie_avst-17-0-683647-705404.pdf

If you are targeting an Arria 10 Root Port design, I suggest you try out our example design:
https://www.rocketboards.org/foswiki/Projects/Arria10PCIeRootPortWithMSI

Regards,
Wincent

lcy2000 · ‎12-06-2024

Hi Wincent,

Nice to meet you again. Last time I raised a ticket about an Arria 10 DevKit FMC PERST problem. It seems our solder mods work fine for PHY RESET. Thank you again.

Sorry for my vague description. To clarify your questions:

Is the pcie able to enumerate in the host ? LTSSM = L0 stage ?

>> Yes, after the initial link-up the LTSSM settles in L0 (0xF), which is good. We can enumerate the downstream device through the AVST interface with no errors.

Can I know any specific signal you mentioned ?

>> I set up a trigger on LTSSM = 0xC (Recovery.Rcvlock), but it did not fire. I also used Signal Tap transitional mode to capture the whole LTSSM transition sequence, from reset until some time after writing the Retrain Link bit. It shows that the LTSSM stays at 0xF (L0) once the initial link is established; the Recovery states are never reached.

Which input clock that you currently using ?

>> I use coreclkout_hip as the Signal Tap sampling clock; it is the same clock that drives pld_clk, as recommended in the UG.

Further, I would like to ask:

1. Is LMI intended for retraining when CfgBp is enabled? I have read the golden example from RocketBoards mentioned in our last thread. I believe it is not well suited here, because of CfgBp.

2. I also read about the Autonomous Speed Change logic (altpcie_sc_*.v), which works through Hard IP Dynamic Reconfiguration. It looks like it never touches LMI. Does it provide another way to upgrade the link?

Best regards,

Chenyang

Wincent_Altera · ‎12-06-2024

Hi Chenyang,

Okay, I recall the case you filed; glad that you found a solution.
Thanks for sharing how you were able to resolve it.

Sorry for my vague description. To clarify your questions:

Is the pcie able to enumerate in the host ? LTSSM = L0 stage ?

>> Yes, after the initial link-up the LTSSM settles in L0 (0xF), which is good. We can enumerate the downstream device through the AVST interface with no errors.
--> Okay, sounds good. Are you using our example design? If yes, may I know which design you are using? AVST?
--> Or is this a custom design?

Can I know any specific signal you mentioned ?

>> I set up a trigger on LTSSM = 0xC (Recovery.Rcvlock), but it did not fire. I also used Signal Tap transitional mode to capture the whole LTSSM transition sequence, from reset until some time after writing the Retrain Link bit. It shows that the LTSSM stays at 0xF (L0) once the initial link is established; the Recovery states are never reached.

--> How do the other signals look? In this case I suspect other signals might be interfering with the capture.
--> Using transitional mode is fine; perhaps you can set the signals other than ltssm to "don't care".
--> Or set the ltssm signal to "don't care" so that you can monitor the overall link-up flow, including "0xc".

Which input clock that you currently using ?

>> I use coreclkout_hip as the Signal Tap sampling clock; it is the same clock that drives pld_clk, as recommended in the UG.

--> Okay, that sounds fine, as long as you can capture the signal instead of being stuck at "waiting for trigger".

Further, I would like to ask:

1. Is LMI intended for retraining when CfgBp is enabled? I have read the golden example from rocketboards in our last threads. I believe it is not well suited, because of CfgBp.
--> To be honest, I have never implemented LMI with CfgBp before, but I can offer some suggestions based on my own understanding. I hope they are a good reference for you.
--> The Local Management Interface (LMI) is used to access and control various configuration and status registers within the PCIe Hard IP. It allows for reading and writing to these registers, enabling fine-grained control over the PCIe link and its parameters.
--> The Configuration Bypass (CfgBp) feature allows the user to bypass the automatic configuration of certain PCIe parameters and instead manually configure them through the LMI. This can be useful for custom configurations or for debugging purposes.

--> When CfgBp is enabled, the automatic configuration of certain PCIe parameters is bypassed, and the user must manually configure these parameters using the LMI. This includes tasks such as setting the link width, speed, and other configuration settings.

--> Following from the above, LMI is intended to be used for retraining the PCIe link when CfgBp is enabled; with it you can manually configure the PCIe link parameters and initiate the retraining process.

2. I also read about Autonomous Speed Change logic (altpcie_sc_*.v) through Hard IP Dynamic Reconfig. Look like it never touched LMI somehow. Does it provides another option to do link upgrading?
--> I don't quite get your question. Which upgrade are you referring to: speed or width?

Regards,

Wincent

lcy2000 · ‎12-06-2024

Hi Wincent,

After re-reading the UG for earlier-generation devices a few hours ago, I found that asserting cfgbp_req_phycfg[3] should bring the Hard IP's LTSSM into Recovery. And it works.

The LTSSM transitions after reaching L0 are: L0 -> Recovery.Rcvlock -> Recovery.Rcvconfig -> Recovery.Idle -> L0. But sadly currentspeed has not changed so far; it remains at Gen1.
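
To make this kind of check repeatable, here is a small Python sketch that collapses exported ltssmstate samples into a transition path and reports whether Recovery.Speed was ever visited. Only 0x0C and 0x0F are encodings quoted in this thread; the other codes are assumed from the user guide's ltssmstate table and should be verified there. The capture list is made up for illustration.

```python
# ltssmstate[4:0] encodings. 0x0C and 0x0F are the values quoted in this
# thread; the entries marked "assumed" must be checked against the
# Arria 10 user guide before relying on them.
LTSSM_NAMES = {
    0x0C: "Recovery.Rcvlock",
    0x0D: "Recovery.Rcvconfig",   # assumed
    0x0E: "Recovery.Idle",        # assumed
    0x0F: "L0",
    0x1A: "Recovery.Speed",       # assumed
}


def transitions(samples):
    """Collapse consecutive duplicate samples into a list of state names."""
    path = []
    for code in samples:
        name = LTSSM_NAMES.get(code, f"unknown(0x{code:02X})")
        if not path or path[-1] != name:
            path.append(name)
    return path


def attempted_speed_change(path):
    """True if the trace ever passed through Recovery.Speed."""
    return "Recovery.Speed" in path


# Hypothetical capture resembling the observation above.
capture = [0x0F, 0x0F, 0x0C, 0x0C, 0x0D, 0x0E, 0x0E, 0x0F]
path = transitions(capture)
print(" -> ".join(path))
print("speed change attempted:", attempted_speed_change(path))
```

For a capture like the one above, the path never includes Recovery.Speed, which matches what I see: the link goes through Recovery but no rate change is attempted.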

Now that I can drive the LTSSM into Recovery, I suspect the speed change failed because of a signal integrity (SI) issue. Do you have any suggestions for debugging Recovery training failures? How can I confirm that training has failed? Are there any signals I could look at, for example signals related to bit error rate?

Regarding the previous discussion, let me further clarify my design:

--> May I know which design that you are using ? AVST ? or this is custom design ?

>> Yes, we have a custom design working on raw TLPs. For now, we have an AVST PCIe IP driven by a soft-core processor and a DMA engine.

--> How was other signal ?

>> I use transitional mode with no trigger; the trace is dumped when I stop the capture manually. My current understanding is that LMI is only meant for AER logging and that other functions are not guaranteed, as documented in the UG: "LMI interface is used to write log error descriptor information in the TLP header log registers. The LMI access to other registers is intended for debugging, not normal operation."

>> It's a pity that the "cfgbp_*" signals are so easy to overlook, because they are undocumented in the current UG. There are only a few words about them in the UGs for earlier generations (primarily the V series).

--> I don't quite get your question. Which upgrade are you referring to: speed or width?

>> I mean speed upgrade.

lcy2000 · ‎12-07-2024

Hi Wincent,

After a rough read of the PCIe LTSSM transition rules today, I assume the LTSSM sequence for a successful speed change (upgrade) should be:
L0 -> Recovery.Rcvlock -> Recovery.Rcvconfig -> Recovery.Speed -> Recovery.Rcvlock (at higher PHY rates) -> Recovery.Rcvlock -> Recovery.Rcvconfig -> Recovery.Idle -> L0

Please kindly confirm if this is correct.

So the main problem is that the LTSSM enters Recovery.Idle rather than Recovery.Speed after Recovery.Rcvconfig. According to my research, the PCIe spec defines the possible reasons. Clearly, this points to a negotiation failure in the TS2 exchange rather than an SI issue, because at that point we are still running at the Gen1 data rate.

Page 291 of PCI Express Base Spec Rev 3.0
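
My reading of that rule can be sketched as follows. This is a deliberately simplified model, not the normative text: it assumes the required number of consecutive, consistent TS2 Ordered Sets has already been received, and it ignores the Gen3 equalization cases. The function and parameter names are hypothetical.

```python
GEN1, GEN2, GEN3 = 1, 2, 3


def next_state_after_rcvrcfg(current_rate, tx_speed_change, rx_speed_change,
                             tx_rates, rx_rates):
    """Simplified next-state choice on leaving Recovery.RcvrCfg.

    tx_rates / rx_rates are the sets of data rates advertised in the
    transmitted and received TS2 Ordered Sets. ASSUMPTION: enough
    consecutive, consistent TS2s were received to leave the state at all.
    """
    # Both directions must carry the speed_change bit in their TS2s.
    both_request = tx_speed_change and rx_speed_change
    # A rate above 2.5 GT/s must be advertised by both ends.
    higher_rate_common = any(rate > GEN1 for rate in tx_rates & rx_rates)
    if both_request and (current_rate > GEN1 or higher_rate_common):
        return "Recovery.Speed"
    return "Recovery.Idle"


# Root Port advertises Gen1..Gen3, EP advertises Gen1..Gen2.
print(next_state_after_rcvrcfg(GEN1, True, True, {1, 2, 3}, {1, 2}))
# Same link, but the received TS2s do not carry speed_change.
print(next_state_after_rcvrcfg(GEN1, True, False, {1, 2, 3}, {1, 2}))
```

Under this model, ending up in Recovery.Idle at Gen1 means either the speed_change bit was not seen in both directions, or no rate above 2.5 GT/s was advertised by both ends.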

I'm also using the SoftDFE and SoftPolarityInv fixups, as they are documented to address compatibility problems in an open system. The Soft Polarity Inversion fixup documentation mentions that TS2 Ordered Sets may not be received correctly during the Polling.Config state. Another question: does that affect the speed negotiation?

Furthermore, since Soft Polarity Inversion is enabled, its IP source is plain text, and it demonstrates how to use the test_out interface, I plan to extract the raw TS2 data from the test_out PIPE interface. Let's see if it works out.
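
Once a raw TS2 is captured, the interesting byte is Symbol 4, the Data Rate Identifier. A minimal Python sketch of decoding it, following the bit layout in the PCIe 3.0 Base Spec (bit 1 = 2.5 GT/s, bit 2 = 5.0 GT/s, bit 3 = 8.0 GT/s, bit 7 = speed_change), is below. It assumes the decoded 8-bit symbol value is already available; getting it out of test_out is not covered, and the function name is hypothetical.

```python
def decode_rate_identifier(symbol4: int) -> dict:
    """Decode the Data Rate Identifier (Symbol 4) of a TS1/TS2 Ordered Set."""
    rates = []
    if symbol4 & 0x02:            # bit 1
        rates.append("2.5 GT/s")
    if symbol4 & 0x04:            # bit 2
        rates.append("5.0 GT/s")
    if symbol4 & 0x08:            # bit 3
        rates.append("8.0 GT/s")
    return {
        "supported_rates": rates,
        # Bit 6: Autonomous Change / Selectable De-emphasis (state dependent).
        "autonomous_or_deemphasis": bool(symbol4 & 0x40),
        # Bit 7: speed_change request.
        "speed_change": bool(symbol4 & 0x80),
    }


# Hypothetical examples: a Gen2-capable device requesting a speed change,
# and a device advertising only 2.5 GT/s without the request.
print(decode_rate_identifier(0x86))
print(decode_rate_identifier(0x02))
```

If the TS2s received from the EP look like the second example, that alone would explain the Recovery.Idle path.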

Wincent_Altera · ‎12-08-2024

After a rough reading on PCIe LTSSM transition rules today. I assume a desired speed change (upgrading) on LTSSM should be:
L0 -> Recovery.Rcvlock -> Recovery.Rcvconfig -> Recovery.Speed -> Recovery.Rcvlock (at higher PHY rates) -> Recovery.Rcvlock -> Recovery.Rcvconfig -> Recovery.Idle -> L0

Please kindly confirm if this is correct.
>> For the PCIe link training state machine, you can refer to the link below:

>> https://www.oreilly.com/library/view/pci-express-system/0321156307/0321156307_ch14lev1sec6.html

So the main problem is that the LTSSM enters Recovery.Idle rather than Recovery.Speed after Recovery.Rcvconfig. According to my research, the PCIe spec defines the possible reasons. Clearly, this points to a negotiation failure in the TS2 exchange rather than an SI issue, because at that point we are still running at the Gen1 data rate. Page 291 of PCI Express Base Spec Rev 3.0
I'm also using the SoftDFE and SoftPolarityInv fixups, as they are documented to address compatibility problems in an open system. The Soft Polarity Inversion fixup documentation mentions that TS2 Ordered Sets may not be received correctly during the Polling.Config state. Another question: does that affect the speed negotiation? Furthermore, since Soft Polarity Inversion is enabled, its IP source is plain text, and it demonstrates how to use the test_out interface, I plan to extract the raw TS2 data from the test_out PIPE interface. Let's see if it works out.

>> Okay, I see you are referring to the correct PCIe Base Spec. Let me know if there is anything I can help with to move you forward.
>> For debugging an LTSSM that gets stuck in a certain state, I recommend checking the guide below; it might give you some hints.
>> https://community.intel.com/t5/FPGA-Wiki/FTA-PCI-express/ta-p/735993

Regards,
Wincent

Wincent_Altera · ‎12-10-2024

Hi,

I wish to follow up with you about this case.

Do you have any further questions on this matter?

Otherwise, I would like your permission to close this forum ticket. You can still respond on the forum afterwards, and I will be available to assist you.

Regards,

Wincent_Altera

lcy2000 · ‎12-12-2024

Hi Wincent,

Sorry for the delay this week. We are still checking SI issues on our side. It is OK to close this case; I believe the remaining issue is unrelated.

Thank you very much for your patience.

Best regards,

Chenyang

Wincent_Altera · ‎12-12-2024

Hi ChenYang,

Thanks for your confirmation. If you see any issue related to the IP / FPGA, feel free to file a new thread.
We will be there to help out.

Hence, I will transition this thread to community support.

If you have a new question, feel free to open a new thread to get support from Altera experts.

Otherwise, the community users will continue to help you on this thread. Thank you

Regards,

Wincent_Altera