Arria 10 PCIe Gen 3 stuck at Equalization.Phase3

lcy2000
Beginner

Hi,

 

I'm working on a design with the Arria 10 Avalon-ST (AVST) PCIe IP (Quartus 24.3). We have run into an equalization problem at PCIe Gen3. We are using the A10GX1150 DevKit as an add-in card in a Z790-based motherboard (ASPM disabled in the BIOS). The system boots at Gen3 successfully: I can access configuration space and ltssmstate reaches L0. But after loading the driver, the IP's ltssmstate drops to Equalization.Phase3 and never recovers. The IP is configured with SoftDFE and SoftPolarityInversion enabled.

 

What should I do next to debug or work around this problem? Is the LTSSM allowed to stay stuck at Equalization.Phase3?

 

Wincent_Altera
Employee

Hi,


May I know which driver you are using? Is it the driver provided with the example design?

Please check your BIOS settings and make sure the link speed is set to "Gen3", matching your design, instead of "AUTO".

See whether this solves your problem.


Otherwise:

But after loading the driver, I saw ltssmstate of the IP drops to Equalization.Phase3 and never come back. 

>> Recovery.Equalization, with Phases 0–3, reflects progress through Gen3 equalization. Phases 2 and 3 of link equalization are optional. Each link must progress through all four phases, even if no adjustments occur. Skipping Phases 2 and 3 speeds up link training at the expense of link BER optimization.

>> If you do not need the BER optimization, you may skip these phases.

>> For detailed information about the four-stage link equalization procedure for the 8.0 GT/s data rate, refer to Section 4.2.3 of the PCI Express Base Specification, Rev 3.0.

>> Since Phase 3 is optional and your design does not get through it, you may consider skipping it so that you can bring up the rest of your project.


Hope this clarifies things,


Regards,

Wincent_Altera


Wincent_Altera
Employee

Hi,

I wish to follow up with you about this forum thread. Do you have any further questions on this matter?

If not, do I have your permission to close this thread?


Regards,

Wei Chuan


Wincent_Altera
Employee

Hi,

We have not received any response from you to our previous reply. Please log in to "https://supporttickets.intel.com/s/?language=en_US", view the details of the desired request, and post a response within the next 15 days so that I can continue to support you. After 15 days, this thread will be transitioned to community support, and community users will be able to help you with follow-up questions.

Regards,

Wincent_Altera


lcy2000
Beginner

Hi Wincent,

Nice to meet you again. Sorry for the delay.

I realize that I misread the LTSSM signal. It is stuck at 0x0D, which is Recovery.Rcvconfig. What we are trying to do is implement a switch on the Arria 10, with an NVMe device as the downstream EP, so we are loading the generic Linux nvme driver. The Gen3 link stays up for some time, but it soon drops to Recovery.Rcvconfig during driver initialization. Most of the time I see "BadTLP+" in the AER capability of the root port that the FPGA is connected to, and sometimes the host even hangs with a Machine Check. We tried disabling ECRC generation/forwarding and it didn't help.
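Since the misreading came from interpreting the raw ltssmstate value, a small decoder can make captured traces easier to read. The following is a minimal sketch: only the 0x0D = Recovery.Rcvconfig mapping comes from this thread, the other three encodings are an assumption that should be checked against the IP user guide, and the names `LTSSM_STATES`, `decode_ltssm`, and `summarize` are hypothetical.

```python
# Partial ltssmstate decode table. Only 0x0D is confirmed in this thread;
# the other entries are ASSUMED and must be verified against the IP user guide.
LTSSM_STATES = {
    0x0C: "Recovery.Rcvlock",    # assumed
    0x0D: "Recovery.Rcvconfig",  # from this thread
    0x0E: "Recovery.Idle",       # assumed
    0x0F: "L0",                  # assumed
}


def decode_ltssm(raw: int) -> str:
    """Return a readable name for a 5-bit ltssmstate sample."""
    code = raw & 0x1F
    return LTSSM_STATES.get(code, f"unknown (0x{code:02X})")


def summarize(samples):
    """Collapse a sampled trace into the sequence of distinct states visited."""
    visited = []
    for raw in samples:
        name = decode_ltssm(raw)
        if not visited or visited[-1] != name:
            visited.append(name)
    return visited
```

Feeding `summarize` the ltssmstate samples exported from a logic analyzer capture gives the order of state transitions, which is usually easier to compare against the spec than raw hex values.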

In my understanding of PCIe, unlike MalfTLP, BadTLP is a much lower-level error with only a few possible triggers: (1) a link CRC (LCRC) error or (2) a sequence number error. (Refer to the PCIe 3.0 Spec, Figure 3-17, page 185/860.)
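The two triggers can be sketched as a simplified model of the receiver-side Data Link Layer check. This is an illustration under stated assumptions, not the IP's implementation: `zlib.crc32` is only a stand-in for the real LCRC (which uses a different bit mapping), nullified TLPs are not modeled, and the names `check_crc` and `classify_tlp` are hypothetical.

```python
import zlib

SEQ_MOD = 4096  # TLP sequence numbers are 12 bits wide


def check_crc(payload: bytes, crc: int) -> bool:
    """Stand-in integrity check. NOT the real PCIe LCRC bit mapping."""
    return zlib.crc32(payload) == crc


def classify_tlp(seq: int, payload: bytes, crc: int, next_rcv_seq: int):
    """Return (verdict, updated next_rcv_seq) for one received TLP."""
    if not check_crc(payload, crc):
        return "bad_tlp", next_rcv_seq          # trigger 1: CRC mismatch
    if seq == next_rcv_seq:
        return "accept", (next_rcv_seq + 1) % SEQ_MOD
    if (next_rcv_seq - seq) % SEQ_MOD <= SEQ_MOD // 2:
        # An already-received sequence number is a replayed duplicate:
        # it is discarded, but it is not reported as an error.
        return "duplicate", next_rcv_seq
    return "bad_tlp", next_rcv_seq              # trigger 2: sequence number error
```

In this model a TLP that is corrupted or truncated before it reaches the link would show up as a CRC mismatch at the receiver, which is why the transmit-side interface is worth inspecting when the root port reports BadTLP.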

My question is: how can I move forward in debugging this error?

One thing I have just realized I may not have handled well is the mid-TLP requirement on the tx_st_valid signal, which generally requires a TLP to be transferred over the TX AVST interface as a whole. I will look into it and come back with my findings.
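One way to look for this condition is to scan a captured trace of the TX interface for cycles where valid is low between start-of-packet and end-of-packet. The sketch below assumes the trace has been exported as one (valid, sop, eop) tuple per clock cycle. It ignores any exceptions related to tx_st_ready back-pressure, which are defined in the IP user guide and not modeled here. The function name `find_valid_gaps` is hypothetical.

```python
def find_valid_gaps(cycles):
    """Return the cycle indices where valid is low in the middle of a TLP.

    cycles: iterable of (valid, sop, eop) truthy/falsy values, one per clock.
    """
    gaps = []
    in_packet = False
    for i, (valid, sop, eop) in enumerate(cycles):
        if not valid:
            if in_packet:
                gaps.append(i)  # valid dropped between sop and eop
            continue
        if sop:
            in_packet = True
        if eop:
            in_packet = False   # also handles single-cycle TLPs (sop and eop together)
    return gaps
```

An empty result means every TLP in the trace was driven in consecutive valid cycles; any returned index points at a cycle worth inspecting in the waveform.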

Wincent_Altera
Employee

Hi ChenYang,

In my understanding of PCIe, unlike MalfTLP, BadTLP is a much lower-level error with only a few possible triggers: (1) a link CRC (LCRC) error or (2) a sequence number error. (Refer to the PCIe 3.0 Spec, Figure 3-17, page 185/860.)
>> Okay

>> Other things that you could check:

  • Have you checked whether your endpoint has received the 8 required consecutive TS2s?
  • Does the downstream port send EIEOS and then start sending TS1s before the endpoint is able to finish sending 16 TS2s?
  • In my experience, once it has sent 16 TS2s, it should transition to Recovery.Idle, since the requirements for that transition are met >> Recovery.Equalization phase X >> L0.
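The checks above can be sketched as a small counter model that replays a log of ordered-set events. It is a simplification built only on the two counts mentioned in this reply (8 consecutive TS2s received, 16 TS2s sent) and it ignores the other fields the spec requires to match. The event names and the function `rcvrcfg_exit_met` are hypothetical.

```python
RX_TS2_NEEDED = 8    # consecutive TS2s that must be received
TX_TS2_NEEDED = 16   # TS2s that must be sent after the first TS2 is received


def rcvrcfg_exit_met(events) -> bool:
    """Replay 'rx_ts2', 'rx_other', and 'tx_ts2' events in order.

    Returns True once both counts are satisfied at the same time.
    """
    rx_consecutive = 0
    tx_since_first_rx = 0
    seen_rx_ts2 = False
    for ev in events:
        if ev == "rx_ts2":
            rx_consecutive += 1
            seen_rx_ts2 = True
        elif ev == "rx_other":
            rx_consecutive = 0  # e.g. a TS1 breaks the consecutive run
        elif ev == "tx_ts2" and seen_rx_ts2:
            tx_since_first_rx += 1
        if rx_consecutive >= RX_TS2_NEEDED and tx_since_first_rx >= TX_TS2_NEEDED:
            return True
    return False
```

If the function returns False for a captured trace, the counters show which side fell short: the receive run was interrupted, or the link partner restarted with TS1s before 16 TS2s had been sent.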

    Regards,
    Wincent_Altera

Wincent_Altera
Employee

Hi,


I wish to follow up with you about this forum thread.

Do you have any further questions on this matter?


Regards,

Wincent_Altera


lcy2000
Beginner

Hi Wincent,

Thank you for your advice on moving forward. It has been a great help.

After a simple field test, we have initially confirmed that tx_st_valid cannot be deasserted in the middle of a TLP; doing so can lead to BadTLP errors because the TLP is not transferred to the IP correctly. So I believe this case can be closed now.
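One common way to satisfy this kind of requirement (not necessarily what was done in this design) is store-and-forward buffering: hold each TLP until its last word has arrived, then drive it in consecutive valid cycles. The behavioral sketch below illustrates the idea. It ignores tx_st_ready back-pressure, and the class and method names are hypothetical.

```python
from collections import deque


class TlpStoreAndForward:
    """Behavioral model: release a TLP only once all of its words are buffered."""

    def __init__(self):
        self._partial = []     # words of the TLP still being received
        self._ready = deque()  # complete TLPs waiting to be driven

    def push_word(self, word, last: bool):
        """Accept one word from the upstream logic, which may arrive with gaps."""
        self._partial.append(word)
        if last:
            self._ready.append(self._partial)
            self._partial = []

    def drive_cycles(self):
        """Yield (valid, sop, eop, data) beats with no gap inside a TLP."""
        while self._ready:
            tlp = self._ready.popleft()
            for i, word in enumerate(tlp):
                yield (True, i == 0, i == len(tlp) - 1, word)
```

The trade-off is extra latency and buffer memory of up to one maximum-size TLP, in exchange for never starting a packet that cannot be finished without a gap.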

Sorry for the delayed response, which was due to a national holiday from 5.1 to 5.5.

 

Chenyang

Wincent_Altera
Employee

Hi Chenyang,


Glad that this issue is resolved. Thanks for sharing how you resolved it.

If you run into any further issues, feel free to open a new thread. We are fully committed to supporting you.


Regards,

Wincent_Altera

