PCIe core gets confused with stalls. Help?

Altera_Forum · ‎11-19-2011

I have a Stratix IV GX design up and running using the PCI Express hard core. It runs in simulation and in HW. I'm using PCIe gen 1 x4. My simulation environment borrows heavily from the chained DMA example, but the application code in my design is all original.

In an effort to shake out any lingering bugs in my PCIe application layer logic, I created a small logic block to randomly suppress tx_st_ready, i.e. the ready signal flowing from the IP core to my logic, which effectively stalls my logic. This was successful in finding a few latent bugs in my code, but there is one last problem that has me stumped. Both in simulation and in our HW platform, if I enable the random stalls and run a bunch of transactions, my app logic sees erroneous PCIe traffic and gets confused. The same test passes fine if I disable the random stalls. I traced both interfaces (rx_st_* and tx_st_*) to the PCIe core in simulation, and compared traffic on each interface between a passing run (no stalls) and a failing run (random stalls). Ignoring time differences between the two runs, I find that what I'm sending to the core on the tx_st_* interface matches exactly between passing and failing cases, but for some reason in the failing run I see a corrupted read completion on the rx_st_* interface.

So it really seems as if the PCIe IP is somehow getting confused when my logic sends packets with gaps (cycles where valid=0 between the first and last cycle of a packet), even though this is technically within spec for the ST interface, as far as I know.

Has anyone seen anything like this? Are there some rules for the ST interfaces that I'm violating, e.g. is it asking for trouble to let packets dribble into the core rather than sending them on contiguous cycles?

The closest thing I've been able to find in the Altera knowledge base is article rd11232010_135 (I'd post a link, but my post count isn't high enough for that to be allowed) which explains the importance of application code watching out for rx buffer overflow.

This article is extremely vague about exactly how the application layer should do this ("users need to create their own logic in their application layer to monitor the rx buffer capacity to avoid the rx buffer overflow."), but since my logic never requests more than 2KB of reads at a time or uses more than 16 tags, and I believe the Stratix IV hard IP has a fixed 16KB RX buffer and supports at least 32 tags, I think I should be fine.
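For illustration, the kind of bookkeeping the article seems to be hinting at could be modeled roughly as below. This is only a behavioral sketch in Python, not RTL and not anything taken from Altera documentation: the class name `ReadBudget`, its methods, and the default limits (2 KB and 16 tags, the figures quoted above) are all assumptions made for the example. The idea is simply to refuse a new read request whenever it would push the outstanding byte count or tag count over a self-imposed budget that is smaller than the RX buffer.

```python
class ReadBudget:
    """Hypothetical tracker for outstanding read requests (illustrative only)."""

    def __init__(self, max_bytes=2048, max_tags=16):
        self.max_bytes = max_bytes            # self-imposed limit, below the RX buffer size
        self.free_tags = list(range(max_tags))
        self.outstanding = {}                 # tag -> bytes of completion data still expected

    def try_issue(self, length):
        """Return a tag if a read of `length` bytes fits the budget, else None."""
        in_flight = sum(self.outstanding.values())
        if not self.free_tags or in_flight + length > self.max_bytes:
            return None
        tag = self.free_tags.pop(0)
        self.outstanding[tag] = length
        return tag

    def on_completion(self, tag, nbytes):
        """Call when the application has consumed `nbytes` of completion data for `tag`."""
        remaining = self.outstanding[tag] - nbytes
        if remaining < 0:
            raise ValueError("more completion data than requested")
        if remaining == 0:
            # Read fully satisfied: the tag can be reused.
            del self.outstanding[tag]
            self.free_tags.append(tag)
        else:
            self.outstanding[tag] = remaining


budget = ReadBudget()
tags = [budget.try_issue(512) for _ in range(4)]   # 4 x 512 B fills the 2 KB budget
blocked = budget.try_issue(64)                     # no room left, so None
budget.on_completion(tags[0], 256)                 # half of the first read has arrived
retry = budget.try_issue(256)                      # now fits again
```

With a 2 KB byte budget against a 16 KB buffer there is a wide margin, which is the reasoning behind "I think I should be fine" above.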

Any thoughts on what could be causing my problem, or how I can debug it? I tried enabling DEBUG messages in the root port BFM, but that wasn't enlightening.

-Brett

Altera_Forum · ‎11-21-2011

No, it’s not within specs of the PCIe hard IP core AST interface to send with gaps if the hard IP doesn’t ask for a gap. Look at the signal description of tx_st_valid in the PCI Express Compiler User Guide (quote is from v10.0):

--- Quote Start ---

tx_st_valid<n>:

Clocks tx_st_data<n> into the core. Between tx_st_sop<n> and tx_st_eop<n>, must be asserted if tx_st_ready<n> is asserted.[…]

--- Quote End ---

So, as long as the hard IP doesn’t do flow control by deasserting tx_st_ready<n>, you absolutely have to provide data until the end of the TLP packet.
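To make the rule concrete, here is a rough sketch of a trace checker that could be run over per-cycle samples dumped from simulation. It is written in Python purely for illustration; the `Cycle` record, the function name `find_gap_violations`, and the `ready_latency` parameter are assumptions of this example, not anything provided by the core or its testbench. The ready latency that actually applies should be taken from the user guide for the core variant in use; the default of 0 here is only for simplicity.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Cycle:
    """One sampled clock cycle of the TX streaming interface (hypothetical record)."""
    ready: bool          # tx_st_ready as driven by the core
    valid: bool          # tx_st_valid as driven by the application
    sop: bool = False    # tx_st_sop
    eop: bool = False    # tx_st_eop


def find_gap_violations(trace: Iterable[Cycle], ready_latency: int = 0) -> List[int]:
    """Return cycle indices where valid is low mid-packet although ready permitted a transfer."""
    cycles = list(trace)
    violations = []
    in_packet = False
    for i, c in enumerate(cycles):
        # The ready value sampled `ready_latency` cycles earlier governs this cycle.
        j = i - ready_latency
        ready_now = cycles[j].ready if j >= 0 else False
        if in_packet and ready_now and not c.valid:
            violations.append(i)
        if ready_now and c.valid:
            # A word is transferred; track packet boundaries.
            if c.sop:
                in_packet = True
            if c.eop:
                in_packet = False
    return violations


# A gap while the core holds ready low is legal; a gap while ready is high is not.
legal = [Cycle(True, True, sop=True), Cycle(False, False), Cycle(True, True, eop=True)]
illegal = [Cycle(True, True, sop=True), Cycle(True, False), Cycle(True, True, eop=True)]
print(find_gap_violations(legal))    # []
print(find_gap_violations(illegal))  # [1]
```

Idle cycles between packets (after an eop and before the next sop) are not flagged, since the quoted requirement only applies between tx_st_sop and tx_st_eop.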

Altera_Forum · ‎11-21-2011

--- Quote Start ---

No, it’s not within specs of the PCIe hard IP core AST interface to send with gaps if the hard IP doesn’t ask for a gap. Look at the signal description of tx_st_valid in the PCI Express Compiler User Guide (quote is from v10.0):

So, as long as the hard IP doesn’t do flow control by deasserting tx_st_ready<n>, you absolutely have to provide data until the end of the TLP packet.

--- Quote End ---

Wow, thanks, I totally missed that in the specs. Guess now I know why they make that a requirement. :) Wish their core had some sort of warning for violations of this rule.

Luckily, my real logic has no trouble with the no-stalls requirement, once I remove my fake-stall-inserter. This does make it considerably harder to verify that customer application logic will properly handle IP-generated stalls on this interface, but at this point I guess I've tested my code pretty well.
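One possible way to keep random-stall testing without breaking the rule, offered only as an untested idea, would be to put a store-and-forward buffer between the stall inserter and the core, so the core only ever sees whole packets on contiguous cycles. The Python model below sketches that behavior. The function name `store_and_forward` and the beat format `(data, sop, eop)` are made up for this example, and the model assumes the core keeps tx_st_ready high throughout; real logic would still have to honor the core's own stalls.

```python
from collections import deque


def store_and_forward(beats):
    """Re-emit a gappy stream so that every packet leaves on contiguous cycles.

    `beats` is a per-cycle list: None for an idle cycle, or a (data, sop, eop) tuple.
    Returns the per-cycle output stream in the same format.
    """
    stream = list(beats)
    fifo = deque()     # buffered words, in arrival order
    complete = 0       # number of whole packets currently held in the fifo
    sending = False
    out = []
    i = 0
    while i < len(stream) or complete > 0:
        beat = stream[i] if i < len(stream) else None
        i += 1
        if beat is not None:
            fifo.append(beat)
            if beat[2]:            # eop seen: one more whole packet is buffered
                complete += 1
        if not sending and complete > 0:
            sending = True         # start draining only once a packet is complete
        if sending:
            word = fifo.popleft()
            out.append(word)
            if word[2]:            # last word of the packet has been sent
                complete -= 1
                sending = False
        else:
            out.append(None)
    return out


gappy = [("a", True, False), None, ("b", False, False), None, None, ("c", False, True)]
smooth = store_and_forward(gappy)
# The three words now leave back to back, after the eop has arrived.
```

The cost is extra latency and enough buffering to hold a maximum-size packet, so this would probably live in the test harness rather than in the shipping design.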

Thanks for the clarification.

-Brett