
PCIe Hard IP AST Endpoint: Incoming Non-Posted Request Flow Control

Altera_Forum

I am not sure I have understood the flow control for answering non-posted requests (i.e. sending read completions) correctly. Because requests can arrive faster than I can send completions, at some point I have to stop accepting non-posted requests and finish the pending read completions first. Consider a large read request while the TX FIFO is full (tx_ready = 0): I cannot accept another read request before I have finished processing the current one, and I do not know the TX path timing up front.

 

There are two ways of back-pressuring read requests. The first, sledgehammer way is to de-assert rx_ready. This means that I accept no more packets at all, not even posted write requests or completions, until I am ready to process the next read request. I am not sure whether this is valid behavior, as it violates the ordering rules (write and completion packets must be able to pass read requests).

 

My second option is to assert rx_mask. At first glance it looks like the right solution, as it back-pressures only read requests, not write or completion packets. But it has two issues. One, according to the User Guide (9.1 SP1), the PCIe Hard IP block can forward up to 14 (26 for 128-bit AST) more read requests before rx_mask takes effect. It would be much better if the Hard IP treated this signal like rx_ready, with a two-cycle activation delay, and did the buffering and reordering for me. Two, the User Guide does not specify how long the Hard IP takes to resume forwarding read requests after rx_mask is de-asserted, or what measures one must take to maintain full throughput. Keep in mind that while most of the bandwidth should go to DMA transfers, the device driver also has to read some registers, and the latency of those reads is critical to overall system performance.
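
The worst-case figures quoted above put a lower bound on the buffering needed behind rx_mask. A minimal sketch of that arithmetic follows, using Python purely as a modelling language. The helper name `min_fifo_depth` is hypothetical, and the sketch assumes that nothing is drained from the buffer while the trailing requests arrive; the 14 and 26 figures are the ones cited from the User Guide.

```python
# Worst-case number of read requests the Hard IP may still forward after
# rx_mask is asserted, as quoted above from the User Guide (9.1 SP1).
MASK_SLACK = {64: 14, 128: 26}


def min_fifo_depth(assert_threshold, ast_width=64):
    """Smallest depth that cannot overflow if rx_mask is asserted once
    `assert_threshold` requests are pending (hypothetical helper).

    Assumes no request leaves the FIFO while the slack requests arrive.
    """
    return assert_threshold + MASK_SLACK[ast_width]


print(min_fifo_depth(5, 64))   # 19
print(min_fifo_depth(5, 128))  # 31
```

Under these assumptions, a threshold of 5 needs at least 19 entries on the 64-bit interface and 31 on the 128-bit one.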

 

While I’m tempted to go the first way, as it is the easiest to implement, the better approach seems to be a read request FIFO with a depth of, say, 20 entries. As soon as 5 read requests are pending in this FIFO, rx_mask is asserted to prevent an overflow; with up to 14 further requests arriving on the 64-bit interface, the worst case is 5 + 14 = 19 entries, which still fits (the 128-bit interface, with up to 26 further requests, would need a deeper FIFO). As soon as enough FIFO entries are free again, rx_mask is de-asserted. The FIFO should keep a couple of requests buffered so that the read path does not run dry, but I can only guess how many, because the User Guide does not document what happens after rx_mask is de-asserted.
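
The watermark scheme described above can be sketched as a small behavioral model, in Python rather than HDL, just to check the bookkeeping. The class name, the `release_level` of 2, and the worst-case burst of 14 trailing requests with no draining are assumptions for illustration; the post itself gives no release threshold.

```python
from collections import deque


class ReadRequestFifo:
    """Behavioral model of a read request FIFO driving rx_mask with hysteresis."""

    def __init__(self, depth=20, assert_level=5, release_level=2):
        self.depth = depth
        self.assert_level = assert_level    # occupancy at which rx_mask is set
        self.release_level = release_level  # assumed occupancy at which it clears
        self.fifo = deque()
        self.rx_mask = False

    def _update_mask(self):
        n = len(self.fifo)
        if n >= self.assert_level:
            self.rx_mask = True
        elif n <= self.release_level:
            self.rx_mask = False
        # Between the two levels the mask keeps its previous value.

    def push(self, tag):
        if len(self.fifo) >= self.depth:
            raise OverflowError("read request FIFO overflow")
        self.fifo.append(tag)
        self._update_mask()

    def pop(self):
        tag = self.fifo.popleft()
        self._update_mask()
        return tag


fifo = ReadRequestFifo()
for tag in range(5):       # fill up to the assert level
    fifo.push(tag)
for tag in range(5, 19):   # 14 requests still forwarded after rx_mask asserts
    fifo.push(tag)
print(len(fifo.fifo), fifo.rx_mask)  # 19 True

while fifo.rx_mask:        # drain until the mask is released
    fifo.pop()
print(len(fifo.fifo), fifo.rx_mask)  # 2 False
```

The gap between the two levels keeps rx_mask from toggling on every request. How low the release level can go without starving the read path depends on the undocumented resume latency, so the value here is only a placeholder.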

 

Any clue, anything that I have missed? 

 

Edit: Corrected posted vs. non-posted