While the Avalon ST interface is a little bit easier to handle than the Descriptor/Data Interface, I do miss ko_cpl_spc_vc0, required for proper Rx buffer overflow prevention. The SGDMA example solves the problem by hard-wiring it inside of pipen1b, probably based on the PCI Express Compiler’s settings:
ko_cpl_spc_vc0(7 DOWNTO 0) <= std_logic_vector'("00011100");
ko_cpl_spc_vc0(19 DOWNTO 8) <= std_logic_vector'("000001110000");

I can parse the generated .XML file to search for the final active settings for the completion credit and generate a VHDL package from it:
<PRIVATE name = "p_pcie_completion_data_credit_vc0" value="112" type="INTEGER" enable="1" />
<PRIVATE name = "p_pcie_completion_header_credit_vc0" value="28" type="INTEGER" enable="1" />

Is there no way around generating code or hardwiring that? Is there no other, more “dynamic” way of handling that in the IP, like using the Configuration Space Signals tl_*? Am I missing something obvious?
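For illustration, here is a minimal sketch of the "parse the .XML and generate a VHDL package" approach described above. It is written in Python and makes several assumptions: the wrapping `<ROOT>` element, the package name `pcie_credit_pkg`, and the constant name `KO_CPL_SPC_VC0` are hypothetical, and the real generated file may be structured differently. The bit layout (header credits in bits 7:0, data credits in bits 19:8) is taken from the hard-wired assignment in the SGDMA example.

```python
import xml.etree.ElementTree as ET

HEADER_KEY = "p_pcie_completion_header_credit_vc0"
DATA_KEY = "p_pcie_completion_data_credit_vc0"

# Hypothetical stand-in for the generated file; only the two PRIVATE
# entries are taken from the thread, the ROOT wrapper is an assumption.
SAMPLE = """<ROOT>
<PRIVATE name = "p_pcie_completion_data_credit_vc0" value="112" type="INTEGER" enable="1" />
<PRIVATE name = "p_pcie_completion_header_credit_vc0" value="28" type="INTEGER" enable="1" />
</ROOT>"""


def read_completion_credits(xml_text):
    """Return (header_credits, data_credits) found in the XML text."""
    root = ET.fromstring(xml_text)
    found = {}
    for node in root.iter("PRIVATE"):
        if node.get("name") in (HEADER_KEY, DATA_KEY):
            found[node.get("name")] = int(node.get("value"))
    return found[HEADER_KEY], found[DATA_KEY]


def ko_cpl_spc_bits(header_credits, data_credits):
    """Build the 20-bit string: data credits in 19:8, header credits in 7:0."""
    if not 0 <= header_credits < 256:
        raise ValueError("header credits do not fit into 8 bits")
    if not 0 <= data_credits < 4096:
        raise ValueError("data credits do not fit into 12 bits")
    return format(data_credits, "012b") + format(header_credits, "08b")


def vhdl_package(header_credits, data_credits, name="pcie_credit_pkg"):
    """Render a small VHDL package holding the credit constant."""
    bits = ko_cpl_spc_bits(header_credits, data_credits)
    return (
        "library ieee;\n"
        "use ieee.std_logic_1164.all;\n\n"
        f"package {name} is\n"
        f'  constant KO_CPL_SPC_VC0 : std_logic_vector(19 downto 0) := "{bits}";\n'
        f"end package {name};\n"
    )


if __name__ == "__main__":
    header, data = read_completion_credits(SAMPLE)
    print(vhdl_package(header, data))
```

With the values from the thread (28 header credits, 112 data credits) the generated constant reproduces the two bit strings hard-wired in the SGDMA example.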
Hi Matthias,

Did you ever find a solution to this problem? I have a similar situation where I am using the Avalon ST interface to the PCIe hard macro in an Arria II GX, and am running into Rx buffer overflow. As far as I can tell, none of the test_out[63:0] signals let me see either the Rx buffer overflow status or the Rx buffer space available. The only solution I have found is to limit the number of MRd requests issued so that there is space for their completions in the Rx buffer, based on the PCIe Megawizard settings, the max payload size (256 or 128 bytes), and the host system's read completion boundary size (64 bytes in my case).

Thanks,
Brendan
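The arithmetic behind limiting outstanding MRd requests can be sketched as follows. This is only an illustration in Python, not anyone's actual implementation. It assumes the worst case in which the completer splits the read at every RCB boundary, that each completion TLP consumes one CPLH credit, and that one CPLD credit covers 16 bytes (4 DW) of payload. The function names are hypothetical, and the totals of 28 and 112 used in the usage note are the ones quoted in the first post, which may not match another core's settings.

```python
def completion_credits(address, length, rcb=64):
    """Worst-case (CPLH, CPLD) credits needed for one memory read.

    Assumes the completer returns one completion per RCB-aligned chunk.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    headers = 0
    data = 0
    offset = address
    end = address + length
    while offset < end:
        # End of the current chunk: next RCB boundary or end of the read.
        chunk_end = min(end, (offset // rcb + 1) * rcb)
        headers += 1                                # one header per completion
        data += -(-(chunk_end - offset) // 16)      # ceil(bytes / 16)
        offset = chunk_end
    return headers, data


def max_outstanding_reads(cplh_total, cpld_total, read_size, rcb=64):
    """How many RCB-aligned reads of read_size bytes fit into the buffer."""
    headers, data = completion_credits(0, read_size, rcb)
    return min(cplh_total // headers, cpld_total // data)
```

Under these assumptions, a 256-byte aligned read costs 4 header and 16 data credits, so with 28 header and 112 data credits at most 7 such reads may be outstanding; with 128-byte reads the limit is 14.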
Hi Brendan,

Yes, I found a solution (which I don’t use yet): just use the undocumented signal of that name, which is actually present on the hard IP interface, despite the docs showing it only on the descriptor/data IP macro.

Regarding your receive buffer management issues: after spending quite some time implementing something that looks compliant with what is written in the PCIe Spec and the Altera PCIe Compiler Guide, I took a look at UG517 from Xilinx. Appendix E in my edition (v5.0), »Managing Receive-Buffer Space for Inbound Completions«, does not just describe a single simple algorithm but gives a complete overview of the topic. It lists multiple options of different complexity/size/efficiency, together with more or less complete descriptions of algorithms that can be implemented immediately. The only topic missing from this chapter is proper Completion Timeout handling in the given context.

BTW: I found out that the code I had written was an implementation of DATA_FC, while my interfaces were designed like STREAM_FC. Reading the chapter showed me that there are other, simpler ways of implementing the required functionality which I hadn’t thought of.

Hint: The fewer receive completion buffers you allocate (»Desired performance for received requests« towards Maximum, »Desired performance for received completions« towards Low), the more the granularity (efficiency) of your DMA read completion buffer allocation will affect your throughput.

Edit: Apart from the hard IP macro, where it is an undocumented feature, the signal is actually not present on the soft IP macro, only on the descriptor/data interface, which is where it is documented.
Hi Matthias,

Thank you for the very detailed answer. I can see the ko_cpl_spc_vc0 bus in simulation, so that will help with one part of the problem. Also, the Xilinx UG517 is very useful, and provides more detail than the Altera PCIe User Guide.

However, I think the algorithms in UG517 assume that the user application has access to the values for CPLH and CPLD credits on the receive side. When I use the test_in bus on the PCIe hard macro to drive the receive credits onto the test_out bus (i.e. test_in[11:8] == 6), the values I see on test_out are 64'h0361_6832 and 64'h0351_6832. According to PCIe User Guide Version 9.1, Table B-9, the CPLH and CPLD credits are at bits 43:36 and 55:44 respectively, and these are always 0, indicating infinite credits, which is in agreement with the PCIe spec.

When I do get receive buffer overflow, the PCIe hard macro just swallows the RCB, with no indication on any signals that I can see. Even the specific receiver overflow bits on the test_out bus (selectable with other values of test_in[11:8]) don't seem to toggle.

I'm currently implementing the LIMIT_FC algorithm, and unless I'm wrong about the CPLH/CPLD credits, I think this is my only option, though I'd be glad to find out that I'm wrong.

Brendan
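As a quick check of the field extraction described above, the two observed test_out values can be decoded with a few lines of Python. The bit positions (CPLH at 43:36, CPLD at 55:44) are the ones quoted in the post from Table B-9 and have not been verified independently; the function name is hypothetical.

```python
def decode_rx_completion_credits(test_out):
    """Extract (CPLH, CPLD) from a 64-bit test_out value.

    Bit positions as quoted in the post: CPLH = bits 43:36, CPLD = bits 55:44.
    """
    cplh = (test_out >> 36) & 0xFF     # 8-bit field
    cpld = (test_out >> 44) & 0xFFF    # 12-bit field
    return cplh, cpld


if __name__ == "__main__":
    for value in (0x0361_6832, 0x0351_6832):
        print(hex(value), decode_rx_completion_credits(value))
```

Both observed values only occupy the lower 32 bits, so both fields decode to 0, which matches the "infinite credits" reading in the post.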
Hi Brendan,

I think there is a misunderstanding about credit accounting on your side.

All credit accounting is done inside the PCIe IP blocks, except for CPLH/CPLD credit, so accounting for that must be done in the application. An application that does not handle CPLH/CPLD credit appropriately will eventually overrun the receive buffers in the PCIe IP, lose data, and fail, depending on the recovery options implemented in the application and the PCIe IP.

Think of the credit handling as a required add-on for tag handling: you are not allowed to use a tag if a completion with that Transaction ID is still in flight; similarly, you are not allowed to send out a read request if the completion(s) wouldn’t fit the receive buffers at the time of arrival.

IMO there are two reasons for having CPLH/CPLD credit accounting in the application: 1. TLP Transaction Ordering rules, 2. Completion Timeout handling.

For the former: if the PCIe IP simply blocked your read request for lack of CPLH/CPLD credit, it would block any other, higher-priority outbound traffic as well, like completions to CPU read requests (see cases D3/E3 in the Ordering Rules Summary Table). That would allow deadlocks to occur.

For the Completion Timeout aspect: this is an application-specific mechanism that cannot be supported universally by the PCIe IP, and that problem is independent of the FPGA vendor. Sadly, Altera’s example code (chaining_dma) does not contain any logic for Completion Timeout handling.

You are on the wrong track if you try to get any support for CPLH/CPLD credit accounting from the IP block’s test_in/test_out. You have to do it on your own, completely. The only hint you get from the IP block is ko_cpl_spc_vc0 (or, in its absence, the equivalent information from the .XML/.HTML or the comments in the top-level megawizard file), which statically indicates the maximum credit you have for replies to your read requests.
Based on that credit information, your CPLH/CPLD credit handling logic has to:
• check the available CPLH/CPLD credit before sending the read request, delaying the read request if necessary until sufficient credit is available (mind the PCIe ordering rules and let other outbound traffic pass),
• debit CPLH/CPLD upon actually sending out a read request,
• return CPLH/CPLD credit when a read completion arrives (but only to the extent of the actual reply size),
• return all remaining CPLH/CPLD credit of that tag when a Completion Timeout occurs (and notify the read requester of the termination of that read request).

In my application I have a block called tagcc that is responsible for tag management and CPLH/CPLD handling. Prior to forming a read request, any read requester block (descriptor fetch, DMA TX) has to ask tagcc for a tag, indicating the amount of completion credit this read request will require. tagcc will only release a tag if there is a tag available (not all are in flight) and there is enough credit for the completions not to overrun the buffer on arrival. tagcc remembers three things: (a) that this tag is in flight, (b) the time of issue, (c) the amount of outstanding CPLH/CPLD.

On arrival of completions, not only are the global CPLH/CPLD counters replenished, but the tag’s outstanding credit is reduced as well. Once the final completion arrives, the tag is marked not-in-flight and is free to be used again. In the background, all in-flight tags are checked for a timeout. Once a Completion Timeout occurs, the tag’s outstanding CPLH/CPLD credit is freed, the tag itself is freed, the read requester is informed of the termination, and an appropriate error is issued on cpl_err.

Additionally, my receive TLP interface is designed to support STREAM_FC. If STREAM_FC (in Xilinx notation) were officially supported by the Altera PCIe Compiler User Guide, I could drop CPLH/CPLD accounting altogether and would be done with just in-flight and Timeout handling.

Matthias
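The bookkeeping described in this post can be sketched as a small behavioral model. This is not the actual tagcc block, just an illustration in Python under stated assumptions: the class and method names are hypothetical, time is an abstract tick count, credit is debited when the tag is granted (standing in for "upon sending the read request"), and completions are assumed to leave the receive buffer as soon as they are reported, so their credit can be returned immediately.

```python
class TagCreditManager:
    """Behavioral model of combined tag and CPLH/CPLD accounting."""

    def __init__(self, num_tags, cplh_total, cpld_total, timeout):
        self.free_tags = list(range(num_tags))
        self.cplh = cplh_total      # header credits currently available
        self.cpld = cpld_total      # data credits currently available
        self.timeout = timeout      # completion timeout in ticks
        # tag -> [issue_time, outstanding_cplh, outstanding_cpld]
        self.inflight = {}

    def request_tag(self, now, cplh_needed, cpld_needed):
        """Grant a tag only if a tag and enough credit are available."""
        if not self.free_tags:
            return None
        if cplh_needed > self.cplh or cpld_needed > self.cpld:
            return None
        tag = self.free_tags.pop(0)
        self.cplh -= cplh_needed
        self.cpld -= cpld_needed
        self.inflight[tag] = [now, cplh_needed, cpld_needed]
        return tag

    def completion(self, tag, cpld_used, final):
        """Account for one completion TLP; False means drop it as stale."""
        entry = self.inflight.get(tag)
        if entry is None:
            return False
        hdr = min(1, entry[1])            # one header credit per completion
        data = min(cpld_used, entry[2])   # return only what was actually used
        entry[1] -= hdr
        entry[2] -= data
        self.cplh += hdr
        self.cpld += data
        if final:
            self._release(tag)
        return True

    def check_timeouts(self, now):
        """Free every tag whose completion timeout expired; return the tags."""
        expired = [tag for tag, entry in self.inflight.items()
                   if now - entry[0] >= self.timeout]
        for tag in expired:
            self._release(tag)
        return expired

    def _release(self, tag):
        """Return all credit still reserved for the tag and free the tag."""
        _, hdr, data = self.inflight.pop(tag)
        self.cplh += hdr
        self.cpld += data
        self.free_tags.append(tag)
```

A requester would call `request_tag` before forming a read request and retry later when it returns `None`, while letting other outbound traffic pass in the meantime. A completion for a tag that has already timed out returns `False` and should be discarded by the application; signalling the error on cpl_err and notifying the requester are outside the scope of this model.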
Thanks again, Matthias, for another very detailed explanation.

I had already implemented the LIMIT_FC algorithm, using a fairly crude mechanism to allocate and free the read request CPLH/CPLD buffers, and this has been working fine in actual hardware. As you mention in your reply, this really has to be associated with the tag allocation and freeing.

I had hoped that the PCIe hard macro would provide me with the realtime status of the CPLH/CPLD buffers, so that I could have a cleaner implementation in hardware, and one with finer granularity. I had also hoped that I could find out from the PCIe hard macro when my algorithm failed and a CPLH/CPLD buffer overrun occurred. This would be very useful during the debug process, but would also improve overall system reliability in that I could use this to throw an interrupt and alert the driver that something had gone wrong.

I can achieve all the functionality I need by doing all of the monitoring/allocating/freeing in my logic. I can use the timeout mechanism to find out when buffer overrun occurs. It's all doable, just more work on my end.

Thanks,
Brendan
Brendan,

--- Quote Start ---
I had hoped that the PCIe hard macro would provide me with the realtime status of the CPLH/CPLD buffers, so that I could have a cleaner implementation in hardware, and one with finer granularity. I had also hoped that I could find out from the PCIe hard macro when my algorithm failed and a CPLH/CPLD buffer overrun occurred. This would be very useful during the debug process, but would also improve overall system reliability in that I could use this to throw an interrupt and alert the driver that something had gone wrong.
--- Quote End ---

I had hoped that, too. At first I was glad to see that, according to the PCIe Compiler manual, the IP would filter out (read) completions that don’t belong to read requests. I understand that the IP can track my read requests as they hit the PCIe network and can account for received completions. But in no way can it account for the operation of my Completion Timeout mechanism, which is implemented solely inside the application. That means: in case of a Completion Timeout, I have to drop completions that the IP block doesn’t know have already timed out, until the tag is freshly used for the next read request. In other words, all the tag handling logic currently in the IP for catching invalid completions is worthless and has to be duplicated in an application-specific way inside my logic.

--- Quote Start ---
I can achieve all the functionality I need by doing all of the monitoring/allocating/freeing in my logic. I can use the timeout mechanism to find out when buffer overrun occurs. It's all doable, just more work on my end.
--- Quote End ---

Hmmm … I don’t quite understand what the timeout mechanism has to do with detecting buffer overruns. The timeout mechanism just ensures that the user logic doesn’t have to block and wait forever for a lost read request or completion but can continue, probably dropping data or requiring a soft reset of hardware, driver and/or user software.
Just look at the description of cpl_err and the chapter about Error Handling to see what to do in which case. You cannot detect buffer overruns, so you should do your part in preventing them by properly handling CPLH/CPLD credit before sending a read request; the PCIe IP already does so for all other credit-based transactions. And, yes, it’s all work on your end. ;)