PCIe simple transaction

Altera_Forum · 09-28-2011

Hi

How do I perform a simple memory read and write with the PCIe MegaFunction (Avalon-ST)? Which signals need to be set to perform a write or a read? Or do I need to construct the header myself before sending it via tx_st_data0?

The existing example is quite complicated for me. I have spent a few days just trying to find out how the payload is sent via tx_st_data0, but in vain.

Does anyone have a simple design (without DMA) and a testbench that can do memory reads and writes over PCIe?

Thanks.

Altera_Forum · 09-28-2011

Whenever your IP creates read and write transactions as a client, this *is* actually DMA. This means:

• You have to know an address to use for your transfers, which is typically based on a value set by your OS driver,

• You have to watch the configuration space signals that indicate whether DMA is allowed at all and what the maximum request/response lengths of transfers are,

• You have to find an algorithm for choosing a valid tag for your read requests,

• You have to find a way of dealing with the maximum read buffer credit (separate for data and header) within your FPGA PCIe IP so it doesn’t get overrun by completions of many or large read requests,

• You have to deal with read completion reordering,

• You have to keep track of outstanding read requests and capture the returned data, while watching the clock and discarding or retrying a read request when it times out,

• You have to properly handle the timing constraints on the Avalon-ST TX port, where you must supply your transfer (header + data) at full rate without sender-imposed wait states,

• You have to properly handle the timing constraints on AST RX and TX, where you must observe the PCIe TLP ordering rules, i.e. let completions pass new requests, and handle/buffer the infamous 14 (or 26) non-posted requests that can still enter your design even after you have asserted rx_st_mask<n>,

• You have to find a way of responding to *any* non-posted request, even if you don’t intend to support a specific address range or access type,

• You have to deal with the asynchronous relation between finishing your last (write) request and setting an interrupt for handing over control to your driver.
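To make the tag and timeout items in the list above more concrete, here is a minimal behavioural sketch in Python (a software model, not HDL). The class name `TagPool`, its methods and the cycle-count timeout are all assumptions made for illustration; nothing here comes from Altera's IP. The default of 32 tags reflects the 5-bit tag field that PCIe uses when extended tags are not enabled.

```python
from collections import deque


class TagPool:
    """Hypothetical model of a read-tag allocator with timeout tracking."""

    def __init__(self, num_tags=32, timeout_cycles=1000):
        self.free = deque(range(num_tags))   # tags available for new reads
        self.in_flight = {}                  # tag -> [issue_cycle, bytes_left]
        self.timeout_cycles = timeout_cycles

    def allocate(self, now, byte_count):
        """Return a free tag for a read of byte_count bytes, or None."""
        if not self.free:
            return None                      # caller has to stall the request
        tag = self.free.popleft()
        self.in_flight[tag] = [now, byte_count]
        return tag

    def on_completion(self, tag, payload_bytes):
        """Account for one completion TLP; False means it was unexpected."""
        entry = self.in_flight.get(tag)
        if entry is None:
            return False                     # no such outstanding request
        entry[1] -= payload_bytes
        if entry[1] <= 0:                    # last completion for this read
            del self.in_flight[tag]
            self.free.append(tag)
        return True

    def expire(self, now):
        """Release and return all tags whose requests have timed out."""
        timed_out = [tag for tag, (issued, _) in self.in_flight.items()
                     if now - issued >= self.timeout_cycles]
        for tag in timed_out:
            # A real design may want to quarantine the tag for a while,
            # because a late completion could still arrive for it.
            del self.in_flight[tag]
            self.free.append(tag)
        return timed_out
```

A read engine would call `allocate` before issuing a request, `on_completion` for every completion seen on the RX port, and `expire` periodically to decide which requests to discard or retry.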

To answer your question: Yes, on AST you create transaction layer packets (TLPs) on the TX port that start with a header of 3 or 4 DWORDs, followed by 0 or 1 unused alignment DWORD, followed by 0 to n data DWORDs. Similarly, you receive TLPs on the RX port that contain a similar header, alignment and data DWORDs.
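As a rough illustration of that header, the following Python sketch packs the DWORDs of a memory read or write request according to the PCIe header layout (Fmt, Type, Length, Requester ID, Tag, byte enables, address). The function names are made up for this example. The alignment helper encodes one reading of the 64-bit Avalon-ST rule (payload aligned to the QWORD address) and is an assumption to verify against the user guide for your core; how the DWORDs map onto the bits of tx_st_data0 is not covered here.

```python
TYPE_MEM = 0b00000  # Type field value shared by memory reads and writes


def mem_request_header(addr, length_dw, requester_id, tag, write,
                       first_be=0xF, last_be=0xF):
    """Return the header DWORDs of a memory read or write request TLP."""
    if addr & 0x3:
        raise ValueError("address must be DWORD aligned; use byte enables")
    if not 1 <= length_dw <= 1024:
        raise ValueError("length must be 1..1024 DWORDs")
    if length_dw == 1:
        last_be = 0x0                    # single-DWORD requests use First BE only
    four_dw = addr >= (1 << 32)          # 64-bit addresses need a 4DW header
    # Fmt: upper bit = packet carries data, lower bit = 4DW header
    fmt = (0b10 if write else 0b00) | (0b01 if four_dw else 0b00)
    dw0 = (fmt << 29) | (TYPE_MEM << 24) | (length_dw & 0x3FF)  # 1024 -> 0
    dw1 = (requester_id << 16) | (tag << 8) | (last_be << 4) | first_be
    if four_dw:
        return [dw0, dw1, addr >> 32, addr & 0xFFFFFFFC]
    return [dw0, dw1, addr & 0xFFFFFFFC]


def needs_alignment_dword(header_dws, addr):
    """Assumed rule: pad when header length plus address bit 2 is odd."""
    return (header_dws + ((addr >> 2) & 1)) % 2 == 1


# One-DWORD write to 0x1000 from requester 01:00.0 using tag 5
hdr = mem_request_header(0x1000, 1, 0x0100, 5, write=True)
print([f"{dw:08x}" for dw in hdr], needs_alignment_dword(len(hdr), 0x1000))
```

The tag is ignored by the receiver for posted writes but matters for reads, where it is echoed back in the completion.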

You have to be familiar with the AST TLP frame layout even if you don’t do DMA. If you just want to be able to respond to PIO requests from your host OS driver, you still have to deal with any incoming request and analyze its format, address, flags and data. And for any non-posted request, you have to form a matching completion packet.
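The sketch below illustrates that decode-and-complete step in Python for the simplest case: a memory read that is answered successfully with a single completion-with-data TLP. The function and dictionary key names are hypothetical. Completions with other statuses (such as Unsupported Request) and completions split across several TLPs are deliberately left out; check the PCIe specification for those field rules.

```python
CPL_TYPE = 0b01010   # Type field of a completion
STATUS_SC = 0b000    # Successful Completion


def decode_request(hdr):
    """Split a received 3DW/4DW request header into named fields."""
    dw0, dw1 = hdr[0], hdr[1]
    fmt = (dw0 >> 29) & 0x3
    if fmt & 0x1:                        # 4DW header carries a 64-bit address
        addr = (hdr[2] << 32) | (hdr[3] & 0xFFFFFFFC)
    else:
        addr = hdr[2] & 0xFFFFFFFC
    return {
        "has_data": bool(fmt & 0x2),
        "type": (dw0 >> 24) & 0x1F,
        "tc": (dw0 >> 20) & 0x7,
        "attr": (dw0 >> 12) & 0x3,
        "length_dw": (dw0 & 0x3FF) or 1024,   # a Length of 0 means 1024
        "requester_id": (dw1 >> 16) & 0xFFFF,
        "tag": (dw1 >> 8) & 0xFF,
        "last_be": (dw1 >> 4) & 0xF,
        "first_be": dw1 & 0xF,
        "addr": addr,
    }


def _lowest_bit(be):
    return (be & -be).bit_length() - 1


def _highest_bit(be):
    return be.bit_length() - 1


def read_byte_count(req):
    """Bytes requested by a memory read, derived from length and byte enables."""
    first, last, n = req["first_be"], req["last_be"], req["length_dw"]
    if n == 1:
        return 1 if first == 0 else _highest_bit(first) - _lowest_bit(first) + 1
    return 4 * n - _lowest_bit(first) - (3 - _highest_bit(last))


def read_completion_header(req, completer_id):
    """Header of one successful completion returning all requested data."""
    first = req["first_be"]
    lower_addr = (req["addr"] & 0x7C) | (_lowest_bit(first) if first else 0)
    dw0 = ((0b10 << 29) | (CPL_TYPE << 24) | (req["tc"] << 20)
           | (req["attr"] << 12) | (req["length_dw"] & 0x3FF))
    dw1 = (completer_id << 16) | (STATUS_SC << 13) | (read_byte_count(req) & 0xFFF)
    dw2 = (req["requester_id"] << 16) | (req["tag"] << 8) | lower_addr
    return [dw0, dw1, dw2]
```

The Requester ID, Tag, traffic class and attributes are copied from the request so that the requester can match the completion to its outstanding read.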

To give you an overview of the complexity of a design, here is a list of blocks in my AST PCIe client with two-way DMA but still without MSI-X:

– rdbuf: Captures any received PIO read request from AST RX in a buffer and drives rx_st_mask when the buffer starts to fill up,

– pcie_reg: handles any PIO read/write requests (write requests are handled at full speed directly from AST RX, read requests are handled when they drop out from ast_rdbuf and if AST TX is able to swallow the replies). The driver uses this interface to initialize, configure and control the device, so this block handles DMA descriptor addresses, interrupt status and interrupt generation as well as ,

– tagcc: handles DMA non-posted (== read) tags: keeps a queue of in-flight tags and tracks their completions and timeouts by directly watching for completions on AST RX,

– desc_fetch: reads jobs from a descriptor queue in main memory by fetching a read tag from tagcc, reading a descriptor entry and providing it to one DMA block (there are two instances of desc_fetch in the design, one for rx, one for tx),

– dma_rx: does actual DMA writes to main memory based on descriptors fetched by desc_fetch,

– dma_tx: does the actual pipelined DMA reads, read completion reordering and storage in local memory; uses tags from tagcc,

– ast_txmux: priority multiplexer that selects one of the DMA engines or the descriptor fetch engines for their DMA requests, or pcie_reg for its completions.
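The read completion reordering mentioned for dma_tx can be pictured with a small software model. This is only a sketch in Python under simple assumptions, not the poster's VHDL: the class `ReadReorderBuffer` and its methods are invented for illustration. It relies on the PCIe rule that completions for one request arrive in address order, while completions for different requests may overtake each other.

```python
from collections import deque


class ReadReorderBuffer:
    """Hypothetical model: release read data in the order reads were issued."""

    def __init__(self):
        self.order = deque()    # tags in request issue order
        self.data = {}          # tag -> bytes received so far
        self.remaining = {}     # tag -> bytes still expected

    def issue(self, tag, byte_count):
        """Record that a read with this tag was sent."""
        self.order.append(tag)
        self.data[tag] = bytearray()
        self.remaining[tag] = byte_count

    def on_completion(self, tag, payload):
        """Store one completion payload and return any data now in order."""
        self.data[tag] += payload
        self.remaining[tag] -= len(payload)
        return self._drain()

    def _drain(self):
        out = bytearray()
        # Only the oldest request may leave, and only once it is complete.
        while self.order and self.remaining[self.order[0]] == 0:
            tag = self.order.popleft()
            out += self.data.pop(tag)
            del self.remaining[tag]
        return bytes(out)


rob = ReadReorderBuffer()
rob.issue(0, 8)
rob.issue(1, 4)
rob.on_completion(1, b"CCCC")          # younger read finishes first: held back
rob.on_completion(0, b"AAAA")          # first half of the older read
print(rob.on_completion(0, b"BBBB"))   # b'AAAABBBBCCCC'
```

In hardware the same idea usually becomes a buffer indexed by tag plus a FIFO of issued tags, so that local memory is written in the original request order.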

In addition, there is a pcie_conf block that demultiplexes the LMI PCI configuration space data.

To give you an idea of the complexity: Apart from wrapper modules, the total code size for these logic blocks is less than 200 kB in VHDL, and my Linux driver is less than 75 kB of C code.

Altera_Forum · 09-29-2011

Thanks for the explanation. There is a lot of stuff I don't know about PCIe. That's why I started with the examples provided by Altera, in order to understand the behaviour of PCIe. However, the examples provided by Altera are too complicated to understand, and the PCIe user guide is not written for beginners like me.

Do you have any simple project or tutorial that would let me understand PCIe easily?

Altera_Forum · 09-29-2011

I don’t have code to share with you, sorry. If you intend to do anything with PCIe, I highly recommend reading the PCI specification first – don’t skip that! – next the PCIe specification. You will not understand PCIe at all by just looking at Altera’s User Guide and the example source code. Note that Altera’s example source code lacks support for some important PCIe aspects like transaction timeouts (at least this was true in Quartus II releases up to 10.1).

Altera_Forum · 09-30-2011

Thanks for your suggestion. Now I know where I should start. I really appreciate it.