PCI express gen3 x8 Avalon-MM 256 bit DMA core multiple request hang issue

Altera_Forum · ‎05-03-2014

In attempting to perform multiple DMA requests in parallel with the example design, it seems that something in the core ends up in an inconsistent state. After performing a single DMA read of 128 bytes, the EPLAST value is written correctly to the beginning of the descriptor table stored in root port memory. The DMA operation can be repeated afterwards just by setting the start bit in the control register. However, if I request a read of 128 bytes and a read of 1024 bytes in parallel with two descriptors in the table, this EPLAST write never takes place and the core seems to be in an inconsistent state - setting the start bit is ignored and no further DMA requests can be initiated without reconfiguring the FPGA. Does anyone know if this issue lies in the DMA core itself or in the descriptor controller?

If it's in the descriptor controller, it's not really a big problem because I'm eventually going to write my own. However, I do need the DMA core working, and I have already found one bug in it so far.

Altera_Forum · ‎05-06-2014

I can't see how a DMA controller that accesses the PCIe block through its Avalon slave interface can ever generate multiple concurrent TLPs - especially read ones for different transfers.

To my mind the DMA controller needs to be embedded inside the PCIe block.

Altera_Forum · ‎05-07-2014

Well, the PCIe module itself just deals with Transaction Layer Packets, forwarding these over the bus via Avalon ST interfaces. The gen 3 DMA module manages creating the read and write request TLPs as well as dealing with the response packets. Initiating two simultaneous DMA operations just means that the DMA module will send both request TLPs back-to-back with different tags without waiting for the first operation to complete. Any data associated with these transfers is passed through a pair of 256 bit Avalon MM interfaces. Then the host computer will fulfil the requests and return the responses, and the DMA module will do the bookkeeping so that it can write the incoming data to the right addresses and know when the operations are completed. The DMA module also has a couple of ports where non-DMA transactions are handled; these come out as 32 bit Avalon MM interfaces.
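To make the bookkeeping concrete, here is a toy software model of tag-based tracking for outstanding read requests. It is only a sketch of the general idea, not the core's actual logic: the class and method names (`ReadTracker`, `issue_read`, `on_completion`) are invented for illustration, and the 512-byte completion size is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Outstanding:
    dest_addr: int   # where the next completion payload goes in endpoint memory
    remaining: int   # bytes still expected for this tag


class ReadTracker:
    """Toy model (hypothetical) of tag-based bookkeeping for read requests."""

    def __init__(self, mem_size):
        self.mem = bytearray(mem_size)   # stands in for endpoint memory
        self.outstanding = {}            # tag -> Outstanding
        self.done = []                   # tags, in the order they finished

    def issue_read(self, tag, dest_addr, length):
        # A tag may not be reused while its request is still in flight.
        if tag in self.outstanding:
            raise ValueError(f"tag {tag} already in flight")
        self.outstanding[tag] = Outstanding(dest_addr, length)

    def on_completion(self, tag, payload):
        # Completions for one request arrive in order, but completions for
        # different tags may interleave.
        req = self.outstanding[tag]
        if len(payload) > req.remaining:
            raise ValueError("completion longer than the bytes still expected")
        self.mem[req.dest_addr:req.dest_addr + len(payload)] = payload
        req.dest_addr += len(payload)
        req.remaining -= len(payload)
        if req.remaining == 0:
            del self.outstanding[tag]
            self.done.append(tag)


tracker = ReadTracker(mem_size=4096)
tracker.issue_read(tag=0, dest_addr=0x000, length=128)
tracker.issue_read(tag=1, dest_addr=0x400, length=1024)

# Assumption: the host splits the 1024-byte read into two 512-byte
# completions and interleaves them with the completion for tag 0.
tracker.on_completion(1, bytes([0xBB]) * 512)
tracker.on_completion(0, bytes([0xAA]) * 128)
tracker.on_completion(1, bytes([0xBB]) * 512)
```

In this model both transfers finish and nothing is left in `outstanding`, which is the condition under which the real module would be expected to report status for each descriptor.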

Initiating a DMA operation requires passing a DMA descriptor into the DMA module with a dedicated Avalon ST interface. The DMA module can manage some number of descriptors internally; I'm not sure how many offhand. When the operation is completed, the status is passed back out another Avalon ST interface.

The example design also has an additional component: a descriptor controller. This component manages copying the descriptors from the host computer, transferring them to the DMA module, waiting for the response from the DMA module, and then notifying the host that the operation is complete. The descriptor controller can supposedly manage multiple DMA descriptors in one descriptor table, but it seems that there is a bug in how this is handled.
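For readers who have not seen a descriptor table, a rough sketch of how the host might build one follows. The layout is entirely hypothetical (a 32-byte header whose first word is reserved for the EPLAST status, then 32-byte descriptors holding source address, destination address, length in bytes and an ID); the real format is defined in the core's user guide and will differ. The addresses are made-up example values.

```python
import struct

# Hypothetical layout, NOT the real Altera format:
HEADER_FMT = "<I28x"    # 32-bit status word (EPLAST), padded to 32 bytes
DESC_FMT = "<QQII8x"    # src addr, dst addr, length in bytes, id, padded to 32 bytes

EPLAST_INVALID = 0xFFFFFFFF  # sentinel the host writes before starting (assumption)


def build_table(descriptors):
    """Pack (src, dst, length) tuples into one table with a status header."""
    table = bytearray(struct.pack(HEADER_FMT, EPLAST_INVALID))
    for desc_id, (src, dst, length) in enumerate(descriptors):
        table += struct.pack(DESC_FMT, src, dst, length, desc_id)
    return bytes(table)


def read_eplast(table):
    """Return the status word at the start of the table."""
    (value,) = struct.unpack_from("<I", table, 0)
    return value


# Two descriptors: a 128-byte read and a 1024-byte read (example addresses).
table = build_table([
    (0x1000_0000, 0x0000, 128),
    (0x1000_0400, 0x0400, 1024),
])
```

Pre-filling the status word with a sentinel lets the host tell "never written" apart from a real status value, which is handy when the final write goes missing.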

This is what I am seeing when I attempt to perform two DMA reads at the same time on the example design:

1. I set up a DMA accessible buffer on the host and put some data in it

2. I create a descriptor table in a different place in the same buffer, and set up pointers to the data in the buffer I want to copy and to where I want to copy it on the FPGA

3. I pass a pointer to the descriptor table to the card

4. I pass a pointer to where I want to copy the descriptor table on the FPGA

5. I initiate the DMA read operations by setting a bit in the control register

6. After waiting for the request to complete, I can access endpoint memory with non-DMA requests and read out the correct data in the correct locations, indicating that the DMA request completed properly. However, the descriptor controller was supposed to write EPLAST to the start of the descriptor table in host memory, and this did not happen. Also, if I try to initiate another DMA request, nothing happens.
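On the host side, the wait in step 6 can be written as a poll with a timeout so that a missing EPLAST write shows up as an explicit failure rather than a stuck program. This is a minimal sketch: `wait_for_eplast` and `read_status` are hypothetical names, the expected value `1` and the sentinel `0xFFFFFFFF` are placeholders, and the card is simulated with plain Python callables.

```python
import time


def wait_for_eplast(read_status, expected, timeout_s=1.0, poll_interval_s=0.001):
    """Poll read_status() until it returns `expected`.

    Returns True if the value was seen, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_status() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


# Simulated success: the status word changes on the third poll.
polls = iter([0xFFFFFFFF, 0xFFFFFFFF, 1])
completed = wait_for_eplast(lambda: next(polls), expected=1)

# Simulated hang: the status word never changes, as with two descriptors.
hung = wait_for_eplast(lambda: 0xFFFFFFFF, expected=1, timeout_s=0.05)
```

In a real driver `read_status` would read the first word of the descriptor table in the DMA buffer; on timeout the host can at least log the failure before the FPGA is reconfigured.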

If I look at the output of my N5306A protocol analyzer (I am actually running the card at Gen 1 x4 for debugging, because the analyzer is too old to read Gen 3 data), this is what I see:

1. Write request TLPs from host to write control registers in descriptor controller

2. Read request TLP from DMA module to read descriptor table entries

3. Read completion TLPs from host containing all read descriptors

4. Read request TLPs from DMA module, as requested by the descriptors

5. Read completion TLPs from host containing the requested data

6. If there was only one entry in the descriptor table: a write request packet from the card to the descriptor table in host memory indicating that the operation is complete

So the DMA read is performed correctly even with two descriptors in the table; the correct TLPs are generated by the card after the table is copied, then the host fulfils the requests, and the data ends up in the correct memory locations on the FPGA. However, something fails after the DMA operation is completed: the host is never notified that the operation is complete, and all subsequent control register writes to start another DMA operation are ignored. There is obviously a bug either in the DMA module, which fails to notify the descriptor controller correctly, or in the descriptor controller, which does not respond correctly to the signals from the DMA module. Either way, Altera needs to fix their example design.