PCIe packet in Cyclone IV GX

Altera_Forum · 04-11-2012

Hello,

I am working with the PCIe protocol on a Cyclone IV GX.

I configured the PCI Express IP with the MegaWizard Plug-In Manager as an endpoint, and I want to send a packet (e.g. a Memory Read transaction) from one endpoint to another through a PCIe switch.

How can I send this packet? And how does this endpoint know the other endpoint's address?

I need some help, please.

Thank you in advance.

Altera_Forum · 04-12-2012

Well, there are multiple ways to exchange data between PCIe endpoints, say to send data from endpoint 1 to endpoint 2. The easiest one is to route the data through main memory: device 1 writes the data into main memory (kernel space) with a DMA write, then device 2 does a DMA read from the same memory location to fetch the data. You only have to synchronize the two devices, via their drivers in the operating system, regarding the actual buffer address and the order of operations. If you don’t care about zero-copy, you could pass the data on to user space and thereby separate the drivers for the two devices even more.

Another way is to use direct memory writes from device 1 to device 2, or even memory reads from device 2 to device 1. I have never used this technique, so keep your salt at hand. Here, your device 1 has to be informed by its driver of the system memory address of the device 2 BAR you want to use. Device 1 cannot fetch the BAR settings from device 2 by itself, though: the PCIe specification only lets the Root Complex originate Configuration Requests, so the address has to come from the driver.

Now device 1 can use this address in a memory write, and the PCIe switches will route the packet to device 2, based on the switch’s knowledge of assigned BARs on its ports (address routing). Device 2 will receive this memory write request just like a memory write triggered by the CPU/driver, and the FPGA application will finish the write based on the indicated BAR hit.
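A minimal sketch of the address routing decision, assuming a hypothetical switch with two downstream ports. The port labels and window values are invented for illustration; a real switch takes these windows from the memory base/limit registers that the OS programs into each downstream port during enumeration.

```python
from typing import NamedTuple, Optional

class Window(NamedTuple):
    port: str    # hypothetical port label
    base: int    # first address claimed by the port (inclusive)
    limit: int   # last address claimed by the port (inclusive)

# Hypothetical windows, as enumeration might have assigned them.
WINDOWS = [
    Window("port1 (device 1)", 0xF000_0000, 0xF00F_FFFF),
    Window("port2 (device 2)", 0xF010_0000, 0xF01F_FFFF),
]

def route_by_address(addr: int, windows=WINDOWS) -> Optional[str]:
    """Return the downstream port whose window claims addr.

    None means no downstream port claims the address, so the
    switch would forward the request upstream instead.
    """
    for w in windows:
        if w.base <= addr <= w.limit:
            return w.port
    return None

# A write from device 1 into the BAR of device 2 leaves on port 2.
print(route_by_address(0xF010_0040))
# An address in main memory is not claimed by any downstream port.
print(route_by_address(0x0000_1000))
```

The same lookup explains why device 1 only needs the BAR address of device 2: the switch does the rest.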

If you want device 2 to pull the data from device 1, you have to swap the roles: now the driver has to inform device 2 of the BAR address assigned to device 1, so device 2 can fetch memory from there. Device 1 will receive this memory read request and respond with one or more Completions. As the request contained the requester ID of device 2, device 1 addresses the completions to this ID and the switches will now route these response packets based on the ID (ID routing).
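A minimal sketch of ID routing, assuming a hypothetical switch whose two downstream ports lead to bus 3 and bus 4. The 16-bit ID is the usual bus/device/function triple; the port names and bus numbers are invented for illustration.

```python
def make_id(bus: int, dev: int, fn: int) -> int:
    """Pack bus[15:8], device[7:3], function[2:0] into a 16-bit ID."""
    return (bus << 8) | (dev << 3) | fn

def split_id(rid: int):
    """Unpack a 16-bit ID into (bus, device, function)."""
    return rid >> 8, (rid >> 3) & 0x1F, rid & 0x7

# Hypothetical (secondary, subordinate) bus ranges of the downstream ports.
PORT_BUS_RANGES = {"port1": (3, 3), "port2": (4, 4)}

def route_by_id(rid: int, ranges=PORT_BUS_RANGES):
    """Return the port whose bus range contains the target bus, else None."""
    bus, _, _ = split_id(rid)
    for port, (secondary, subordinate) in ranges.items():
        if secondary <= bus <= subordinate:
            return port
    return None

# The completion carries the requester ID taken from the read request.
requester = make_id(4, 0, 0)          # device 2, assumed to sit on bus 4
print(hex(requester), route_by_id(requester))
```

Because the bus numbers come from enumeration, a renumbering changes the result of this lookup for the same physical device.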

The third and final option is to use Message TLPs. Messages are typically used only by the IP blocks for Power Management events, error propagation or Legacy Interrupt indication, but they can also be used by the application to let multiple devices communicate. Again, I have never tried that, so I can’t say whether this is fully supported by your favorite (hard) IP block. Different routing mechanisms exist: the 3-bit Message Routing sub-field of the Type field selects the routing, and 6 of its 8 possible encodings are defined. One will typically choose address routing (001b) or ID routing (010b), and you can choose to send a message with or without data and give it an application-specific Message Code (8 bits).
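A minimal sketch of how the first two header DWs of a Message TLP could be encoded, shown in the byte order the specification draws them (byte 0 first). The function name is made up, and the choice of the Vendor_Defined Type 1 message code (7Fh) for application traffic is an assumption. The remaining two header DWs (target ID or address) and the mapping onto the Avalon-ST bus are IP-specific and not shown.

```python
import struct

ROUTE_BY_ADDRESS = 0b001
ROUTE_BY_ID = 0b010
VENDOR_DEFINED_TYPE1 = 0x7F   # assumed message code for application use

def message_header_dw01(routing, code, requester_id, tag=0, payload_dw=0):
    """Build DW0 and DW1 of a 4DW Message header.

    Fmt is 001b (Msg, no data) or 011b (MsgD, with data);
    Type is 10rrrb, where rrr is the routing sub-field.
    payload_dw is the payload length in DWs (assumed below 1024).
    """
    fmt = 0b011 if payload_dw else 0b001
    tlp_type = 0b10000 | (routing & 0b111)
    dw0 = (fmt << 29) | (tlp_type << 24) | (payload_dw & 0x3FF)
    dw1 = (requester_id << 16) | (tag << 8) | code
    return struct.pack(">II", dw0, dw1)

# Message without data, routed by ID, sent by a hypothetical device 03:00.0.
print(message_header_dw01(ROUTE_BY_ID, VENDOR_DEFINED_TYPE1, 0x0300).hex())
# Message with one DW of data, routed by address.
print(message_header_dw01(ROUTE_BY_ADDRESS, VENDOR_DEFINED_TYPE1, 0x0300,
                          payload_dw=1).hex())
```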

Messages are always posted, so the proper answer from the other side is not a Completion but another Message TLP, if needed. In such a setup, Message TLPs would be used for everything: data write, return data, signalling, acknowledgment, flow control, error handling. You would try to avoid any other system-bound communication, say, over the OS driver, because this could get you into trouble regarding transaction ordering. How much of a problem this is in reality might depend on the application. And, again, I have zero experience with that.

Remember that any communication based on ID routing may go severely wrong if the system PCI bus is hit by a rescan and subsequent renumbering. Similarly, if the rescan leads to BAR changes, any address routing will be wrong after that. So the devices should stop talking directly to each other when the system starts the rescan, which is typically preceded by a driver unload procedure.

Good luck,

– Matthias

Altera_Forum · 04-12-2012

Thank you very much, Matthias, for this information.

I am a beginner in PCIe.

I read the PCIe spec, and I also read Altera's documents (IP Compiler for PCI Express User Guide, PCI Express High Performance Reference Design ...),

but they only explain the PCIe hard IP configuration, together with the generation of the chaining DMA example.

So my question is about interfacing to the PCIe hard IP. If I should use the chaining DMA example, how can I do that, and what steps should I take to send a transaction from one endpoint to another?

So far I have only configured the PCIe hard IP in the MegaWizard Plug-In Manager, and I'm stuck.

Could you send me an example design that shows how to send a transaction from one endpoint to another?

Thanks.

Altera_Forum · 04-12-2012

If you are a beginner, I wouldn’t recommend any device-to-device transfers at this time. I wouldn’t even recommend trying any DMA access now, because even the simplest operations carry the high overhead of designing a complex OS driver. Most beginners start with a Qsys or SOPC Builder example, or maybe SGDMA with a ready-made driver, but this will not really help in understanding the PCIe TLPs and driver mechanisms needed for inter-I/O accesses later on.

So if you really want to start with the Avalon ST interface, you have to become friends with TLPs. A short overview of PCI Express and TLPs is given here (http://billauer.co.il/blog/2011/03/pci-express-tlp-pcie-primer-tutorial-guide-1/), but there is plenty of free information on the net. Then grab a fresh PDF copy of LDD3 (http://lwn.net/kernel/ldd3/), the book Linux Device Drivers, Third Edition.

Next, take a hard IP block, configure it, leave the Avalon ST bus open and load it into the FPGA. Boot into Linux and look at what happens during PCI enumeration: get used to lspci, and correlate the hard IP settings with the output of lspci at varying -vvv verbosity levels.
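As a complement to lspci, a minimal sketch that decodes a few fields of the standard configuration header, which Linux exposes as a binary file under /sys/bus/pci/devices/<BDF>/config. The function names are made up, and the demo uses synthetic bytes with a hypothetical device ID and BAR value rather than a real device; 1172h is the Altera vendor ID.

```python
import struct

def parse_config_header(cfg: bytes) -> dict:
    """Decode vendor/device ID, Command bits and BARs (little-endian)."""
    vendor, device, command, _status = struct.unpack_from("<HHHH", cfg, 0)
    bars = struct.unpack_from("<6I", cfg, 0x10)
    return {
        "vendor_id": vendor,
        "device_id": device,
        "memory_space": bool(command & 0x2),  # Command bit 1
        "bus_master": bool(command & 0x4),    # Command bit 2
        "bars": list(bars),
    }

def read_config(bdf: str) -> bytes:
    """Read the first 64 bytes of config space, e.g. bdf='0000:03:00.0'."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(64)

# Synthetic header for illustration only (device ID and BAR0 are made up).
demo = bytearray(64)
struct.pack_into("<HHH", demo, 0, 0x1172, 0xABCD, 0x0006)
struct.pack_into("<I", demo, 0x10, 0xF010_0000)
info = parse_config_header(bytes(demo))
print(hex(info["vendor_id"]), hex(info["device_id"]), hex(info["bars"][0]))
```

On a real system, parse_config_header(read_config(...)) should show the same IDs that were entered in the IP configuration.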

A next step could be to use an LED on the FPGA development board and try to switch it on and off from the operating system driver by issuing PIO write requests from the CPU. Writing and understanding such an initial driver is already a lot of work, but try hard to bring things up this way. The Linux source tree contains quite a few complex drivers, but there are small ones available as well. Lean towards a simple PCI-based driver. Don’t skip sections in LDD3; you won’t understand later parts otherwise.
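On the FPGA side, the application has to decode the incoming write and update the LED register. A behavioural sketch of that decoding in Python, not HDL, assuming a single-DW 3DW memory write in specification byte order; the register offset and the helper names are hypothetical, and the real Avalon ST data layout is IP-specific.

```python
import struct

LED_OFFSET = 0x0   # hypothetical register offset inside the BAR

def apply_pio_write(tlp: bytes, regs: bytearray) -> None:
    """Apply a single-DW 3DW memory write to a small register file.

    len(regs) is assumed to be a power of two (the BAR size).
    """
    dw0, dw1, dw2 = struct.unpack(">III", tlp[:12])
    if (dw0 >> 24) != 0x40:              # Fmt=010b, Type=00000b
        raise ValueError("not a 3DW memory write")
    if (dw0 & 0x3FF) != 1:
        raise NotImplementedError("sketch handles single-DW writes only")
    first_be = dw1 & 0xF
    offset = dw2 & (len(regs) - 1) & ~0x3   # DW offset within the BAR
    for i in range(4):
        if first_be & (1 << i):             # honour the byte enables
            regs[offset + i] = tlp[12 + i]

def led_is_on(regs: bytearray) -> bool:
    return bool(regs[LED_OFFSET] & 1)

regs = bytearray(16)
# The CPU writes 1 to byte 0 of the register (First BE = 0001b).
header = struct.pack(">III", 0x4000_0001, 0x0000_0001, 0xF010_0000)
apply_pio_write(header + b"\x01\x00\x00\x00", regs)
print(led_is_on(regs))
```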

The next thing to try could be a button or switch which is read by the driver, which requires your Avalon ST application to create completions, i.e. read data response packets. printk() is your friend here if you don’t want to create complete character devices in the first attempt. Properly copy the transaction descriptor (requester ID, tag, traffic class and attributes) from the request into the completion so that the completion is correlated to the request from the CPU. Note that PIO reads are slow by nature and should be avoided wherever you need your CPU to perform well.
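A minimal sketch of how a Completion with Data could be derived from a single-DW 32-bit memory read request, in specification byte order. The function name and the IDs in the demo are hypothetical; multi-DW reads, error status and the Avalon ST lane placement are left out.

```python
import struct

def completion_for_read(req: bytes, completer_id: int, data: bytes) -> bytes:
    """Build a 3DW CplD header plus one DW of data for a 1-DW MRd."""
    dw0, dw1, dw2 = struct.unpack(">III", req[:12])
    if (dw0 >> 24) != 0x00 or (dw0 & 0x3FF) != 1:
        raise NotImplementedError("sketch handles 1-DW 32-bit MRd only")
    if len(data) != 4:
        raise ValueError("data must be exactly one DW")
    requester_id = dw1 >> 16
    tag = (dw1 >> 8) & 0xFF
    first_be = dw1 & 0xF
    enabled = [i for i in range(4) if first_be & (1 << i)] or [0]
    byte_count = enabled[-1] - enabled[0] + 1      # span of enabled bytes
    lower_addr = (dw2 & 0x7C) | enabled[0]         # address of first byte
    tc_attr = dw0 & 0x0070_3000                    # copy TC and Attr[1:0]
    cpl0 = (0x4A << 24) | tc_attr | 1              # CplD, length 1 DW
    cpl1 = (completer_id << 16) | byte_count       # status 000b = success
    cpl2 = (requester_id << 16) | (tag << 8) | lower_addr
    return struct.pack(">III", cpl0, cpl1, cpl2) + data

# Read of one DW at a hypothetical BAR address, requester 00:01.0, tag 5.
request = struct.pack(">III", 0x0000_0001,
                      (0x0008 << 16) | (0x05 << 8) | 0x0F, 0xF010_0010)
cpl = completion_for_read(request, 0x0400, b"\x01\x00\x00\x00")
print(cpl.hex())
```

The requester ID and tag end up in the third header DW, which is what lets the requester match the completion to its outstanding read.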

A possible next step would be to map an interrupt signal to an Interrupt Service Routine (ISR) in your driver and create the appropriate TLPs or signalling. Remember to correctly issue Legacy INTA or MSI interrupts, depending on the device settings in configuration space. There are quite a few interesting things available from configuration space using the tl_cfg_* signals, and the Altera PCI Express User Guide will tell you how to get access to the registers.

Before digging further into DMA accesses, I highly recommend cleaning up your AST TX and RX paths: make sure you properly handle IDLE and WAIT conditions and set up your architecture in such a way that it honors the PCIe transaction ordering rules and the very special needs of Avalon ST. For interrupts, you might notice a difference in the handling of Legacy INTA interrupts and MSI. Additionally, if not handled properly, the MSI requested after sending out written data might overtake the data and lead to a race condition. Learn how to avoid race conditions and why the ordering rules are your friends here. Ordering rules in hardware are seconded by memory barriers in the driver (http://www.mjmwired.net/kernel/documentation/memory-barriers.txt), so get used to them as well.

The easiest DMA operation is a DMA write access to main memory. Create a PIO register, like the one for the LED, that will be initialized by the driver to carry the address of a reserved main memory location which the PCIe device should write to. Your AST application can then create DMA write accesses to this location so that the memory location reflects your button or switch. See how the data byte has to be put at different AST TX bit positions depending on 3DW/4DW addressing and on the alignment of the given address which also controls the byte enable indication.
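A minimal sketch of how the header format, the length and the byte enables of a memory write follow from the target address and the data size. It works at TLP level in specification byte order; the function name and the requester ID are hypothetical, and neither the Avalon ST lane placement nor the 4 KB and maximum payload limits are handled here.

```python
import struct

def build_mwr(addr: int, data: bytes, requester_id: int, tag: int = 0) -> bytes:
    """Build a memory write TLP (header plus DW-padded payload)."""
    if not data:
        raise ValueError("need at least one byte")
    start = addr & ~0x3                       # DW-aligned start address
    lead = addr - start                       # disabled bytes in first DW
    length_dw = (lead + len(data) + 3) // 4
    trail = length_dw * 4 - lead - len(data)  # disabled bytes in last DW
    if length_dw == 1:
        first_be = ((1 << len(data)) - 1) << lead
        last_be = 0                           # must be 0000b for one DW
    else:
        first_be = (0xF << lead) & 0xF
        last_be = 0xF >> trail
    four_dw = start >= (1 << 32)              # 4DW header only above 4 GB
    fmt_type = 0x60 if four_dw else 0x40
    dw0 = (fmt_type << 24) | (length_dw & 0x3FF)
    dw1 = (requester_id << 16) | (tag << 8) | (last_be << 4) | first_be
    header = struct.pack(">II", dw0, dw1)
    header += struct.pack(">Q", start) if four_dw else struct.pack(">I", start)
    return header + bytes(lead) + data + bytes(trail)

# One byte at an odd address: First BE = 0010b, data in payload byte 1.
print(build_mwr(0x1001, b"\xaa", 0x0400).hex())
# Four bytes straddling two DWs above 4 GB: 4DW header, BEs 1100b/0011b.
print(build_mwr(0x1_0000_0002, b"\x01\x02\x03\x04", 0x0400).hex())
```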

Vary the address alignment and try different packet data sizes. Mind the limits of the PCIe spec, i.e. not crossing 4 KB boundaries, properly handling maximum payload sizes etc. Note that you are only allowed to issue DMA read or write requests if you are allowed to act as a bus master, which is indicated by the Bus Master Enable bit of the Command register in configuration space. Note that most BIOS versions activate DMA on all devices, even if they don’t have to contribute to the boot process. So you should have additional validation, like checking the address register for NULL, i.e. its reset state.
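A minimal sketch of these checks, assuming a maximum payload size of 128 bytes (the real value comes from the Device Control register) and hypothetical function names. It splits a transfer so that no piece crosses a 4 KB boundary or exceeds the maximum payload once the first DW is padded to alignment.

```python
def may_issue_dma(command_reg: int, target_addr: int) -> bool:
    """Allow DMA only with Bus Master Enable set and a non-NULL address."""
    bus_master_enabled = bool(command_reg & 0x4)   # Command register bit 2
    return bus_master_enabled and target_addr != 0

def split_write(addr: int, length: int, max_payload: int = 128):
    """Split (addr, length) in bytes into TLP-sized (addr, length) pieces."""
    chunks = []
    while length > 0:
        to_4k = 0x1000 - (addr & 0xFFF)        # bytes left before 4 KB edge
        room = max_payload - (addr & 0x3)      # padding counts as payload
        n = min(length, room, to_4k)
        chunks.append((addr, n))
        addr += n
        length -= n
    return chunks

print([(hex(a), n) for a, n in split_write(0xFF0, 0x30)])
print([(hex(a), n) for a, n in split_write(0x1002, 300)])
```

Only the first piece can be unaligned; every following piece starts on a DW boundary.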

The next step could be to make a simple, repetitive DMA read access in a similar way: poll a main memory location and make the LED go on and off as a bit is set or cleared by software. Wrap your head around Tags and their proper re-use, ingress completion unshuffling and timeout handling, again TLP ordering rules on RX, and of course the infamous completion credit handling. Again, try different memory read byte counts. Mind the maximum read request size, and learn to handle split read request completions.
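A minimal sketch of tag bookkeeping and reassembly of split completions, assuming a pool of 32 tags and whole-byte payloads; the class and method names are hypothetical. Completions of one request arrive in order, while completions of different requests may interleave. Timeout and credit handling are left out.

```python
class TagPool:
    """Track outstanding read requests by tag and collect their data."""

    def __init__(self, n_tags: int = 32):
        self.free = list(range(n_tags))
        self.pending = {}          # tag -> {"remaining": int, "data": bytearray}

    def issue_read(self, length: int):
        """Reserve a tag for a read of `length` bytes, or None if exhausted."""
        if not self.free:
            return None
        tag = self.free.pop(0)
        self.pending[tag] = {"remaining": length, "data": bytearray()}
        return tag

    def on_completion(self, tag: int, byte_count: int, payload: bytes):
        """Consume one completion; return all data once the read is done.

        byte_count is the Byte Count header field: the bytes still
        outstanding for the request, including this completion.
        """
        req = self.pending[tag]
        if byte_count != req["remaining"]:
            raise ValueError("unexpected Byte Count, completion lost?")
        req["data"] += payload
        req["remaining"] -= len(payload)
        if req["remaining"] == 0:
            del self.pending[tag]
            self.free.append(tag)          # the tag may be re-used now
            return bytes(req["data"])
        return None

pool = TagPool(n_tags=2)
t0 = pool.issue_read(8)
t1 = pool.issue_read(4)
print(pool.issue_read(4))                  # pool exhausted, request must wait
pool.on_completion(t0, 8, b"abcd")         # first half of read 0
print(pool.on_completion(t1, 4, b"wxyz"))  # read 1 completes in between
print(pool.on_completion(t0, 4, b"efgh"))  # second half of read 0
```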

When you have come that far, it’s time to talk about descriptors which allow indirect control of address space that can be used for data packet transfers in one or the other direction. The descriptor tables are typically configured over PIO accesses, but your AST application will use mixed read and write accesses to read the descriptors and update them after data sending or reception. One could learn how to gain transfer speed using no-snoop and relaxed ordering.

Different input stream types, addresses, or priorities will make you think about multiple DMA channels, so that the hardware does the splitting and the driver is off-loaded a little. Such queues will each need their own interrupt and flow control handling, so this is the next challenge. Depending on the expected interrupt load, you might consider moving from MSI to MSI-X interrupts now, which requires more PIO-like registers.

At this point, you could look into connecting two PCIe endpoints over main memory, which I described as the easiest option in my earlier post.

You see, there are multiple steps to take to learn the tasks of a PCIe endpoint, so I wouldn’t recommend jumping straight into the most sophisticated Inter-I/O architecture right away without knowing the PCIe basics.

– Matthias

Altera_Forum · 06-15-2012

Hello Matthias, and thank you very much for your reply.

I now have an IDT PCIe switch. I connected the switch board to the computer (Windows XP) and have already configured it to route packets from FPGA1 to FPGA2 according to the address ranges.

I followed the steps described in the user guide to configure the hard IP in the FPGAs using the MegaWizard Plug-In Manager or Qsys.

But I am stuck on how to write a PCIe memory transaction (for example, how to write data to FPGA2’s on-chip memory).

Any suggestions, please!

Thanks in advance.