Hi,

pcie type
1. A platform has a CPU (with an embedded PCIe root complex) that connects to the FPGA via PCIe. Inside the FPGA there is a Nios II processor, DMA, a memory controller, peripherals, PCIe, etc. Does the FPGA's PCIe block need to be a root complex or a native endpoint? Or does it need to connect to a non-transparent bridge first and then to PCIe (root complex/endpoint)?
2. How do I justify that a root complex is needed? Can you give some examples of applications that do not use a PCIe root complex?
3. What interface does a root complex use to connect to memory? PCIe? If via PCIe, then the root complex is connected to an endpoint first and then to memory, right?

pcie bar
1. How do I know the BAR size? E.g. if I have a DMA controller connected to PCIe, how do I know the DMA's block size, so I can select the number of bits for that particular BAR? I understand that Qsys and SOPC Builder do this automatically, but the megafunction does not.
2. Can a BAR be shared by different devices? If yes, how is that done?

Thanks a lot
I think we just used a PCIe slave (i.e. not a root) for a link to a small PPC. The most useful way to use it was via the PCIe -> Avalon master bridge; this allows a single PCIe BAR to be used to access a lot of FPGA peripherals. I think the bridge sets the high Avalon address bits to fixed values; we used a 32MB BAR to access all the I/O and 16MB of SDRAM. One thing worth noting is that the performance of the PCIe slave is not quite what you might expect (think ISA bus speeds) for single cycles. You'll need to use DMA transfers that generate PCIe bursts to get any reasonable throughput. (I've not initiated transfers from the FPGA, but I suspect you'll need to use a DMA engine that is closely associated with the PCIe interface.)
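To make the "one BAR, many peripherals" idea concrete, here is a minimal host-side sketch. The offsets and peripheral names are hypothetical, not from a real Qsys design; the point is just that the bridge presents one flat Avalon window, and each peripheral sits at a fixed offset inside the mapped BAR.

```c
#include <stdint.h>

/* Hypothetical Avalon offsets inside one 32MB BAR (example values,
 * not from any actual Qsys system). */
#define AVALON_SDRAM_BASE   0x0000000u  /* 16MB SDRAM window     */
#define AVALON_UART_BASE    0x1000000u  /* peripheral I/O region */
#define AVALON_DMA_CSR_BASE 0x1001000u  /* DMA control registers */

/* The bridge fixes the high Avalon address bits, so on the host side
 * a peripheral register is simply the mapped BAR base plus the
 * peripheral's Avalon offset. */
static inline volatile uint32_t *
periph_reg(uint8_t *bar0_base, uint32_t avalon_offset)
{
    return (volatile uint32_t *)(bar0_base + avalon_offset);
}
```

On Linux, `bar0_base` would typically come from `mmap()`-ing the BAR's sysfs resource file or from `pci_iomap()` in a kernel driver; single-word accesses through such a pointer are exactly the slow PIO cycles mentioned above.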
pcie type
1. There is only one root complex in the system, and it configures all the endpoints. So the FPGA only needs to be an endpoint, and in the example you give, the non-transparent bridge is not needed.
2. No application needs its own root complex if there is already one in the system.
3. I can't understand the question. I suggest you take a closer look at how a PCIe system works before starting your design.

pcie bar
1. You can determine the BAR size once you know how much memory you want to expose through that BAR. For example, if behind that BAR you attach a 1MB dual-port RAM, a 1MB FIFO and 1MB of registers (which can be seen as a sort of register map), the BAR size needs to be at least the sum of all these.
2. A BAR is a Base Address Register: a sort of "chip select" driven by the master of the communication (i.e. the root port) each time it accesses memory mapped into it. I say memory because the concept is that, behind a BAR, the root port sees a memory region.
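One detail worth adding to the sizing rule above: a PCIe BAR must decode a power-of-two, naturally aligned window, so the 3 x 1MB example actually rounds up to a 4MB BAR (22 address bits). A small sketch of that calculation:

```c
#include <stdint.h>

/* Round a required region size up to the next power of two, since a
 * PCIe BAR can only decode a power-of-two window. */
static uint64_t bar_size(uint64_t required_bytes)
{
    uint64_t size = 1;
    while (size < required_bytes)
        size <<= 1;
    return size;
}

/* Number of address bits the BAR must decode (log2 of its size);
 * this is the "number of bits" selected when configuring the BAR. */
static unsigned bar_addr_bits(uint64_t size)
{
    unsigned bits = 0;
    while ((1ull << bits) < size)
        bits++;
    return bits;
}
```

So for the 1MB RAM + 1MB FIFO + 1MB registers example, `bar_size(3 << 20)` gives 4MB and `bar_addr_bits` gives 22, with 1MB of the window left undecoded.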
If you go for DMA (initiated from within your FPGA), you will most likely not use a huge BAR, just enough for accessing all your DMA registers from the system CPU (the one with the root complex). Our project goes well with a single 128-byte BAR.

In the beginning of the project, though, PIO accesses might allow you to quickly mock up your communication and get something up and running, as PIO transfers are much more straightforward to program than designing a DMA engine and driver. Nevertheless, PIO accesses come at a *huge* speed penalty, especially when the system CPU reads from your endpoint FPGA. Writes are a bit faster, but still unacceptably slow for register-like non-cached BARs. And worst of all: all the time the CPU spends waiting for PCIe reads to finish is *lost*. That means: if the system takes, say, 1 us (microsecond) to complete your read access to the FPGA, the CPU does not execute a *single* instruction for that 1 us, as the CPU has to keep the system in a consistent state.

Only DMA with a minimum of PIO accesses allows PCIe to work at respectable rates. In our project with a PCIe x1 link, we had about 1 MByte/s data rate at 100% CPU load in PIO mode (32-bit single accesses to a non-burstable BAR) and 190 MByte/s at 3% CPU load in DMA mode. Masters of PCIe register interface design don't need a *single* PCIe read operation during normal operation (they typically do reads only at initialization time), further reducing CPU load.
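The gap between those two numbers follows directly from latency: with non-burstable 32-bit single reads at roughly 1 us each, the hard ceiling is 4 bytes per microsecond, i.e. 4 MByte/s, and real-world overhead brings it down toward the observed 1 MByte/s; DMA amortizes that same latency over an entire burst. A back-of-the-envelope check (the latency and burst figures here are illustrative, not measured values from the post):

```c
/* CPU-visible throughput in MByte/s when every transfer of
 * `bytes_per_xfer` bytes stalls the CPU for `latency_us`
 * microseconds: bytes/us is numerically equal to MByte/s. */
static double throughput_mbps(double bytes_per_xfer, double latency_us)
{
    return bytes_per_xfer / latency_us;
}
```

For example, `throughput_mbps(4, 1.0)` gives 4 MByte/s for single-word PIO reads, while a hypothetical 4096-byte DMA burst costing ~25 us of total setup and completion overhead lands above 160 MByte/s, the same order of magnitude as the DMA figure quoted above.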
I've just failed to find a note of the performance we got :-( There were a lot of problems with the PCIe hardware layer, which kept doing resyncs all the time. I thought we'd got reads and writes below 1 us. However, using host DMA (and spinning while waiting for the result) sped the code up significantly, as the 'cost per byte' is minimal. I did have to write a driver for the 'PPC 83xx PCI Express CSB bridge', which in itself is not for the faint-hearted!