I have a design that uses an FPGA for data crunching. I've been developing it on a Cyclone V GT devkit board, using a third-party API (Xillybus) to send data over PCI-Express. Now I need to extend it to work on a budget DE10-Nano board (also Cyclone V, but smaller and with an integrated SoC ARM CPU), which does not have PCI-Express. Ideally, I'd like to find a way that works with both boards without forking the code.
I've been looking into options, and there seem to be several. I can use JTAG to Avalon MM via the USB-Blaster interface, or get a USB FTDI chip and hook it up to GPIO ports on the board, or possibly even communicate with the board via Ethernet. On a DE10-Nano, I can also access onboard DDR3 memory directly from the integrated SoC CPU and write a C program that does the communications for me. (Being able to cache data in DDR3 is a plus.) Those are just the ones I could think of; I'm sure there are others.
One thing they all have in common is that they are exceedingly complex and/or slow. Going through the SoC is the only option for which I have sample code, and that pathway is suboptimal (I have to set up a cross-compiler, and the approach would not work with any board except the DE10-Nano). I have complete FPGA-side sample code for a "simple" design that makes use of DDR3 and exposes Avalon MM via JTAG, but its complexity is mind-boggling (512 .v files, mostly autogenerated). Just staring at the list of IP components in it makes me sleepy, and I would not begin to know how to port it to the DE10-Nano.
I tried to start from the ground up and generate my own DDR3 controller IP. I had to type in 30 parameters; it ran for several minutes and then failed with a bunch of errors, most notably: "Warning: ddr3_example.if0.c0: c0.avl_0 must be connected to an Avalon-MM master; Error: ddr3_example.d0.avl/if0.avl_0: Missing connection end (try "Remove Dangling Connections")."
In addition to being complex, some of these approaches may be too slow for me. My task requires a throughput of 20 MB/s. The FTDI->GPIO approach seems barely fast enough (40 MB/s in synchronous FIFO mode). The USB-Blaster pathway was described here https://alteraforum.com/forum/showthread.php?t=34787&page=10 as "not that fast". I would like some suggestions as to the direction in which I should dig.
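To put the 20 MB/s requirement in perspective, here is a rough back-of-the-envelope sketch in Python. Only the 40 MB/s FTDI figure comes from the numbers above. The Ethernet entry is an assumption: it is simply the raw line rate of a gigabit link (1 Gbit/s divided by 8), and real payload throughput would be lower once protocol overhead is counted. The names `candidates` and `headroom` are made up for illustration.

```python
REQUIRED_MB_S = 20.0  # throughput the task needs, from the post above

# Peak rates in MB/s. Only the FTDI figure is quoted in the post;
# the Ethernet entry is an ASSUMED raw line rate (1000 Mbit/s / 8).
candidates = {
    "FTDI synchronous FIFO": 40.0,
    "Gigabit Ethernet (raw line rate, assumed)": 1000.0 / 8,
}

def headroom(peak_mb_s, required_mb_s=REQUIRED_MB_S):
    """Return how many times the peak rate exceeds the requirement."""
    return peak_mb_s / required_mb_s

for name, peak in candidates.items():
    print(f"{name}: {peak:.0f} MB/s peak, {headroom(peak):.2f}x the requirement")
```

A ratio close to 1 leaves little room for protocol overhead or stalls, which is why a 2x margin on paper can still feel tight in practice.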
The least painful option may be Ethernet. If you have a licence for the core, you should be able to get PC-side drivers for it. It would plug straight into an FPGA-side SoC system if needed, or you can just do some packet processing on the FPGA to get the data out. It's not all that hard; it's mostly about learning where all the headers are for Rx and generating the headers for Tx. What is a lot of work is the testing and verification, as an out-of-place header will just mean you start losing data (or break the system).
What level of processing did you have to do with the PCIe core? Were you processing the TLP packets, or did you have an MM interface at the FPGA end?
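As a rough illustration of "where the headers are", here is a minimal Python model of a UDP-over-IPv4 Ethernet frame. It is only a sketch under assumptions: UDP is my choice of transport for the example, the IPv4 header is assumed to carry no options on the Tx side, and the function names are hypothetical. Padding to the minimum frame size and the frame check sequence are left out, on the assumption that the MAC handles them. A model like this can also serve as a reference when verifying the FPGA-side logic.

```python
import struct

ETH_HDR = struct.Struct("!6s6sH")        # dst MAC, src MAC, EtherType: 14 bytes
IP_HDR = struct.Struct("!BBHHHBBH4s4s")  # IPv4 header without options: 20 bytes
UDP_HDR = struct.Struct("!HHHH")         # src port, dst port, length, checksum: 8 bytes

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, inverted."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_udp_frame(dst_mac, src_mac, src_ip, dst_ip, src_port, dst_port, payload):
    """Tx side: generate Ethernet + IPv4 + UDP headers in front of the payload."""
    # A UDP checksum of 0 means "not computed", which IPv4 permits.
    udp = UDP_HDR.pack(src_port, dst_port, UDP_HDR.size + len(payload), 0)
    total_len = IP_HDR.size + len(udp) + len(payload)
    # version/IHL, DSCP, total length, id, flags/fragment, TTL, protocol (17 = UDP),
    # checksum placeholder, source address, destination address
    fields = [0x45, 0, total_len, 0, 0, 64, 17, 0, src_ip, dst_ip]
    fields[7] = ipv4_checksum(IP_HDR.pack(*fields))
    return ETH_HDR.pack(dst_mac, src_mac, 0x0800) + IP_HDR.pack(*fields) + udp + payload

def parse_udp_frame(frame: bytes):
    """Rx side: step over the headers and return (src_port, dst_port, payload)."""
    _dst, _src, ethertype = ETH_HDR.unpack_from(frame, 0)
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    ip = IP_HDR.unpack_from(frame, ETH_HDR.size)
    if ip[6] != 17:
        raise ValueError("not a UDP packet")
    udp_off = ETH_HDR.size + (ip[0] & 0x0F) * 4   # IHL is counted in 32-bit words
    src_port, dst_port, udp_len, _csum = UDP_HDR.unpack_from(frame, udp_off)
    return src_port, dst_port, frame[udp_off + UDP_HDR.size:udp_off + udp_len]
```

With a 20-byte IPv4 header the payload starts at byte offset 42 (14 + 20 + 8), which is the kind of fixed offset a simple FPGA-side parser could rely on. Running the checksum over a complete, valid IPv4 header yields 0, which makes a handy Rx-side sanity check.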
Interesting. What do you mean by PC-side drivers? Shouldn't I just be able to open a socket on the PC and start sending, especially if the SoC takes care of getting an IP assignment?
With the PCIe core, on the FPGA side, I just get two 32-bit FIFO interfaces (read and write), literally 6 signals total. Very easy to handle.
I just spent all day working through the Qsys system design tutorial (https://www.altera.com/content/dam/altera-www/global/en_us/pdfs/literature/tt/tt_qsys_intro.pdf). There's no reference project for Cyclone V, which complicates things a bit ... but I think I almost got it working. Once I have Avalon operational, I'll try to hook it up to the Ethernet IP and see what happens.
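For reference, the "open a socket and start sending" idea might look roughly like this on the PC side. This is a sketch, not a tested design: it assumes UDP as the transport, little-endian byte order for the 32-bit FIFO words, and a hypothetical chunk size of 256 words per datagram (a 1024-byte payload, chosen to stay under a typical 1500-byte MTU). UDP gives no delivery guarantee, so anything the board drops is simply lost unless a sequence number or acknowledgement scheme is layered on top.

```python
import socket
import struct

WORDS_PER_DATAGRAM = 256  # hypothetical chunk size: 1024 bytes of payload

def pack_words(words):
    """Pack unsigned 32-bit words, little-endian (byte order is an assumption)."""
    return struct.pack(f"<{len(words)}I", *words)

def unpack_words(data):
    """Inverse of pack_words: bytes back to a list of 32-bit words."""
    return list(struct.unpack(f"<{len(data) // 4}I", data))

def send_words(sock, addr, words):
    """Send the words to addr as a series of datagrams; returns the datagram count."""
    count = 0
    for i in range(0, len(words), WORDS_PER_DATAGRAM):
        sock.sendto(pack_words(words[i:i + WORDS_PER_DATAGRAM]), addr)
        count += 1
    return count

def open_sender():
    """Create a plain UDP socket; no special driver is involved on the PC side."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
```

Usage would be along the lines of `send_words(open_sender(), (board_ip, port), data)`, where `board_ip` and `port` are placeholders for whatever address the board ends up with. The same code would work against either board as long as each one exposes the same UDP endpoint.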