06-20-2012 11:46 PM
Sirs, I am looking for any tips and tricks on how to process unaligned, variable-length data in FPGAs. Basically, the data comes from a 10G MAC through an Avalon interface with a 64-bit data width. Then the UDP payload is extracted. A single UDP payload contains many smaller "messages" that should be passed on through Avalon-ST, each with its own startofpacket, endofpacket, etc.
The basic format is this: the first 16 bits of each message are its payload size in bytes (not including the 2 size bytes themselves). Then comes the payload. The next message starts right where the payload ends. Unfortunately, there is no guarantee of data alignment: a message can be 1, 2, or 1K bytes long.
Since I have never done anything like this before, I have many questions. The first is: are there any existing IP cores, whitepapers, reference designs, or anything else that would help in designing a module like this? The main issue, to put it simply, is that I don't know how best to implement it. All my ideas smell so bad I can't stand them myself.
My thoughts are these: to process a single 64-bit chunk, I have to keep track of the chunks that I still need to send. Then take, say, the remainder modulo 8 to figure out the offset at which to read the next size. Since I have to perform multiple operations, I have (or think I have) two choices. The first is to set the "ready" flag on the sink interface to 1'b0 to stop data flowing in, then take my cycles to do the math and pass the message on. But I am not sure whether it is generally acceptable practice to "pause" a source-to-sink data flow in the middle of receiving a packet. Even if it is, I dislike this approach very much because it literally kills the throughput. So if I don't pause, I will have to pipeline, taking more data in while I don't yet know what to do with it.
But then comes a good question: what if I have taken, say, 5 samples into my sink, finished my calculations in the meantime, and it turns out that the message size was 1 byte, so now I have to go back to the first chunk and start all over? Another point: say I am trying to align 5 chunks of 64-bit samples with the 5 steps of calculation I need to do. What if the source deasserts "valid" during some of those stages? Should I also check "valid" on every math operation and waste a cycle whenever the source is not valid?
Please help. Any thoughts, hints, links, or criticism are highly appreciated.
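A software reference model can help pin down the framing rules before any RTL is written. The sketch below is a minimal Python model, not an implementation of the module: it walks the payload byte by byte and keeps only the state that hardware would hold in registers between 64-bit words (size bytes collected so far, payload bytes still owed). Assumptions that are not stated in the post: the 16-bit size field is big-endian, each word is modeled as 8 bytes in stream order, a size of zero yields an empty message, and the names `split_messages` and `to_words` are made up for illustration.

```python
from typing import Iterable, Iterator, List

WORD_BYTES = 8  # one 64-bit bus word


def to_words(payload: bytes) -> List[bytes]:
    """Chop a payload into 8-byte words, zero-padding the last one."""
    padded = payload + bytes(-len(payload) % WORD_BYTES)
    return [padded[i:i + WORD_BYTES] for i in range(0, len(padded), WORD_BYTES)]


def split_messages(words: Iterable[bytes], payload_len: int) -> Iterator[bytes]:
    """Yield each length-prefixed message found in a stream of 8-byte words.

    payload_len is the number of valid bytes; anything after it is padding.
    A message cut short by the end of the payload is silently dropped.
    """
    header = bytearray()   # 0..2 bytes of the size field (ASSUMED big-endian)
    message = bytearray()  # payload bytes collected for the current message
    remaining = 0          # payload bytes still expected
    in_payload = False
    consumed = 0
    for word in words:
        for byte in word:
            if consumed == payload_len:
                return  # only padding is left in the final word
            consumed += 1
            if not in_payload:
                header.append(byte)
                if len(header) == 2:
                    remaining = int.from_bytes(header, "big")
                    header.clear()
                    if remaining == 0:
                        yield b""  # ASSUMPTION: zero size means empty message
                    else:
                        in_payload = True
            else:
                message.append(byte)
                remaining -= 1
                if remaining == 0:
                    yield bytes(message)
                    message.clear()
                    in_payload = False


# Three messages of 1, 2 and 11 bytes; the third size field straddles
# the boundary between the first and second 64-bit word.
demo = [b"A", b"BC", bytes(range(11))]
stream = b"".join(len(m).to_bytes(2, "big") + m for m in demo)
print(list(split_messages(to_words(stream), len(stream))) == demo)
```

A model like this is mainly useful for generating test vectors: the same word list can be fed to a testbench and the yielded messages compared against what the RTL emits.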
06-21-2012 01:28 AM
This is what I understood from your text: You already have the blocks that receive the packets and remove the UDP encapsulation; these blocks give you the UDP payload via a 64-bit-wide Avalon-ST interface. Now you need to break that UDP payload into smaller packets and feed them into another Avalon-ST sink. Here's my suggestion: Having to pause the flow while you break down the 64-bit words is unavoidable. But you can do it while keeping up with the throughput. 10 Gbit/s over 64 bits means your source will give you, at full throughput, ~156 Mword/s. If you feed the 64-bit words into a FIFO, and the logic that reads and breaks up the messages runs at more than 156 MHz, then you have extra cycles to compensate for the pauses. Much the same goes for the output. If the breaking logic doesn't have data, it has to pause. But if it runs at a higher rate, it can make up for the pauses. And if needed, you can again use FIFOs to match the rates.
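The arithmetic behind that figure can be written out as a short Python calculation. This is only a back-of-the-envelope sketch: the function names are made up, and the 200 MHz logic clock is a hypothetical example value, not a number from this thread.

```python
def word_rate_hz(line_rate_bps: float = 10e9, bus_bits: int = 64) -> float:
    """Bus words per second needed to carry the line rate."""
    return line_rate_bps / bus_bits


def cycles_per_word(logic_clock_hz: float,
                    line_rate_bps: float = 10e9,
                    bus_bits: int = 64) -> float:
    """Average logic clock cycles available for each incoming word."""
    return logic_clock_hz / word_rate_hz(line_rate_bps, bus_bits)


print(word_rate_hz() / 1e6)    # word rate in Mword/s
print(cycles_per_word(200e6))  # HYPOTHETICAL 200 MHz logic clock
```

Any value of `cycles_per_word` above 1 is the budget for pauses: the fractional part is the share of a cycle, per input word, that the splitting logic can spend stalled while the FIFO absorbs the incoming data.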
06-21-2012 01:59 AM
rbugalho, You understood my problem correctly. Thank you very much for the response. It clarifies things a lot. Smoke was coming out of my head from wondering whether it is OK to pause and whether there is a way to avoid it. The user logic, I think, will run at around 300, maybe 330 MHz. The data will come from a dual-clock FIFO. Most of the time the data rate will be a lot lower, but I am hoping for the best while preparing for the worst. I guess I will first sketch a simple state machine with minimal pipelining (to keep it simple) that pauses the sink interface and does what it needs to do. Then I will write the timing constraints and try to synthesize it. If timing is not met, I will introduce more pipeline stages until I arrive at logic that can handle the maximum load. Thank you very much for your response!
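For the "preparing for the worst" case, a rough upper bound on the message rate can be estimated the same way. The Python sketch below assumes, hypothetically, that the payload is nothing but back-to-back 1-byte messages (3 bytes each with the size field), that the output can start at most one message per clock cycle, and that Ethernet/IP/UDP framing overhead is ignored, so the real sustained rate would be somewhat lower. The function names are invented for illustration.

```python
HEADER_BYTES = 2  # 16-bit size field in front of every message


def worst_case_messages_per_word(min_payload_bytes: int = 1,
                                 bus_bytes: int = 8) -> float:
    """Average number of minimum-size messages packed into one bus word."""
    return bus_bytes / (HEADER_BYTES + min_payload_bytes)


def worst_case_message_rate_hz(line_rate_bps: float = 10e9,
                               min_payload_bytes: int = 1) -> float:
    """Messages per second if the line is full of minimum-size messages."""
    bytes_per_second = line_rate_bps / 8
    return bytes_per_second / (HEADER_BYTES + min_payload_bytes)


def keeps_up(logic_clock_hz: float) -> bool:
    """True if one message per cycle is enough for the worst case (ASSUMED output model)."""
    return logic_clock_hz >= worst_case_message_rate_hz()


print(worst_case_message_rate_hz() / 1e6)  # worst-case rate in Mmsg/s
print(keeps_up(330e6))
```

Under these assumptions the bound is about 417 million messages per second, which a single one-message-per-cycle output at 300 to 330 MHz would not sustain indefinitely. Whether that matters depends on how long such bursts can last and how deep the FIFOs are.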