This paper discusses how the Avalon ST standard interfaces within SOPC Builder can be used to construct a self-paced computational data path that can be applied to various algorithms. Repetitive and iterative signal-processing algorithms adapt to this methodology quite easily.
We assume that the reader is already familiar with the characteristics and operation of the standard interfaces used within an SOPC Builder system. For more information on SOPC Builder and its standard interfaces, please refer to the “Quartus II Handbook Version 9.1 Volume 4: SOPC Builder” and the “Avalon Interface Specifications” literature from the Altera website. The examples in this paper are driven primarily by the Avalon ST streaming point-to-point protocol. They use the back pressure, forward enablement and channelization capabilities of the protocol, but they do not use any of its packet signaling capabilities.
The example that follows is built around a simple signal-processing filter algorithm, which computes the double-precision floating-point dot product of an input vector multiplied across an input matrix. The following pages give the MATLAB equations that define this matrix manipulation, along with some example C program implementations that provide context for the operations the algorithm requires. From this programmatic representation we then extract a data flow diagram that illustrates, step by step, how the input operands move through each computational operation to produce intermediate operands, which are processed further to produce the final result. The whole data flow represents one iteration of the dot product operation, so consecutive iterations that pass each vector of the matrix through this data flow produce the final scalar result that the algorithm requires.
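As a rough orientation before the paper's own listings, the operation described above can be sketched in C as follows. This is a minimal sketch, not the paper's implementation: the function names `dot_product` and `vector_times_matrix` and the sizes `VEC_LEN` and `NUM_VECS` are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sizes only (assumed); the paper's actual dimensions may differ. */
#define VEC_LEN  4
#define NUM_VECS 3

/* One iteration of the data flow: the dot product of two length-n vectors.
 * Each loop pass uses one multiplication and one addition. */
double dot_product(const double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Consecutive iterations: pass each vector (row) of the matrix through
 * the same dot product, producing one scalar per vector. */
void vector_times_matrix(const double v[VEC_LEN],
                         const double m[NUM_VECS][VEC_LEN],
                         double out[NUM_VECS])
{
    for (size_t k = 0; k < NUM_VECS; k++) {
        out[k] = dot_product(v, m[k], VEC_LEN);
    }
}
```

Every iteration repeats the same multiply and add operations, which is what later allows a single multiplier and a single adder to serve the whole algorithm.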
Once the data flow diagram has been established, it should be fairly easy to see how this structure maps onto a simple multiplexed data path. The repeated multiplication and addition operations can each be condensed into a single resource, which the data path then reuses in a time-multiplexed fashion to compute all of the operations the algorithm requires. At this point the diagram becomes a bit more complicated, because it must show how the data flow maps onto the point-to-point multiplexed structure of the Avalon ST architecture. Even so, the construct is nothing more than a data path that multiplexes operands into the shared multiplication and addition resources and then demultiplexes the results back into the data path for further processing. The multiplexing and demultiplexing are built entirely from the standard SOPC Builder components provided for this purpose, and the channelization that accumulates naturally through Avalon ST multiplexing is what we use to demultiplex the data path into each consecutive operation.
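The idea of tagging operands with a channel, passing them through one shared resource, and routing the results by that tag can be modeled in software. The sketch below is a behavioral illustration only, not Avalon ST hardware: it ignores back pressure, pipeline latency and signal-level detail, and every name in it (`operand_beat`, `result_beat`, `shared_multiply`, `mux_operands`, `demux_result`, `time_shared_products`, `NUM_CHANNELS`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CHANNELS 4  /* assumed number of multiplexed operand sources */

/* One beat of a channelized operand stream: two operands plus the
 * channel tag attached by the multiplexer. */
typedef struct {
    unsigned channel;
    double   a;
    double   b;
} operand_beat;

/* One beat of the result stream, still carrying its channel tag. */
typedef struct {
    unsigned channel;
    double   value;
} result_beat;

/* The single shared multiplier: it computes the product and passes
 * the channel tag through unchanged. */
result_beat shared_multiply(operand_beat in)
{
    result_beat out = { in.channel, in.a * in.b };
    return out;
}

/* Multiplexer model: operand pair i enters the stream tagged as channel i. */
void mux_operands(const double a[NUM_CHANNELS],
                  const double b[NUM_CHANNELS],
                  operand_beat stream[NUM_CHANNELS])
{
    for (unsigned ch = 0; ch < NUM_CHANNELS; ch++) {
        stream[ch].channel = ch;
        stream[ch].a = a[ch];
        stream[ch].b = b[ch];
    }
}

/* Demultiplexer model: the channel tag alone selects the destination. */
void demux_result(result_beat r, double sinks[NUM_CHANNELS])
{
    assert(r.channel < NUM_CHANNELS);
    sinks[r.channel] = r.value;
}

/* Time-share one multiplier across all channels, one beat at a time. */
void time_shared_products(const double a[NUM_CHANNELS],
                          const double b[NUM_CHANNELS],
                          double products[NUM_CHANNELS])
{
    operand_beat stream[NUM_CHANNELS];
    mux_operands(a, b, stream);
    for (size_t i = 0; i < NUM_CHANNELS; i++) {
        demux_result(shared_multiply(stream[i]), products);
    }
}
```

Because each result carries its channel tag, the demultiplexer needs no knowledge of the order in which operands arrived. A shared adder could be modeled in the same way.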
A detailed explanation is provided for each stage of the Avalon ST data path diagram, covering the smaller implementation details that the real-world deployment of the data path requires. Data formatting translations and other aspects of the architecture are explained so that it is clear how the double-precision floating-point operands pass from the external system into the data path pipeline and then back out to the external system.
Points of Interest
As you read through this paper, the example may seem complex, particularly if you are not intimately familiar with the Avalon ST concepts and constructs it is built upon. In the end, however, it illustrates some very simple concepts. Keep these points in mind as you read through the rest of this paper:
1 – The data path which is defined in this example is nothing more than a simple multiplexed and demultiplexed data stream.
2 – The double-precision floating-point cores deployed in this example are rather large IP blocks that can consume a lot of valuable FPGA real estate; however, they also provide an enormous amount of computational bandwidth. To make the most of this bandwidth, the example simply time-shares the multiplication and addition requirements of the algorithm through one instance of each of these resources.
3 – Because this architecture is assembled from standard Avalon ST components within the SOPC Builder environment, it can be easily reconfigured to modify the algorithm itself, or to add new algorithms to the existing data path so that they consume the remaining bandwidth of the double-precision floating-point elements and produce additional computational work.