DSP Builder Advanced FAQ

Note: for questions relating to error messages, see DSP_Builder_Advanced_Error_Messages

What do the ChannelIn and ChannelOut blocks do?

The optimizations performed by the tool operate within subdomains of the whole design: individually within each ModelIP block (FIR, NCO etc) and within each Primitive subsystem.

A Primitive subsystem is a Simulink subsystem with a ModelPrim SynthesisInfo block, with inputs and outputs (at the SynthesisInfo level) passing through the boundary blocks ChannelIn/Out or GPIn/Out at the subsystem I/Os, and containing that part of the design built from ModelPrim blocks. It can contain further Simulink subsystems, but no nested ModelIP blocks or further SynthesisInfo blocks.

The ChannelIn and ChannelOut blocks (and GPIn and GPOut) delimit the boundaries of a Primitive subsystem. They group the signals at the boundary that are to be scheduled together, either with related channel and valid signals (ChannelIn/Out) or without them (GPIn/Out). When determining the pipelining needed to achieve the desired Fmax, the tool needs to know which signals must be kept synchronized, so that adding latency to one requires balancing delays on the signals synchronous with it. Pipelining is then inserted in balanced ‘cuts’ through the synchronized signals, so that the added delay can be corrected for (in most cases) at the subsystem level just by adding simulation delays in the appropriate boundary blocks.

See the DSP Builder Advanced Blockset - Flow Control, Design Style and Floating Point for further details.

If I have a block with a data flow and a parameter (e.g. gain), where a single parameter value (the gain) can be given at any time without any timing relation to the data flow, how should those be used?

If the signals are independent and do not have to remain synchronous, then you can put them through separate boundary blocks.

[Figure: FAQ2.PNG]

The type of I/O boundary block used determines how the set of signals through it is scheduled during register pipeline insertion. I/O signals through the same I/O block (ChannelIn, ChannelOut, GPIn or GPOut) are scheduled together, i.e. they will remain in sync. Using ChannelIn and ChannelOut allows specification of Advanced Blockset protocol valid and channel signals alongside the data. GPIn and GPOut are for other, general purpose, data which doesn’t necessarily have to be scheduled to start or end on the same clock cycle as other I/O signals.

So if you want all your signals to be pipelined such that inputs are all on the same clock cycle and outputs are all synchronized together on the same output clock cycle, use a single ChannelIn and a single ChannelOut block. This is the usual mode of operation. If your subsystem requires inputs appearing on different clock cycles, or outputs grouped on to different clock cycles, you can use multiple ChannelIns and ChannelOuts, or GPIns and GPOuts.

Note that to maintain cycle accuracy at a level outside the Primitive subsystem, the pipelining inserted by the tool must be accounted for in simulation. This added latency is calculated by the scheduler and depends on factors such as vector widths, data types, and Fmax requirements, so Simulink can only model it after the scheduler has run. Since each pipelining stage is added in slices, or cuts, through all parallel signals, it can be modeled by just adding a latency on the inputs or outputs.

My design worked, I turned on folding for a Primitive Subsystem and got a Simulink error: “S-function '<design>/<subsystem>/ChannelOut' method mdlSetInputPortSampleTime cannot change the sample time of ports once they have been set.”

Simulink has propagation and setting rules for data types, sample rates, etc. that attempt to resolve and fix these fields for each port. Folding changes the Simulink sample rate at which the primitive subsystem runs. If you get this message it’s because there is a conflict in sample time settings: the ChannelOut has been set to run at the folded sample rate, but a block within the primitive subsystem itself has an explicit sample time set on it that conflicts when propagated forwards to the ChannelOut. Check that the sample times of the blocks in the primitive subsystem are set to ‘-1’ (inherit) where appropriate.

How should we use the Avalon Blocks?

The Avalon-MM (ModelBus) and Avalon-ST blocks are used in different ways. Refer to the DSP Builder documentation on how to use these. For flow control, the Avalon-ST output “ready” signal should be looped back to the Avalon-ST input “ready” signal. This is shown in the diagram below.

[Figure: FAQ7.PNG]

Is it possible to have a portion of the graph depend on some variables? E.g. having clockrate/N adders, where clockrate is defined in the parameter file.

There are several ways to do this. The first is through the use of vectors, where the vector size determines the number of blocks that will be produced. Vectors are very useful in building parameterizable components. The other is to create a self-initializing subsystem component: see 4.2.2.1 of DSP Builder Advanced Blockset - Flow Control, Design Style and Floating Point for an example of a block which is really a self-initializing subsystem.

How do we initialize the value in a register (SampleDelay)?

This is not currently supported directly. Delays specified by SampleDelays can get redistributed around the system – and hence implemented as registers in memory blocks or multipliers where initialization is impossible.

What is the list of supported Simulink blocks that can be used for HW generation?

Mux, Demux, From, Goto, Subsystem ports (Out1, In1), Terminator, Constant, Selector (static Vector selection only), Complex to Real-Imag, Real-Imag to Complex, Configurable Subsystem (with some restrictions), Data-type propagation (with some restrictions). Bus Creator and Bus Selector (bus signals can be used in routing but not through blocks).

Can I mix VHDL with Simulink (or other design languages)?

The HDL Import block can be used in a Standard Blockset level hierarchically above the DSPBA design. See documentation on HDL Import and mixed Blockset designs.

Can I create my own equivalent of the ‘Edit Params’ block?

Yes. The Simulink documentation covers such matters.

It is deliberate that we do not show the Edit Params block in the DSPBA library browser: the block itself does nothing other than open a file for editing. The user would have to create the script and set up the pre-load functions in the model properties to use it correctly. It is not something you can just drag and drop onto your model.

You can achieve the same by creating any m-script which is run in the model's set-up stage (PreLoadFcn), or indeed any other such stage if necessary.

The Edit Params block assumes that the name of this script is “setup_<model name>.m”, but this is just the way it’s done for this use case; you could call it whatever you like (e.g. my_script.m).

Use File > Model Properties > Callbacks to get your script to run before simulation. For a design demo_duc using Edit Params you have to add setup_demo_duc to the PreLoadFcn (so that the parameters exist on loading) and to the InitFcn (so that changes you make with the model open before running simulation are included in the simulation run). If you called the script my_script.m, then just put my_script; at the appropriate stages.

[Figure: FAQ12a.PNG]

The Edit Params block is a masked subsystem that has been given an OpenFcn to open an m-script “setup_<model_name>” so you can edit it:

s = sprintf('edit setup_%s', eval('gcs')); % build the 'edit' command with the name of the set-up script

eval(s); % execute the command

To build your own, drop in a Simulink subsystem. Go in and remove the default ports. Back out, right-click the block, open the block properties, and set the OpenFcn:

[Figure: FAQ12b.PNG]

Alternatively, if you called your set-up script ‘my_script.m’, the OpenFcn would be:

[Figure: FAQ12c.PNG]

The block will now open the script for editing when clicked, and will look like this:

[Figure: FAQ12d.PNG]

You can call this subsystem block what you like, or hide the name; it doesn’t matter. It is simply a way of opening the set-up script for editing. The important thing is the OpenFcn.

You can even add a picture or graphic for it. For example, if you have a picture “ant.jpg” in a directory which is included in your MATLAB path (File > Set Path …), then you can right-click on the subsystem block > Edit Mask … and add something like the following (which sets an image, sets the text color to white, and writes “Antonnios Set Up Script” across it):

[Figure: FAQ12e.PNG]

This gives:

[Figure: FAQ12f.PNG]

You can debate whether this is an improvement.

Is it possible to restrict the scope of the variables defined in the setup script to the model they apply to only?

The recommendation is to create a structure of variables for the model to avoid ambiguity if running multiple models. The Simulink help also has some information on the scope of workspace and model variables.

What is the Valid-Channel-Data protocol?

The protocol used throughout DSPBA designs is a bus of three synchronized signals for specifying multi-channel data - valid, channel and data.

Valid (ufix(1) or bool) indicates whether the concurrent data and channel signals have valid information (1) or are unknown or 'don't care' (0).

Channel (uint8) is a synchronization counter for multiple channel data on the data signal(s). It increments from 0 with the changing channels across the data signal(s) within a frame of data.

Data signal(s) can be any number of synchronized signals carrying single or multi-channel data.

Data on the data wire is only valid when DSP Builder asserts the valid wire 'high' (1). During this clock cycle, the channel carries an 8-bit integer channel identifier. DSP Builder preserves this channel identifier through the data-path so that you can easily track and decode data. Within a frame of data the channels fill the available time-slots along one wire before spreading to separate wires. This is illustrated below with some examples.

In your design you will have a clock rate N (MHz) and a per-channel sample rate M (Msps). If N=M then you are getting one new data sample per channel every clock cycle. For a single channel design this would look like:

Valid: < 1 >< 1 >< 1 > ...

Channel: < 0 >< 0 >< 0 > ...

Data: <s00><s01><s02> ...

where sPQ = the Qth data sample for channel P. The 'frame length', the number of clock cycles between data updates for a particular channel, is 1, so our channel count restarts (from zero) every clock cycle.

For a multichannel design this would look like:

Valid: < 1 >< 1 >< 1 > ...

Channel: < 0 >< 0 >< 0 > ...

Data: <s00><s01><s02> ...

<s10><s11><s12> ...

<s20><s21><s22> ...

Note that the data is now spread across multiple wires. Even though we have multiple channels, the frame length is again 1, so the channel signal, which can be thought of as a channel synchronization counter rather than an explicit number expressing the actual channels, is again zero on each clock cycle.

Now suppose N > M for a single channel design. Now we would receive new data samples only every N/M clocks. For example if N = 300MHz and M = 100Msps we would have new data every 3 clock cycles. We don't care or don't know what the data is on the intervening clocks, so set the valid to low (0).

Valid: < 1 >< 0 >< 0 >< 1 >< 0 >< 0 >< 1 >< 0 >< 0 > ...

Channel: < 0 >< X >< X >< 0 >< X >< X >< 0 >< X >< X > ...

Data: <s00>< X >< X ><s01>< X >< X ><s02>< X >< X > ...

where X stands for 'Unknown' or 'Don't Care'. Here we would say that the 'frame length' is 3, as there is a repeating pattern of channel data every 3 clock cycles. Now suppose we have the same N = 300MHz and M = 100Msps, but now with 2-channel data. The data wire carries the sample for the first channel, then the sample for the second channel, then a cycle of 'don't care':

Valid: < 1 >< 1 >< 0 >< 1 >< 1 >< 0 >< 1 >< 1 >< 0 > ...

Channel: < 0 >< 1 >< X >< 0 >< 1 >< X >< 0 >< 1 >< X > ...

Data: <s00><s10>< X ><s01><s11>< X ><s02><s12>< X > ...

Note how the channel signal now increments as we receive the different channel data through the frame. If we have three channels of data at the same rates, then the frame is full along the single data wire:

Valid: < 1 >< 1 >< 1 >< 1 >< 1 >< 1 >< 1 >< 1 >< 1 > ...

Channel: < 0 >< 1 >< 2 >< 0 >< 1 >< 2 >< 0 >< 1 >< 2 > ...

Data: <s00><s10><s20><s01><s11><s21><s02><s12><s22> ...
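The single-wire cases above can be reproduced with a short script. This is a minimal Python sketch for illustration only, not part of DSP Builder; the function name single_wire_frames and its arguments are hypothetical, and it only covers the case where all channels fit within one frame on one wire.

```python
def single_wire_frames(period, n_channels, n_frames):
    """Return (valid, channel, data) lists for a single data wire.

    period     -- frame length in clock cycles (clock rate / sample rate)
    n_channels -- channels carried on the wire; must fit in one frame
    n_frames   -- how many frames to generate
    """
    if not 1 <= n_channels <= period:
        raise ValueError("this sketch only covers channels that fit on one wire")
    valid, channel, data = [], [], []
    for q in range(n_frames):          # q = sample index within each channel
        for slot in range(period):     # slot = clock cycle within the frame
            if slot < n_channels:
                valid.append(1)
                channel.append(slot)           # count restarts at 0 each frame
                data.append(f"s{slot}{q}")     # sPQ: Qth sample of channel P
            else:
                valid.append(0)
                channel.append("X")            # don't care
                data.append("X")
    return valid, channel, data


# N = 300MHz, M = 100Msps gives a frame length of 3; two channels.
valid, channel, data = single_wire_frames(period=3, n_channels=2, n_frames=3)
print("Valid:  ", valid)
print("Channel:", channel)
print("Data:   ", data)
```

With period=3 and n_channels=2 this prints the same pattern as the two-channel diagram; n_channels=1 and n_channels=3 give the other two single-wire cases.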

Now suppose we have a fourth channel. The data now spreads across multiple data signals, as one wire is not going to be enough to transmit 4 channels of data in 3 clock cycles. Note the format of the ordering. DSPBA attempts to distribute the channels evenly on the wires that it has to use.

Valid: < 1 >< 1 >< 0 >< 1 >< 1 >< 0 >< 1 >< 1 >< 0 > ...

Channel: < 0 >< 1 >< X >< 0 >< 1 >< X >< 0 >< 1 >< X > ...

Data: <s00><s10>< X ><s01><s11>< X ><s02><s12>< X > ...

<s20><s30>< X ><s21><s31>< X ><s22><s32>< X > ...

Now add a fifth, still keeping the same clock and channel data rates. The data again spreads across 2 data signals - required to transmit 5 channels of data in 3 clock cycles. Note the format of the ordering again. DSPBA packs the 5 channels of data as 3 on the first wire and 2 on the second.

Valid: < 1 >< 1 >< 1 >< 1 >< 1 >< 1 >< 1 >< 1 >< 1 > ...

Channel: < 0 >< 1 >< 2 >< 0 >< 1 >< 2 >< 0 >< 1 >< 2 > ...

Data: <s00><s10><s20><s01><s11><s21><s02><s12><s22> ...

<s30><s40>< X ><s31><s41>< X ><s32><s42>< X > ...

Note that the channel signal still counts up from 0 at the start of each frame and that it specifies a channel synchronization count, rather than expressing all the channels received on a particular clock (which would require as many channel signals as data signals). Note also that the valid signal remains 1-dimensional. This could theoretically under-specify the validity of the concurrent data if, for example, in a particular frame channel 0 is valid but channel 3 (received on the same clock) is not. In the example above we have data for channel 2 on the first data signal received at the same time as the 'don't care' (invalid) data on the second data signal. Some knowledge of the number of channels being transmitted is therefore required to decode this.

Now suppose we have N < M. This means we are receiving multiple (M/N) data samples for a particular channel every clock cycle. We call this 'super-sample' data. For example, if N=200MHz and M=800Msps, we would have for a single channel:
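To illustrate why the channel count alone is not enough, here is a minimal Python sketch of decoding one clock cycle. It is not DSP Builder code: decode_clock and channels_per_wire are hypothetical names, and the sketch assumes the receiver already knows how many channels each wire carries (3 and 2 in the five-channel example) and that each wire carries consecutive channel numbers, as in the diagrams above.

```python
def decode_clock(channel_count, wire_samples, channels_per_wire):
    """Return {channel number: sample} for one clock cycle with valid = 1.

    channel_count     -- value of the channel signal on this clock
    wire_samples      -- one sample per data wire, first wire first
    channels_per_wire -- assumed known: how many channels each wire carries
    """
    decoded = {}
    first_channel = 0  # channel number carried in slot 0 of the current wire
    for n_on_wire, sample in zip(channels_per_wire, wire_samples):
        if channel_count < n_on_wire:
            decoded[first_channel + channel_count] = sample
        # else: this wire holds a don't-care slot on this clock; skip it
        first_channel += n_on_wire
    return decoded


# Five channels in a 3-cycle frame: wire 0 carries 3 channels, wire 1 carries 2.
print(decode_clock(0, ["s00", "s30"], [3, 2]))
print(decode_clock(2, ["s20", "X"], [3, 2]))
```

The first call returns samples for channels 0 and 3; the second returns only channel 2, because the second wire's slot is a don't-care even though valid is high.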

Valid: < 1 >< 1 >< 1 > ...

Channel: < 0 >< 0 >< 0 > ...

Data: <s00><s04><s08> ...

<s01><s05><s09> ...

<s02><s06><s0A> ...

<s03><s07><s0B> ...

with 4 new data samples every clock.
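The super-sample layout can be sketched the same way. This is an illustrative Python snippet, not part of DSP Builder; super_sample_wires is a hypothetical name and the sketch assumes the sample rate is an integer multiple of the clock rate.

```python
def super_sample_wires(clock_rate, sample_rate, n_clocks):
    """Return, per data wire, the sample indices seen on each clock cycle."""
    if sample_rate % clock_rate != 0:
        raise ValueError("sketch assumes sample rate is a multiple of clock rate")
    per_clock = sample_rate // clock_rate   # M/N samples arrive every clock
    return [[clock * per_clock + wire for clock in range(n_clocks)]
            for wire in range(per_clock)]


# N = 200MHz, M = 800Msps: 4 samples of channel 0 per clock, one per wire.
for row in super_sample_wires(200, 800, 3):
    # Label samples as in the diagram above, with a hexadecimal sample index.
    print("".join(f"<s0{q:X}>" for q in row))
```

Each printed row corresponds to one data wire, with the sample index advancing by M/N from one clock cycle to the next.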

General Rules for Channel Packing Within a Frame

The Frame Length or Period = ClockRate / SampleRate
DSPBA will use the minimum number of wires possible to transmit the channelized data within the frame period. Data Wire Count = ceiling (Number Of Channels / Period)
Once the Data Wire Count is calculated, DSPBA will attempt to distribute the channels evenly across the wires, starting from the first wire and from the first clock cycles in the frame.
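These rules can be sketched in Python as follows. This is an illustration rather than the tool's actual algorithm: frame_layout is a hypothetical name, the clock rate is assumed to be an integer multiple of the sample rate, and the even split (any extra channels go to the earlier wires) is an assumption inferred from the four- and five-channel examples above.

```python
def frame_layout(clock_rate, sample_rate, n_channels):
    """Return (period, wire count, per-wire slot labels) for one frame."""
    if clock_rate % sample_rate != 0:
        raise ValueError("sketch assumes clock rate is a multiple of sample rate")
    period = clock_rate // sample_rate               # frame length in clocks
    wires = (n_channels + period - 1) // period      # ceiling(channels / period)
    layout = []
    next_channel = 0
    remaining = n_channels
    for w in range(wires):
        # Assumed even split: ceiling of what is left over the wires left.
        take = (remaining + (wires - w) - 1) // (wires - w)
        slots = [f"c{next_channel + i}" for i in range(take)]
        slots += ["--"] * (period - take)            # unused / don't-care slots
        layout.append(slots)
        next_channel += take
        remaining -= take
    return period, wires, layout


for n in (4, 5):
    period, wires, layout = frame_layout(300, 100, n)
    print(f"{n} channels, period {period}, {wires} wire(s)")
    for row in layout:
        print("  " + " ".join(row))
```

For 300MHz and 100Msps this gives two wires carrying 2 + 2 channels for four channels and 3 + 2 for five, matching the diagrams earlier in this answer.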

Further Examples

In the following examples, only the data wires are shown and only for one frame of data. cN refers to data for channel N, and "--" expresses an unused or "don't care" slot.

8 channels in a 6 clock-cycle frame

9 channels in a 6 clock-cycle frame

10 channels in a 6 clock-cycle frame

11 channels in a 6 clock-cycle frame

13 channels in a 6 clock-cycle frame

Support for data-types wider than 128 bits?

Simulink supports fixed point data-types up to 128 bits in length. However, DSPBA provides support to go beyond this for certain primitive blocks, using a custom signed or unsigned integer format.

An example is given by Demo_wide_clz.mdl

Most blocks that support fixed point allow this type, but Simulink’s built-in blocks, such as vector mux/demux and real/imag, will not.

Primitive blocks that do not support wide custom types are:

Counter, Loadable Counter, and Loop blocks, which are limited to 32-bit wide output.
DualMem blocks, which are restricted to 64-bit wide output.
Any floating-point-only blocks.

ModelIP blocks also do not support custom wide types, and have output restricted to 64-bits wide.