
pipeline bridge bursting and clock crossing bridge fifo length

Altera_Forum
Honored Contributor II

Hello, 

 

I'm new to Qsys/SOPC Builder, I'm currently developing a Nios II system, and I have trouble understanding the effects and dependencies of the features mentioned in the title.

 

The Clock Crossing Bridge offers a FIFO length for master-to-slave and slave-to-master transfers. How can I determine the optimal FIFO length? Are there any prerequisites for setting up the FIFO on the slave side for the different values, or do higher values simply lead to more resource usage on the FPGA?

 

The Pipeline Bridge supports burst transfers, which can be enabled for the component. What would happen if I connected a slave that doesn't support burst transfers, or one that doesn't support the maximum burst size set for the Pipeline Bridge component?

 

The typical setup seems to be that a fast component is connected to a Pipeline Bridge, which is then connected to a Clock Crossing Bridge, to which a slow component is connected:

fast <-> pipeline bridge <-> clock crossing bridge <-> slow 

The Pipeline Bridge is not strictly necessary, but looking at the design examples it seems to be recommended. Is there a rule of thumb for when to use a Pipeline Bridge?

 

 

Sincerely 

Martin Stolpe
11 Replies
Altera_Forum
Honored Contributor II

A larger FIFO depth will increase the on-chip memory utilization, although if you increase the FIFO in small steps you might find that the same number of RAM blocks is used. When you increase the command queue, the read queue must be increased too, since it needs to support more posted reads in flight. Picking those depths is tricky since it's system dependent; without knowing what you are doing I can't comment on it.
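As a rough illustration of the RAM-block quantization point, here is a back-of-the-envelope model (Python; the 9-Kbit M9K block size and the power-of-two depth rounding are my assumptions, not something the tools guarantee):

```python
import math

# Rough model: FIFO depths tend to round up to a power of two, and the
# resulting bits are packed into fixed-size embedded RAM blocks.
# block_bits=9216 assumes 9-Kbit M9K blocks (Cyclone III); hypothetical.
def ram_blocks(depth, width_bits, block_bits=9216):
    rounded_depth = 2 ** math.ceil(math.log2(max(depth, 2)))
    return math.ceil(rounded_depth * width_bits / block_bits)

# Growing a 36-bit-wide FIFO from depth 16 to 24 costs no extra block;
# the utilization only jumps at larger depths.
print(ram_blocks(16, 36), ram_blocks(24, 36), ram_blocks(512, 36))  # 1 1 2
```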

 

When you have a burst mismatch between the master and the slave, the tools will adapt it. For a big-to-small mismatch the adapter chops the burst up; for small-to-big it pads the upper burst-count bits with 0 so that the burst width matches the endpoint.
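For the small-to-big direction, a minimal sketch of that zero-extension (Python, purely conceptual; the widths are made-up examples):

```python
# The narrower master burstcount is zero-extended to the wider slave
# port: the upper bits are padded with 0 and the value is unchanged.
def widen_burstcount(value, master_bits, slave_bits):
    assert 0 <= value < (1 << master_bits) and slave_bits >= master_bits
    return format(value, "0{}b".format(slave_bits))

print(widen_burstcount(0b101, master_bits=3, slave_bits=6))  # '000101'
```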

 

Try designing without pipeline bridges, then add them when needed. I use pipeline bridges to pipeline the fabric logic and meet timing when I have the following ratios:

 

many masters connected to one slave 

one master connected to many slaves 

 

I recommend reading the document titled something like the "Avalon optimization guide"; those ratios, and which features to enable for them, are covered there in more detail. Note there is another guide that is Qsys-specific, so make sure you grab the right one for whichever tool you are using. If you are just getting started on a project you might want to try out Qsys.
Altera_Forum
Honored Contributor II

Just to make sure I understood it correctly: if the burst count width of the master is greater than the burst count width of the slave, the tool will generate logic so that the burst transfer is automatically split into smaller bursts which the slave can handle?

 

The system I'm working on is a Nios II processor system, based on the Ethernet example for the Embedded Systems Development Kit, Cyclone III Edition. Ideally the target clock frequency for the processor would be 125 MHz. We plan to replace the DDR2 memory with SRAM. Some connected peripherals are clocked at 125 MHz and 62.4 MHz.

 

So the first step in implementing the system would be to import the needed modules into the system and add clock crossing bridges where needed, then synthesize the design, look for timing requirements that were not met, and add Pipeline Bridges at those spots?

 

Unfortunately I wasn't able to find the document you mentioned. Can you tell me where I can find it? I am working mostly with the following two documents right now:

Section II. System Design with Qsys of the Quartus II handbook and 

Avalon Interface Specifications
Altera_Forum
Honored Contributor II

That's correct. If you had, say, a master issuing bursts of 8 to a slave that only handles a maximum burst of 2, a burst adapter will be created in the fabric for you. The adapter chops each burst of 8 into four individual bursts of 2. All of this is transparent to the master and the slave.
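A conceptual model of what that generated adapter does (Python; this is just the chopping arithmetic, not the actual fabric logic):

```python
# Chop one master burst into the largest bursts the slave accepts.
def chop_burst(master_burst, slave_max_burst):
    bursts, remaining = [], master_burst
    while remaining > 0:
        bursts.append(min(remaining, slave_max_burst))
        remaining -= bursts[-1]
    return bursts

print(chop_burst(8, 2))  # [2, 2, 2, 2] -> four individual bursts of 2
```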

 

I would recommend just placing main memory, the Nios II, and the Ethernet on the fast domain; all your slow peripherals can then be placed behind a clock crossing bridge. If you then have timing failures, you can evaluate placing pipeline bridges into the data path of the faster clock domain.

 

I'm on a mobile device so it is a bit clumsy to link you to the doc. Go to the Nios II literature page and look for the Embedded Design Handbook; one of the chapters talks about optimizations for Avalon-MM systems.
Altera_Forum
Honored Contributor II

Hi, 

 

thanks a lot for the design advice and the pointer to the Embedded Design Handbook! It's good to have a guideline on how to start and refine a system.
Altera_Forum
Honored Contributor II

Hi, 

 

how do you find out what the maximum burst size is for components? For the DDR2 High Performance Controller there is an option to choose the maximum burst size, but what is the maximum burst size for on-chip RAM or the SDRAM controller? I'm trying to get rid of all burst adapters since they are on my critical path. 

Thanks for your help!
Altera_Forum
Honored Contributor II

On-chip memory doesn't support bursting (it doesn't make sense to burst into on-chip RAM). I assume you are talking about the old SDR SDRAM controller listed in SOPC Builder/Qsys? If so, it doesn't support bursting either.

Altera_Forum
Honored Contributor II

Thanks for your reply. 

I was talking about the SDRAM on the DE2 board, listed as "SDRAM Controller" in SOPC Builder (which I think is the same as the one you mentioned?). 

When you say that on-chip RAM and SDRAM do not support bursting, do you mean that the readdata isn't returned in consecutive cycles after the initial delay (hence essentially the same as requesting the data separately), or that burst adapters will be created in order to support bursts? 

The reason I ask is that in simulation I do see that readdata IS returned in consecutive cycles after the initial delay.

Thanks for your help!
Altera_Forum
Honored Contributor II

A memory doesn't have to support bursting to be able to return data every clock cycle. Bursting is meant for interfaces that do not perform efficiently without back-to-back sequential data. The SDR SDRAM controller was designed to avoid the need to burst off-chip; however, if you connect many masters to it you might want to increase their arbitration shares to improve your memory efficiency. So if you are using the SDR SDRAM and on-chip RAM components, there is no reason to enable bursting on any of your masters.
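To see why pipelining alone recovers the throughput, here is a toy cycle count (Python; the 3-cycle read latency is an arbitrary example, not the real controller's figure):

```python
# With a pipelined slave, a new read command can issue every cycle and
# data returns a fixed latency later, so the reads overlap.
def pipelined_read_cycles(n_reads, latency=3):
    # Commands issue on cycles 0..n-1; the last beat returns at cycle
    # (n - 1) + latency, for a total of n + latency cycles.
    return n_reads + latency

print(pipelined_read_cycles(10))      # 13 cycles for 10 overlapped reads
print(pipelined_read_cycles(1) * 10)  # 40 cycles if issued one at a time
```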

Altera_Forum
Honored Contributor II

So do you mean that for the SDR SDRAM (is this the same as the one listed as "SDRAM Controller" in the SDRAM section of SOPC Builder?) you would get the same memory bandwidth with one burst of 10 read transfers as with 10 read transfers issued one after another? Doesn't a single SDRAM read take tens of cycles? 

 

EDIT: I guess you are suggesting that I use pipelined transfers instead of bursts. The problem I experienced with pipelined transfers is this: say I have two master components that DMA to the SDRAM. Both use pipelined transfers, and both are equally important and need high bandwidth, so I can't assign more arbitration shares to one over the other. When both components post pipelined reads to the SDRAM at the same time, arbitration alternates between the two (granting the first read to component 1, the second read to component 2, the third read to component 1 again, etc.), which causes a VERY long latency for both components. Is there a way for Avalon masters to gain an arbitration lock without using bursts?
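Here is a toy model of the grant pattern I mean (Python; the share parameter mimics the arbitration shares as I understand them):

```python
# Round-robin arbitration: each master gets `share` consecutive grants
# before the arbiter moves on to the other one.
def round_robin(reads_a, reads_b, share=1):
    grants, queues, turn = [], {"A": list(reads_a), "B": list(reads_b)}, "A"
    while queues["A"] or queues["B"]:
        for _ in range(min(share, len(queues[turn]))):
            grants.append((turn, queues[turn].pop(0)))
        turn = "B" if turn == "A" else "A"
    return grants

# With the default share of 1, the two masters' reads interleave:
print(round_robin(range(3), range(3)))
# [('A', 0), ('B', 0), ('A', 1), ('B', 1), ('A', 2), ('B', 2)]
```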

 

Thanks for your help!
Altera_Forum
Honored Contributor II

One more question if you don't mind answering,  

What is the exact burst length of the DDR2 SDRAM Controller with ALTMEMPHY when using the High Performance Controller II? In SOPC Builder, after instantiating the DDR2 SDRAM Controller with ALTMEMPHY, I set "Local Maximum Burst Count" to 8 in the Controller Settings tab, "Memory Burst Length" is also set to 8 beats under the Memory Settings tab inside "Modify Parameters", and to match this my Avalon burst count signal to the DDR2 controller is 3 bits wide.

 

There is still one burst adapter being instantiated by SOPC Builder. Do you know why this is happening?

 

P.S. The ALTMEMPHY documentation and design tutorial seem to suggest that the burst length is 8 for the High Performance Controller II, but the burst adapter still gets created...
Altera_Forum
Honored Contributor II

That's correct: whether you use a burst read of 10 beats or just sequentially read from 10 addresses in a row, you'll get similar performance. In fact, SOPC Builder has a single dead cycle at the beginning of each burst that can eat into your performance when performing short bursts, so when you don't need bursting you are better off without it (besides the reduced complexity of not having burst adapters all over the place). For something like on-chip RAM, which doesn't support bursting, this can cut your throughput by as much as 50% if you were performing bursts of 2 into it (every transaction would take two cycles due to the extra dead cycle caused by the burst adapter). 
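A quick sanity check of that 50% figure, under the stated one-dead-cycle-per-burst behaviour (Python; purely arithmetic):

```python
import math

# Fraction of cycles carrying data once the adapter has chopped a master
# burst into slave-sized bursts, each paying one dead cycle up front.
def efficiency(master_burst, slave_max_burst=1, dead_cycles=1):
    chopped = math.ceil(master_burst / slave_max_burst)
    return master_burst / (master_burst + chopped * dead_cycles)

# Bursts of 2 into non-bursting on-chip RAM: two single transfers of
# 1 dead + 1 data cycle each -> 50% efficiency, as stated above.
print(efficiency(2))  # 0.5
```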

 

One thing you need to take care of, though, is to increase the arbitration shares of the masters hooked up to the SDRAM when not using bursting, as the round-robin arbitration defaults to equal fairness of one access per master. For the Nios II masters I typically use an arbitration share of 8 when accessing a x32 SDRAM, to match the cache line size (assuming a 32-byte-per-line data cache). 
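The share of 8 is just the cache line size divided by the memory width; a few lines make the arithmetic explicit (Python):

```python
cache_line_bytes = 32      # 32-byte Nios II data cache line (as assumed above)
sdram_bytes_per_beat = 4   # x32 SDRAM data bus
print(cache_line_bytes // sdram_bytes_per_beat)  # 8 -> arbitration share
```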

 

The High Performance Controller II has two burst settings: one for the off-chip memory and the other for the local side (the Avalon slave port). You can get away with setting the local burst count to 1 (non-bursting) and relying on the memory controller to glue multiple transactions together to form an off-chip burst. Again, if you do this you should tweak your arbitration shares, since if a bunch of masters are fighting over the memory at different addresses, the memory controller most likely will not be able to buffer up enough accesses per master to perform the transaction merging I referred to earlier. 

 

The burst adapter you are seeing might be caused by the burst-wrapping nature of the Nios II instruction master. It performs critical-word-first filling, whereas the memory controller does not (internally it supports burst wrapping, but its Avalon interface does not). If you disable bursting in the CPU, set the memory's local burst count to 1, and tweak the arbitration shares, you'll get the same functionality as having the burst adapter. The burst adapter takes the CPU burst and chops it up into non-wrapping bursts.
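For reference, a sketch of the wrapping (critical-word-first) beat order versus the incrementing order the controller's Avalon port accepts (Python; the 8-beat line is an assumed example):

```python
# A wrapping burst starts at the missed word and wraps around the line;
# an incrementing burst always starts at the line base.
def wrapping_order(critical_word, burst_len=8):
    base = critical_word - (critical_word % burst_len)  # line-aligned start
    return [base + (critical_word + i) % burst_len for i in range(burst_len)]

print(wrapping_order(5))  # [5, 6, 7, 0, 1, 2, 3, 4]
# The burst adapter splits this into non-wrapping bursts, e.g. [5, 6, 7]
# and [0, 1, 2, 3, 4], which the memory controller can accept.
```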