
SDRAM controller: Altera High Performance vs. Microtronix Multi-Port

Altera_Forum
Honored Contributor II

We are in the process of choosing which one to use: the Altera High Performance SDRAM controller or the Microtronix Avalon Multi-Port SDRAM controller. 

 

Please advise on the following factors: 

 

- performance: which one is faster? 

- ease of use: which one is easier to use? 

- documentation: which one has better documentation? 

- reference designs: which one has better reference designs? 

 

Thank you in advance
Altera_Forum
Honored Contributor II

What family are you targeting?  

The high-performance core should run the fastest (it re-calibrates its receive path to align with the data). The Microtronix core has some very nice multi-port features that I've seen people want. That's probably the biggest trade-off. (The Altera core has a single port, so you need to do your own muxing if multiple components access it. SOPC Builder can do this for you, and it's all fairly straightforward.) 

As for ease of use, I've seen people use both without problems and say they were really easy. But when something goes wrong, I've seen people struggle with both, so the bottom line is that it depends. I'm not sure about the other two questions...
Altera_Forum
Honored Contributor II

Hi Rysc, 

 

Thanks for the prompt response. 

 

I am using a Cyclone III (the 3C120, speed grade 7). 

 

There are about 7 masters which connect to a single SDRAM controller. 

 

I need more ideas. 

 

Thanks
Altera_Forum
Honored Contributor II

Does Microtronix support the size and speed of memory you want to interface to? Do they support 7 ports (not that you couldn't expand some of the ports)? Your 7 masters are going to have some system requirements that you need to understand (one may need low latency, one may need priority, one may need large on-chip buffers, one may not need anything but to stay out of the way, etc.). I would compare that to what Microtronix offers, and maybe give them a call to discuss.  

You may also want to describe your requirements and post to the Nios forum, as you'll probably get better responses there on how SOPC Builder adapters and the Avalon interconnect can help your system (I'm a novice on the SOPC stuff). Of course, hopefully others on this forum reply too.
Altera_Forum
Honored Contributor II

I do not have experience with the Microtronix version of the controller. However, the Altera High Performance controller has been very easy to use. Using SOPC builder you could easily do a 7-port implementation (I've done many more than that) with little effort on your part. 

 

Alternatively I don't expect it would take you long to come up with your own arbitration logic scheme if you were to do it outside SOPC builder. 

 

As far as performance goes, that depends primarily on how efficiently you access the SDRAM. I have at times achieved up to 95% efficiency with the Altera controller. 

 

Jake
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

Does Microtronix support the size and speed of memory you want to interface to? Do they support 7 ports (not that you couldn't expand some of the ports)? Your 7 masters are going to have some system requirements that you need to understand (one may need low latency, one may need priority, one may need large on-chip buffers, one may not need anything but to stay out of the way, etc.). I would compare that to what Microtronix offers, and maybe give them a call to discuss.  

You may also want to describe your requirements and post to the Nios forum, as you'll probably get better responses there on how SOPC Builder adapters and the Avalon interconnect can help your system (I'm a novice on the SOPC stuff). Of course, hopefully others on this forum reply too. 

--- Quote End ---  

 

 

Thanks Rysc. I appreciate your replies very much. 

 

There is no problem with the size and speed of the DDR2 I am going to use. The Microtronix controller supports up to 6 ports; 2 of them will be used for the Nios II processor (instruction and data), which is expected to consume little bandwidth (around 5% of the DDR2's peak bandwidth). The remaining 4 ports will be shared among the 7 masters, which are expected to consume about 40% of the DDR2's peak bandwidth (a rough back-of-envelope check follows the list of masters below). This kind of connection should be easy to set up in SOPC Builder. 

 

The 7 masters are: 

 

- a write/read master pair for 720x480x24-bit @ 60 fps video buffering (label the write master m1 and the read master m2) 

- a write/read master pair for 1920x1080x24-bit @ 60 fps video buffering (label the write master m3 and the read master m4) 

- 2 read masters for 640x480x24-bit @ 60 fps video reading (label them m5 and m6) 

- a read master for 800x480x24-bit @ 60 fps video reading (label it m7) 
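
As a rough sanity check on those percentages, here is the back-of-envelope arithmetic in C (the memory clock is not final, so the 64-bit, 167 MHz DDR2 figure below is only an assumption for illustration):

#include <stdio.h>

/* Bandwidth needed by one 24-bit video stream, in MB/s. */
static double stream_mbps(int w, int h, int fps)
{
    return (double)w * h * 3 /* bytes per pixel */ * fps / 1e6;
}

int main(void)
{
    /* Assumed memory configuration: 64-bit DDR2 at 167 MHz
       (double data rate, so 2 transfers per clock). */
    double peak = 64.0 / 8 * 2 * 167e6 / 1e6;        /* ~2670 MB/s */

    double m1_m2 = 2 * stream_mbps(720, 480, 60);    /* write + read */
    double m3_m4 = 2 * stream_mbps(1920, 1080, 60);  /* write + read */
    double m5_m6 = 2 * stream_mbps(640, 480, 60);    /* two readers  */
    double m7    = stream_mbps(800, 480, 60);
    double total = m1_m2 + m3_m4 + m5_m6 + m7;

    printf("peak  : %7.1f MB/s\n", peak);
    printf("video : %7.1f MB/s (%.0f%% of peak)\n",
           total, 100.0 * total / peak);
    return 0;
}

(With those assumptions the streams come to roughly 1050 MB/s, close to the 40% figure above, before any controller inefficiency is taken into account.)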

 

m1 to m4 are masters of encrypted IP cores, so I do not fully understand how they are scheduled. 

 

m5 to m7 were created by me, so I can fully control them. 

 

These 7 masters should have the same priority. I have to somehow control the overall system scheduling, but I don't have an idea of how to do that right now. Please advise if you have experience dealing with time-sharing systems. 

 

I may post to the Nios forum to direct the gurus there to this thread. Thanks for reminding me of a very good resource.
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

I do not have experience with the Microtronix version of the controller. However, the Altera High Performance controller has been very easy to use. Using SOPC builder you could easily do a 7-port implementation (I've done many more than that) with little effort on your part. 

 

Alternatively I don't expect it would take you long to come up with your own arbitration logic scheme if you were to do it outside SOPC builder. 

 

As far as performance goes, that depends primarily on how efficiently you access the SDRAM. I have at times achieved up to 95% efficiency with the Altera controller. 

 

Jake 

--- Quote End ---  

 

 

Hi Jake, 

 

I am considering arbitrating the masters myself, but I have no experience with this kind of task. Would you please tell me more about your experience? How did you share the bandwidth between masters to achieve the required system performance and function? Mine is a video processing system from/into which video data is streamed. 

 

Looking forward to hearing from you.
Altera_Forum
Honored Contributor II

I currently have a design that uses 8 Avalon masters and is a video application. The design performs video frame buffering for 4 video feeds. Each of the video feeds can vary in input format from 720x480@60fps to 1920x1080@60fps. 

 

There are two issues at hand: 

 

1 - You need to dedicate enough time to each master to allow efficient access of the DDR2 memory. If you just write small chunks of data and constantly switch between masters, your DDR2 efficiency will suffer and you won't be able to satisfy your bandwidth requirements. This is due to the memory controller having to switch between pages within the memory. 

 

2 - You need to provide enough local buffering on each of the masters so that they don't overflow or underflow while the other masters are accessing the memory. 
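
To put a rough number on item 2, this is the kind of back-of-envelope sizing I mean (the pixel clock, the worst-case wait, and the FIFO word size below are just assumed example figures, not values from a real design):

#include <stdio.h>

int main(void)
{
    /* Assumed example figures. */
    double pixel_clock   = 148.5e6;  /* 1080p60 pixel clock, Hz           */
    double bytes_per_pix = 3.0;      /* 24-bit video                      */
    double worst_wait_us = 10.0;     /* longest time the arbiter may keep
                                        this master off the memory        */
    int    fifo_width    = 16;       /* FIFO word size in bytes
                                        (128-bit local interface assumed) */

    /* Data that keeps arriving while the master cannot drain it. */
    double bytes = pixel_clock * bytes_per_pix * worst_wait_us * 1e-6;

    printf("buffer at least %.0f bytes (~%.0f words of %d bytes)\n",
           bytes, bytes / fifo_width, fifo_width);
    return 0;
}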

 

I have done this three different ways: 

 

1 - SOPC builder allows you to specify priority and arbitration shares for each of the masters. You can use this to ensure that each master is guaranteed a certain number of accesses on the bus (assuming the master needs them). 

 

2 - You can create your avalon masters to support bursts. When using bursts, the master specifies how many transfers it wishes to make. No other masters are granted access during that time. 

 

3 - You can create your own arbitration. Mine is quite simplistic: round-robin scheduling between the masters. Each master is guaranteed a minimum number of transfers unless it doesn't actually need them. If none of the other masters have something to say, a master may continue to occupy the bus. This is somewhat self-regulating, as the DDR2 access becomes more efficient as the loading increases. 
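
If it helps, here is a small behavioral model of that policy in C (just the scheduling idea, not my actual RTL; MASTERS and MIN_SHARE are arbitrary example values):

#include <stdio.h>

#define MASTERS   4
#define MIN_SHARE 8   /* guaranteed transfers per grant (example value) */

/* Pick the next master to own the bus. 'req' flags which masters
   currently have data to move, 'cur' is the present owner and
   'served' is how many transfers it has done since being granted. */
static int next_grant(const int req[MASTERS], int cur, int served)
{
    /* The owner keeps the bus until it has used its minimum share,
       or gives it up early if it has nothing left to transfer.    */
    if (req[cur] && served < MIN_SHARE)
        return cur;

    /* Otherwise rotate: scan the other masters in round-robin order.
       If nobody else is requesting, the scan wraps back to 'cur',
       so the owner may keep occupying the bus.                     */
    for (int i = 1; i <= MASTERS; i++) {
        int cand = (cur + i) % MASTERS;
        if (req[cand])
            return cand;
    }
    return cur;   /* nobody is requesting at all */
}

int main(void)
{
    int req[MASTERS] = {1, 0, 1, 1};   /* example request pattern */
    int owner = 0, served = 0;

    for (int transfer = 0; transfer < 24; transfer++) {
        int g = next_grant(req, owner, served);
        served = (g == owner) ? served + 1 : 1;
        owner  = g;
        printf("transfer %2d -> master %d\n", transfer, owner);
    }
    return 0;
}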

 

The local buffering is of course usually done using FIFOs. One thing you need to do is ensure that your data transfers take full advantage of the width of your memory interface. Do not do 32-bit data transfers if your DDR2 memory interface is 128 bits wide; you're just killing your efficiency otherwise. 
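
Just to put numbers on the width point (the 128-bit local width is an assumed example):

#include <stdio.h>

int main(void)
{
    int local_width_bits  = 128;  /* controller's local data port (assumed) */
    int master_width_bits = 32;   /* a narrow master doing 32-bit transfers */

    /* Each local-interface beat can carry at most master_width bits of
       useful data, so the rest of the 128-bit word is wasted.           */
    printf("best-case efficiency: %d%%\n",
           100 * master_width_bits / local_width_bits);
    return 0;
}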

 

Your most inefficient accesses will be your Nios master. Give it as large a data cache as possible. Consider using other memory for the Nios rather than your buffering memory. 

 

Don't know if this helps but good luck. 

 

Jake
Altera_Forum
Honored Contributor II

Hi Jake, 

 

Sorry for my belated reply. 

 

 

--- Quote Start ---  

 

There are two issues at hand: 

 

1 - You need to dedicate enough time to each master to allow efficient access of the DDR2 memory. If you just write small chunks of data and constantly switch between masters, your DDR2 efficiency will suffer and you won't be able to satisfy your bandwidth requirements. This is due to the memory controller having to switch between pages within the memory. 

 

2 - You need to provide enough local buffering on each of the masters so that they don't overflow or underflow while the other masters are accessing the memory. 

 

--- Quote End ---  

 

 

How did you deal with issue no. 2? I don't have other memory resources except for a single DDR2 and FPGA on-chip memory; on-chip memory is the only option for local buffering in my case. 

 

 

--- Quote Start ---  

 

I have done this three different ways: 

 

1 - SOPC builder allows you to specify priority and arbitration shares for each of the masters. You can use this to ensure that each master is guaranteed a certain number of accesses on the bus (assuming the master needs them). 

 

2 - You can create your avalon masters to support bursts. When using bursts, the master specifies how many transfers it wishes to make. No other masters are granted access during that time. 

 

3 - You can create your own arbitration. Mine is quite simplistic: round-robin scheduling between the masters. Each master is guaranteed a minimum number of transfers unless it doesn't actually need them. If none of the other masters have something to say, a master may continue to occupy the bus. This is somewhat self-regulating, as the DDR2 access becomes more efficient as the loading increases. 

 

--- Quote End ---  

I will try them one after another. This is the first time I have heard about 'priority and arbitration shares' between masters within SOPC Builder. Although it's self-explanatory, I will read the SOPC Builder handbook... 

 

There is a customizable scheduler that comes with the Microtronix SDRAM controller; its default arbitration scheme is round-robin. The Microtronix controller is becoming my choice, provided that it is easy to simulate before being implemented in real hardware. 

 

 

--- Quote Start ---  

 

The local buffering is of course usually done using FIFOs. One thing you need to do is ensure that your data transfers take full advantage of the width of your memory interface. Do not do 32-bit data transfers if your DDR2 memory interface is 128 bits wide; you're just killing your efficiency otherwise. 

 

Your most inefficient accesses will be your Nios master. Give it as large a data cache as possible. Consider using other memory for the Nios rather than your buffering memory. 

--- Quote End ---  

Related to my first question in this post: what memory resources do the FIFOs utilize, external resources or FPGA on-chip resources (on-chip memory, FPGA LEs, etc.)? 

 

I am considering creating burst masters with a data width of 128 bits, while the DDR2 is 64 bits wide. 

 

About the Nios, I might have to use on-chip memory. The biggest issue here is that there may not be enough on-chip memory for both the masters' local buffering and the Nios program memory. 

 

 

--- Quote Start ---  

 

Don't know if this helps but good luck. 

 

--- Quote End ---  

I don't have enough words to say THANK YOU. I was relieved to hear from you. 

 

Avtx30
Altera_Forum
Honored Contributor II

Yes, the FIFOs are usually created with on-chip memory. Typically in a hefty video processing application, on-chip memory and DSP blocks become precious resources. They are usually the limiting factor in the design rather than logic resources. 

 

Jake
Altera_Forum
Honored Contributor II

Thanks Jake and Rysc. I will be back.

Altera_Forum
Honored Contributor II

Hi, 

 

I am also using the 3C120 board for a video processing application, and I am facing the same bandwidth issues. 

 

I have split the DDR2 into two separate 32-bit memories. The Nios has full access to one, while the other is shared by the video processing blocks.  

 

9 masters access the video memory with a burst size of 32. The arbitration share is set to 32 for each master. 

 

The video processing blocks run at 100 MHz while the DDR2 runs at 150 MHz, 64-bit, so I have to use clock-crossing bridges to connect to the memory. Are these bridges causing the bottleneck, or is it something else? 

 

I have been stuck on this for quite some time now; I am simply not able to tune the pipeline for the required performance. 

I would really appreciate any light on the issue. 

 

Thanks 

Foram
Altera_Forum
Honored Contributor II

Hi, 

 

Sorry, I am stuck too, and I am not able to give relevant advice right now. Maybe Jake or someone else will help. 

 

It is said that one may lose 5 to 7 clocks when using clock-crossing bridges. If you have time, you may check it using SignalTap. 
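
As a very crude way to gauge how much that latency could cost (all the numbers below are assumptions, and the model simply charges the extra latency once per burst with no overlap between bursts):

#include <stdio.h>

int main(void)
{
    int burst_len     = 32;  /* your burst size                     */
    int bridge_clocks = 6;   /* assumed extra latency per crossing  */
    int crossings     = 2;   /* command going in, data coming back  */

    /* Crude model: each burst pays the bridge penalty once and is
       not overlapped with the next burst.                          */
    double eff = (double)burst_len /
                 (burst_len + crossings * bridge_clocks);

    printf("worst case: ~%.0f%% of the no-bridge throughput\n",
           100.0 * eff);
    return 0;
}

(With those assumed numbers the bridges alone would cost roughly a quarter of the throughput; whether that explains your drop is something SignalTap should show.)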

 

Bests, 

avtx30
Altera_Forum
Honored Contributor II

Foram, 

Which video processing blocks are you using? If you are using Altera's Deinterlacer or Frame Buffer, then you can force them to use a different clock domain for the memory masters than for the video processing. This would allow you to connect the memory interfaces directly to the DDR2 memory controller without using the clock-crossing bridges. 

 

If you want to do this, edit the following files: 

 

C:\altera\80\ip\deinterlacer\lib\vip_dil_hwfast.hpp 

Change line 10 from 

#define DIL_MEM_MASTERS_USE_SEPARATE_CLOCK false 

to 

#define DIL_MEM_MASTERS_USE_SEPARATE_CLOCK true 

 

C:\altera\80\ip\frame_buffer\lib\vip_vfb_hwfast.hpp 

Change line 13 from 

#define VFB_MEM_MASTERS_USE_SEPARATE_CLOCK false 

to 

#define VFB_MEM_MASTERS_USE_SEPARATE_CLOCK true 

 

 

Then when you reopen SOPC Builder you will see that the memory masters have their own clock domain, and you will have to connect things accordingly. 

 

Jake
Altera_Forum
Honored Contributor II

Hi Jake, 

 

Thanks for the info :). Yes, I am using Altera's Deinterlacer and Frame Buffer IPs. 

 

I hope this will simplify things a bit. I will try it today and post an update with the results. 

 

Thanks 

Foram
Altera_Forum
Honored Contributor II

In these multi-master configurations, monitoring waitrequest in all masters is one way to implement scheduling. The Avalon arbitration logic takes care of the rest whenever a master tries to read from or write to the SDRAM controller. Every master has to monitor waitrequest before performing a write/read to the SDRAM controller. 

 

--Sheshi
Altera_Forum
Honored Contributor II

Hi, 

I tried the separate clock domain for the Deinterlacer and Frame Buffer, but somehow I couldn't get it to work.  

 

However, I needed to get it working ASAP :(... so I modified the design and got rid of the extra master; for now I get an almost-live frame rate. A couple of frames are still being dropped, but the frame rate is acceptable. 

 

Once this deadline is over, I hope to investigate these suggestions and try to implement them. I wonder how this board and these IPs are being used to process HD resolutions. Is it merely an arbitration issue? 

 

Thanks 

Foram