Hello, we are using an Arria II with DDR2 chips on two sides of the FPGA. Is there a way to aggregate the two and control them with one instance of the DDR2 HP controller? We are using HP and not HP II. The two chips do not share any signals. Thank you!
The most straightforward approach (I think, at least) would be to create a bridge that has one wide slave port and two master ports of half the width. The bridge would be responsible for buffering wide requests and sending them to both masters, each of which accesses one memory controller. For read data, the bridge would be responsible for making sure both data halves return before gluing them together into a single wide word. For all this to work I made the following assumptions: 1) You don't care about latency. 2) You use SOPC Builder or Qsys to do this. There are probably clever ways of instantiating two PHYs and one controller and rolling your own solution, but that seems like a lot of monkey work to me. Also, do you really need a wide interface? Could you get away with accessing each side independently?
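As a rough illustration of the split-and-glue step described above, here is a minimal behavioral sketch in Python (not HDL). The function names `split_word` and `merge_word` and the `half_width` parameter are hypothetical, and which controller receives the low or the high half is an assumption made only for the example.

```python
def split_word(word, half_width):
    """Split a wide word into (low, high) halves of half_width bits each."""
    mask = (1 << half_width) - 1
    low = word & mask                    # half sent to the first master port
    high = (word >> half_width) & mask   # half sent to the second master port
    return low, high


def merge_word(low, high, half_width):
    """Glue two returned halves back into a single wide word."""
    mask = (1 << half_width) - 1
    return ((high & mask) << half_width) | (low & mask)
```

For a 32-bit word and 16-bit halves, `split_word(0xDEADBEEF, 16)` gives the pair `(0xBEEF, 0xDEAD)`, and `merge_word` on that pair restores the original word.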
Thanks BadOmen for the response! I kind of came to the same conclusion, which is basically RAIDing the two chips using two HP instances and adding a wrapper to make it transparent. I need to use the two chips at the same time not for the bus width but for throughput. I am not using SOPC Builder/Qsys but have my own wrapper around the Altera PHY. As far as latency goes, I wonder how it would be affected if I instantiate two controllers and send the commands to both instances at the same time, with each instance handling half the data. I wonder whether the refresh cycles will be in sync, as a lack of that may cause extra latency. Do you agree with my assessment?
With two independent memory controllers you will not be able to guarantee that when one throttles the other one will as well (for example, when a refresh cycle occurs). Because each controller calibrates differently, they are not going to be perfectly in sync with each other, and as a result the local_ready signals from the two controllers will not behave the same at all times. The solution to this problem is to buffer write data and read/write commands in a FIFO and dispatch the buffered information to each controller independently. Likewise, you would need two independent read data FIFOs to catch the data returning (when both read buffers are not empty, you have a single wide word of data glued back together). These additional FIFOs are what I was alluding to when I mentioned the latency increase. One thing you might run into is timing issues, because your bridge will be wedged between two controllers on opposite sides of the chip. Depending on the clock frequency you are targeting, you might need some extra pipelining. The FIFOs I mentioned might help, since the input and output will be registered by the on-chip memory they instantiate. If that's not enough, in SOPC Builder you would add pipeline bridges, and in Qsys you would add pipeline bridges or use automatic pipelining. Doing that without those tools will require some work on your part.
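The read-side rule above (a wide word exists only once both read buffers are non-empty) can be modeled with a short behavioral sketch in Python. This is not HDL and not any Altera API; the class `ReadJoiner` and its methods are hypothetical names, and controller A is assumed to carry the low half.

```python
from collections import deque


class ReadJoiner:
    """Behavioral model of two half-width read FIFOs feeding one wide output."""

    def __init__(self, half_width):
        self.half_width = half_width
        self.fifo_a = deque()  # read data returned by controller A (low half)
        self.fifo_b = deque()  # read data returned by controller B (high half)

    def push_a(self, data):
        self.fifo_a.append(data)

    def push_b(self, data):
        self.fifo_b.append(data)

    def pop_word(self):
        """Return one wide word, or None if either half has not arrived yet."""
        if not self.fifo_a or not self.fifo_b:
            return None
        low = self.fifo_a.popleft()
        high = self.fifo_b.popleft()
        return (high << self.half_width) | low
```

Because each half is queued separately, one controller stalling for a refresh only delays the join; it does not reorder or drop data as long as both controllers return reads in the order they were issued.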
The wrapper I use has command and data FIFOs already. The wrapper also allows multiple readers/writers to access the DDR in a time-sliced fashion. So I think I will follow your advice and add another FIFO layer to stitch the two data halves together. Each read/write command uses a data burst length big enough that it should compensate for any refresh hiccups. As far as timing goes, I use a dcfifo to cross between the DDR domain and the interface domain, and it seems to work OK for the DDR frequency I am using. Using two FIFOs to stitch the data means that I will have 3 clock domains to deal with, but I think that will not be a problem. And as far as the latency due to the extra FIFOs goes, that's OK because I queue up multiple commands, and once the data stream starts the throughput is adequate. Thanks again for your insight.
If you already have FIFOs, I don't think you'll need an extra buffer layer. In the case of writes, if only one controller is ready you can still post the write; just don't pop the FIFO until you can post the other half of the write (or you can dual buffer write commands and write the data independently, assuming you are not worried about the two FIFOs getting out of sync). In the case of reads, if you instantiate two half-width FIFOs (one per controller), then you just pop both FIFOs when neither is empty. This way you won't end up adding more round-trip latency by double buffering the commands and data.
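The write-side rule above (post each half when its controller is ready, but pop the shared FIFO only after both halves have been posted) can be sketched as a small cycle-based Python model. It is a behavioral illustration only; `WriteDispatcher`, `cycle`, and the `ready_a`/`ready_b` arguments are hypothetical names standing in for each controller's ready signal.

```python
from collections import deque


class WriteDispatcher:
    """One shared write FIFO feeding two controllers that stall independently."""

    def __init__(self):
        self.fifo = deque()   # entries: (addr, low_half, high_half)
        self.sent_a = False   # head entry already posted to controller A
        self.sent_b = False   # head entry already posted to controller B

    def push(self, addr, low, high):
        self.fifo.append((addr, low, high))

    def cycle(self, ready_a, ready_b):
        """Model one clock; return the (controller, addr, data) writes posted."""
        posted = []
        if not self.fifo:
            return posted
        addr, low, high = self.fifo[0]
        if ready_a and not self.sent_a:
            posted.append(("A", addr, low))
            self.sent_a = True
        if ready_b and not self.sent_b:
            posted.append(("B", addr, high))
            self.sent_b = True
        # Pop only once both halves of the head entry have been posted.
        if self.sent_a and self.sent_b:
            self.fifo.popleft()
            self.sent_a = self.sent_b = False
        return posted
```

The two `sent` flags keep a half from being posted twice while the other controller is still stalled, which is the piece of state an HDL version of this scheme would also need.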
--- Quote Start --- We are using an Arria II with DDR2 chips on two sides of the FPGA. Is there a way to aggregate the two and control them with one instance of the DDR2 HP controller? We are using HP and not HP II. The two chips do not share any signals. --- Quote End --- Can't you just aggregate the DDR2 chips (of both sides) into a single (wide) Altmem_PHY? The Quartus II software should be able to handle a copy of the address and command bus for each side (if it can't, file an SR!?). See the External Memory Interface Handbook, Volume 2, Section I: Device and Pin Planning (http://www.altera.com/literature/hb/external-memory/emi_plan_pin.pdf), page 1-4.
josyb, my problem is that I do not know how to set up the HP controller wizard to generate independent signals for two chips. It looks like I can generate independent signals for clocks and some other signals, but not for the address signals. I know how to configure the wizard for two chips that share the same address lines. Page 1-4 seems to say that the controller can handle chips that are located on opposite sides of the device, but it does not indicate whether the chips can be completely independent (signal-wise) of each other. Do you still think it should be possible?
The idea is to connect each output signal (command, address) of the HP controller to two physical pins. E.g. the output ddr2_a of the controller has to go to both the RamA_A and RamB_A pins, so you need to create an intermediate signal (or wire) to connect this. You will have to delve into the .sdc file and modify/copy the appropriate constraints to cover both sets. I would then expect the Quartus II fitter to apply fast register I/O settings etc. to both pin sets.
With this approach, the commands to the DDR have to complete at the same time for both chips. I don't know if the DDR command completion time is cycle-based or depends on other parameters/conditions. Do you know if I can guarantee that the same command sent to both chips at the same time will complete on the same clock cycle?
As the commands for the two chips are issued by one controller, each DRAM chip will run in sync with the other. The Altmem-PHY just sees the two chips as an aggregated DQ/DQS bus.
That was the last unknown variable that I had. I will try this approach. Hopefully the timing constraints will not be too difficult to add. Thank you for all the help!