
DDR2 read performance issue

Altera_Forum

I'm having trouble getting maximum DDR2 read performance. According to other threads on this forum, sequential reads should get close to maximum performance. However, I'm getting only around 25-50%. I don't know much about DDR2, and I think I'm missing something, but I cannot figure out what.

Could someone please help me on this? 

 

I am using the DE4 board from Terasic, which has a Stratix IV GX EP4SGX530. I've been able to read from and write to the DDR2 SDRAM with the memory running at 400MHz. However, due to other component requirements, I'm currently running the DDR2 memory clock at 266MHz. I used Nios to perform a full memory test and it completed without failure, so at least I know the memory controller is working. Some modules I'm using need near full memory bandwidth (>85%). As shown in the SignalTap capture stp_read.bmp, I am getting poor read performance, mainly because waitrequest is asserted so often. To track this down, I decided to debug it in simulation.
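For reference, the >85% requirement can be put in absolute terms with a short calculation. This is only a sketch: the 64-bit data bus width is an assumption (it is not stated in the post, though the 64-bit data words in the simulation log are consistent with it), and the names below are illustrative.

```python
# Rough peak-bandwidth estimate for the DDR2 interface described above.
MEM_CLK_HZ = 266_000_000      # memory clock from the post (266MHz)
TRANSFERS_PER_CLK = 2         # DDR: data is transferred on both clock edges
BUS_WIDTH_BYTES = 8           # ASSUMPTION: 64-bit SO-DIMM data bus

peak_bytes_per_s = MEM_CLK_HZ * TRANSFERS_PER_CLK * BUS_WIDTH_BYTES
required_bytes_per_s = 0.85 * peak_bytes_per_s   # the ">85%" target

print(f"peak:     {peak_bytes_per_s / 1e6:.1f} MB/s")
print(f"required: {required_bytes_per_s / 1e6:.1f} MB/s")
```

Under these assumptions the theoretical peak is about 4256 MB/s, so the modules in question would need roughly 3618 MB/s of sustained throughput.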

 

The first thing I tried was adding bursts (the SignalTap image above does not use bursts). However, it did not help much. I tried maximum burst sizes of 8 and 32 when generating the DDR2 controller. The attached images and the following log show the simulation results (using burst size 32).

# [105076258] [DWR=000]: Reading data 00000ffd00000ffc @ 7000fe (BRC=7/0/f8 ) burst 6 

# [105078133] [DWR=000]: Reading data 00000fff00000ffe @ 7000ff (BRC=7/0/f8 ) burst 7 

# 512 write operations using burst during 535 AFI_CLK cycles, utilization of 95.70% 

# 512 read operations without burst during 1982 AFI_CLK cycles, utilization of 25.83% 

# 512 read operations using burst during 1006 AFI_CLK cycles, utilization of 50.89% 
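The utilization figures in the log above appear to be simply the number of operations divided by the number of AFI_CLK cycles. A minimal sketch that recomputes them, assuming that definition (the function name is made up for illustration):

```python
def utilization_percent(operations: int, afi_cycles: int) -> float:
    """Operations completed per AFI_CLK cycle, expressed as a percentage."""
    if afi_cycles <= 0:
        raise ValueError("afi_cycles must be positive")
    return 100.0 * operations / afi_cycles


# (operations, AFI_CLK cycles) copied from the simulation log above.
runs = {
    "write, burst": (512, 535),
    "read, no burst": (512, 1982),
    "read, burst": (512, 1006),
}

for name, (ops, cycles) in runs.items():
    print(f"{name}: {utilization_percent(ops, cycles):.2f}%")
```

The three results match the 95.70%, 25.83% and 50.89% reported by the testbench.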

 

According to the simulation (using the DDR2 controller and memory model generated by Quartus), I get near full write performance (>95% memory bandwidth utilization), but read performance is poor. The attached files write0.bmp and write1.bmp show the write timing diagram at the beginning and in the middle of a sequence of write operations. The attached files read0.bmp and read1.bmp show the read timing diagram for a sequence of read bursts. read0_1.bmp is a magnified version that shows the timing of the bursttransfer and waitrequest signals in more detail (the address shown here is a byte address; it actually increments by 0x20 in word address).

 

At this point, I don't know what to try next. I don't know much about DDR2 (or SDRAM in general), so it's difficult for me to tell which direction to explore to fix this problem. I can try random tweaks in the memory controller, but I suspect I'm missing something simple due to my lack of knowledge. Any guidance or tips are greatly appreciated. :)

 

I'm using: 

Stratix IV GX EP4SGX530 

DDR2 1GB SO-DIMM (the one included in DE4 kit) 

DDR2 Controller with UniPHY (configuration is attached as text file) 

Quartus II 11.0 (64-bit) 

ModelSim SE 10.0a (64-bit) 

 

P.S. Since the number of files I can attach is limited to 5, I tried to include a link that shows all images at once for your convenience. However, the forum blocks links from users with fewer than 5 posts, so I only attached the most relevant images. Everything else can be found in the zip file. Sorry for the inconvenience.
3 Replies
Altera_Forum

Here's an update of my findings after a round of messing with DDR2 configurations. 

(Of all the configurations I tried, the following two were the only ones that affected performance). 

 

First, though this may be obvious to DDR2 experts and experienced HW designers, reducing the Mode Register 0 Burst Length from 8 to 4 halves the performance (read BW utilization drops from 50% to 25%, and write BW utilization from 95% to 50%). So I set it back to 8.

 

Second, I disabled reordering (i.e. unchecked "Enable Reordering" in the Controller Settings). This gave me near max read performance when using burst. For non-burst read operations, I get 50% utilization. 

# [110483758] [DWR=000]: Reading data 00001ffd00001ffc @ c003fe (BRC=3/0/3f8 ) burst 6 

# [110485633] [DWR=000]: Reading data 00001fff00001ffe @ c003ff (BRC=3/0/3f8 ) burst 7 

# 1024 write operations using burst during 1043 AFI_CLK cycles, utilization of 98.17% 

# 1024 read operations without burst during 2132 AFI_CLK cycles, utilization of 48.03%

# 1024 read operations using burst during 1067 AFI_CLK cycles, utilization of 95.97% 
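Because the two simulation runs used different operation counts (512 and 1024), the burst-read results are easier to compare as AFI_CLK cycles per read. A small sketch using the numbers from both logs, assuming the first run had reordering enabled and the second had it disabled, as described above:

```python
def cycles_per_op(operations: int, afi_cycles: int) -> float:
    """Average AFI_CLK cycles spent per read operation."""
    return afi_cycles / operations


# Burst reads, reordering enabled (log in the first post).
reorder_on = cycles_per_op(512, 1006)
# Burst reads, reordering disabled (log above).
reorder_off = cycles_per_op(1024, 1067)

speedup = reorder_on / reorder_off

print(f"reordering on:  {reorder_on:.3f} cycles/read")
print(f"reordering off: {reorder_off:.3f} cycles/read")
print(f"speedup from disabling reordering: {speedup:.2f}x")
```

With reordering enabled each burst read costs almost two AFI_CLK cycles on average; with it disabled the cost drops to just over one cycle.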

 

This was done in simulation, so I will have to try it on the real hardware. 

 

Is it normal for read performance to drop when reordering is enabled?

I couldn't find anything about performance drops with reordering enabled in the "DDR2 and DDR3 SDRAM Controllers with UniPHY User Guide". Having the data reordering feature would be nice, since my hardware does not always do sequential reads/writes. The user guide also says that reordering data allows maximum efficiency, so I was wondering whether I was doing something wrong to get such poor sequential read performance.
Altera_Forum

That's correct, the re-ordering is what caused it. It is being addressed, so for now disable it.

Altera_Forum

SOPC Builder no longer generates a testbench for the UniPHY DDR2 controller. For the ALTMEMPHY DDR2 controller, it used to generate a module called <instance_name>_test_component, which read in a .dat file and fed the data to the SOPC system. It wasn't a simple testbench that just returned data for a given address; it had multiple pipeline stages and supported bursts.

What testbench did you use for your simulation?