DDR3 Memory errors but passes Calibration

Altera_Forum
Honored Contributor II

Hi, we are getting intermittent memory read errors from our DDR3 SDRAM Controller with UniPHY (using the hard memory controller, HMC) in a Cyclone V.

Using the EMIF toolkit we can run calibration on the memory and it passes every time. The margins look very good in the reports. 

The errors start after a reset/calibration cycle. 

 

We have written a test application that allows us to test the memory and activate a memory controller reset at will. 

Normally the test will run for days without any errors. We then activate a reset and re-run the test. Most times after the reset the test will be fine. Occasionally after the reset we get the fault condition. 

In the fault condition the test will start to give data read errors after a few minutes. Once in the fault condition the system remains in the fault condition until we re-run the reset/calibration, after which it may recover and continue to run without error for days. 
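
The reset-then-test procedure described above can be sketched as a small harness that records, per reset, whether the following test run saw read errors. This is only an illustration, not the actual test application: `run_reset_trials` and `FakeController` are hypothetical names, and the fake controller simply simulates a reset that occasionally lands in a persistent bad state. On real hardware the two callables would wrap whatever triggers the controller reset and runs the memory test.

```python
import random
from typing import Callable, List


def run_reset_trials(reset: Callable[[], None],
                     memory_test: Callable[[], int],
                     trials: int) -> List[int]:
    """Reset (which triggers recalibration), then test; one error count per trial."""
    error_counts = []
    for _ in range(trials):
        reset()
        error_counts.append(memory_test())
    return error_counts


class FakeController:
    """Hypothetical stand-in for the hardware: some resets land in a bad state."""

    def __init__(self, bad_probability: float, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._bad_probability = bad_probability
        self.bad = False
        self.history: List[bool] = []

    def reset(self) -> None:
        # The fault state is decided at reset time and persists until the next reset.
        self.bad = self._rng.random() < self._bad_probability
        self.history.append(self.bad)

    def memory_test(self) -> int:
        # A bad state yields at least one read error; a good state yields none.
        return self._rng.randint(1, 50) if self.bad else 0


controller = FakeController(bad_probability=0.2, seed=1)
counts = run_reset_trials(controller.reset, controller.memory_test, trials=100)
failing = [i for i, c in enumerate(counts) if c > 0]
print(f"{len(failing)} of {len(counts)} resets led to read errors")
```

Logging which reset number led to a failure makes it possible to compare the calibration report from a failing run against one from a good run.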

 

I have checked all the PSU voltages and they look good.

We are now using Quartus II 14.1.1 build 190. The problem was worse when we used 14.0.

We copied the schematic design from the Cyclone V GX development board.

We are using exactly the same memory devices as on the development board: two 16-bit DDR3 devices.

I notice that there are changes to the termination of nRESET and CKE on the latest Cyclone V GT development board. We do not have these changes. Could this be significant?

We have tried building the memory interface using parameters from the Micron data sheet and parameters from the reference designs. This has not made any difference to the symptoms. 

We have tried building the memory interface using default board parameters and parameters taken from an analysis of the board layout. This has not made any difference to the symptoms. 

I have analysed the board layout and it is not perfect with respect to track lengths: the DDR3 CLK tracks are 100 ps shorter than they should be relative to the rest of the signals.
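
To put the 100 ps mismatch in context, here is a quick calculation of what fraction of the clock period and of the DDR bit time it represents at the 300 MHz clock rate mentioned later in the thread. The helper name `skew_fractions` is hypothetical, and whether this skew matters depends on the full timing budget, which this sketch does not model.

```python
from typing import Tuple


def skew_fractions(skew_ps: float, clock_mhz: float) -> Tuple[float, float]:
    """Return the skew as a fraction of the clock period and of the DDR bit time."""
    period_ps = 1e6 / clock_mhz      # clock period in picoseconds
    bit_time_ps = period_ps / 2.0    # DDR transfers data on both clock edges
    return skew_ps / period_ps, skew_ps / bit_time_ps


of_period, of_bit = skew_fractions(skew_ps=100.0, clock_mhz=300.0)
print(f"{of_period:.1%} of the clock period, {of_bit:.1%} of the bit time")
```

At 300 MHz the clock period is about 3333 ps, so 100 ps is 3% of the period and 6% of the bit time.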

When we use the EMIF toolkit to look at the margin reports and the delay and clock phase settings after each calibration, they are different each time.

Can our problem be that the calibration is not working properly due to some other issue in the system? 

 

I stress the system always passes Calibration with large margins. 

Please can you help? 

Thanks 

Dave
Altera_Forum
Honored Contributor II

I'm not a DDR memory expert by a long shot, but what is your clock rate? Have you examined the integrity of the clock with a scope? Can you reduce the clock rate and see if the issue persists?

Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

I'm not a DDR memory expert by a long shot, but what is your clock rate? Have you examined the integrity of the clock with a scope? Can you reduce the clock rate and see if the issue persists? 

--- Quote End ---  

 

Hi, thanks for the reply.

The clock frequency is 300 MHz. Yes, I will look with a scope, but since the problem is only present after particular resets/calibrations I had not thought of it as a possible cause. I will also try slowing down the clock as you suggest.

Thanks 

Dave
Altera_Forum
Honored Contributor II

 

--- Quote Start ---  

Hi thanks for reply.  

clock frequency is 300 MHz. Yes I will look with a scope but since the problem is only present after particular resets/calibrations I'd not thought of it as a possible cause of the problem. I will also try slowing down the clock as you suggest.  

Thanks 

Dave 

--- Quote End ---  

 

 

Hi, I've slowed the clock down to 250 MHz with no improvement. :(
Altera_Forum
Honored Contributor II

I have a similar problem, using the same tools, IP and 300 MHz clock, on an Altera Cyclone V development board: intermittent read (or, in my case, possibly write) errors from the DDR3. This is on a platform that clearly works without errors, as confirmed with the sample memory test projects supplied.

 

I've not got to the bottom of it yet (early days) but clearly my implementation is responsible. 

 

Have you tried your hardware with the example design that's offered to you when you run the IP configuration tool? I could get that working. 

 

I will post any of my findings here. It's not a high priority here so that may be some time... sorry. 

 

Cheers, 

Alex
Altera_Forum
Honored Contributor II

You know, these errors after calibration really grab my attention. Today I was working with the UniPHY, and here is what I observed:

I wrote the value 5 to DDR address 0, then the value 6 to DDR address 1. After some issues I restarted the controller with a global reset. When I accessed the same DDR addresses again, I read back something like 145121623 from address 0 and still 6 from address 1. Then it hit me: whenever you reset the controller with a global reset, it starts the recalibration process. If I understand it correctly, calibration must write random values to random addresses in the DDR and then read them back (similar to an echo) to determine the timing margins. That is exactly what overwrote my value 5 at address 0. It is also why the documentation strongly recommends resetting the system with a soft reset rather than a hard one, since the recalibration process will overwrite some addresses in the DDR.

Well, that might be the reason. Or maybe these erroneous words got into those addresses some other way? What do you think?
Altera_Forum
Honored Contributor II

I have a solution for my problem :)

My problem was power supply ripple (noise). I had over 100 mV of ripple on the 1.5 V power supply to the DDR3 memory. The output of the voltage regulator I had used was unstable and required extra capacitance on the output to stabilize it.

I had been running tests with the EMIF toolkit to see the variation in the margins reported by the toolkit after calibration. With the ripple, the reported margins were varying by over 10% each time I ran the calibration.

Once I had removed the ripple, the reported margins varied by less than 0.1%.

I suggest that anyone who suspects problems with calibration does the following:

1. Use the EMIF toolkit to run repeated calibrations and monitor the margins.

2. Check the high-frequency and low-frequency noise on the power supply voltages for the system.
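
As a rough way to quantify step 1, the margin for a given signal can be noted from each calibration report and the run-to-run variation computed. The sketch below assumes the values are copied by hand from the toolkit reports (it does not call any toolkit API), defines variation as peak-to-peak spread relative to the mean (an assumption, since the post does not say how the percentage was measured), and uses made-up sample numbers.

```python
from statistics import mean
from typing import Sequence


def margin_spread_percent(margins: Sequence[float]) -> float:
    """Peak-to-peak variation of a margin across calibration runs, as % of its mean."""
    if len(margins) < 2:
        raise ValueError("need margins from at least two calibration runs")
    average = mean(margins)
    if average == 0:
        raise ValueError("mean margin is zero; relative spread is undefined")
    return 100.0 * (max(margins) - min(margins)) / average


# Made-up example values (e.g. a read margin in ps), one per calibration run.
noisy_runs = [400.0, 440.0, 385.0, 430.0]
stable_runs = [412.0, 412.2, 411.9, 412.1]

for label, runs in (("noisy", noisy_runs), ("stable", stable_runs)):
    print(f"{label}: {margin_spread_percent(runs):.2f}% spread over {len(runs)} runs")
```

A spread in the region of 10% between otherwise identical calibrations would be a prompt to look at supply noise, in line with the figures reported above.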

Thanks