On-Chip RAM Corrupted by Reset

Altera_Forum

Hi,

I have a large system that I have been developing for the last year or so. Until now I have been loading a .sof file onto the FPGA and then using Eclipse to download the Nios code into an on-chip instruction RAM. This has all been working fine.

I have now reached the point where I have embedded the Nios code in the .sof file and converted it to a flash file for my Stratix V DSP Development Board (5SGSMD5K2F40C2 FPGA). The trouble is that I am now having problems with the processor's on-chip instruction RAM.

Essentially, the system consists of two subsystems: a PCIe application layer using the Altera Hard IP for PCI Express with the Avalon-ST (Av-ST) interface, and a user system that contains DDR3 RAM, a Nios processor, two on-chip RAMs (one for data and one for instructions), and a whole lot of other logic. The DDR memory controller supplies the clock for the Nios processor and the user system, and the PCIe core provides the clock for its own subsystem.

When I cold boot the PC, the FPGA configures from flash and the DDR memory initialises. The PCIe core drives the soft reset input of the DDR memory controller (UniPHY in Qsys), so in this scenario the user system comes out of reset once the PCIe core is ready.

This all works fine: the PC boots and the Nios processor starts blinking an LED (a second LED blinks when the DDR controller is out of reset). Everything is good.

If I then restart the computer, the PC resets the PCIe core, which in turn soft-resets the DDR controller and hence the user system. As soon as this happens, the Nios processor stops blinking its LED. I know that everything is out of reset because the LED showing that the system is out of reset resumes blinking.

If I use System Console from Eclipse to download the contents of the instruction RAM (on-chip memory), I can see that the first few bytes, which also happen to be the reset vector, and a few bytes in other places near the end of the RAM have all been corrupted. It's no wonder the processor doesn't boot.
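One way to pin down exactly which words changed, and whether the pattern is the same on every restart, is to diff the System Console dump against the known-good image. The Python sketch below assumes both have been saved as raw binary files of equal length (read with `open(path, "rb").read()`); the function name and the 4-byte word size are illustrative assumptions, not part of any Altera tool.

```python
"""Diff an on-chip RAM dump against the expected image (illustrative sketch)."""
from typing import List, Tuple


def find_corrupted_ranges(expected: bytes, actual: bytes,
                          base: int = 0, word_size: int = 4) -> List[Tuple[int, int]]:
    """Return merged (start, end) address ranges, end exclusive, where the
    dump differs from the expected image, compared one word at a time."""
    if len(expected) != len(actual):
        raise ValueError("expected image and dump must be the same length")
    if len(expected) % word_size:
        raise ValueError("image length must be a multiple of word_size")
    ranges: List[Tuple[int, int]] = []
    for offset in range(0, len(expected), word_size):
        if expected[offset:offset + word_size] != actual[offset:offset + word_size]:
            addr = base + offset
            if ranges and ranges[-1][1] == addr:
                # Extend the previous range when corrupted words are contiguous.
                ranges[-1] = (ranges[-1][0], addr + word_size)
            else:
                ranges.append((addr, addr + word_size))
    return ranges


if __name__ == "__main__":
    golden = bytes(range(64))          # stand-in for the known-good image
    dump = bytearray(golden)
    dump[0:8] = b"\xff" * 8            # clobber the first two words (reset vector)
    dump[60:64] = b"\x00" * 4          # clobber the last word
    for start, end in find_corrupted_ranges(golden, bytes(dump)):
        print(f"0x{start:08x}-0x{end - 1:08x}")
```

Running this on dumps taken after several restarts would show whether the corrupted ranges are always the same words at the bottom and top of the memory or move around.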

 

I saw a knowledge base article on the Altera support website about this type of problem, saying that asynchronous resets can corrupt M20K contents. However, that article relates to Quartus 13.1 and says the issue would be fixed in 13.1 SP1, whereas I am using Quartus 14.0. I also made sure to enable the reset_request option on the on-chip RAM in Qsys.

Is there any way to avoid this corruption? I don't want to make the instruction RAM a ROM, because I still need to be able to reprogram it from Eclipse without recompiling.

Altera_Forum

When you reset a DDR memory controller, it stops refreshing the memory, so the memory contents are lost. If you re-initialize the controller quickly, some memory cells may not have lost their data yet, which would explain the partial corruption you observe. If you want to preserve DDR contents across a soft reset, you need to configure your system so that the hardware does not reset the memory controller on a soft reset and the software does not re-initialize it when doing a soft boot.

Depending on how your board is wired, it may not be possible to prevent the FPGA from being reconfigured when the PC reboots. If that happens, you will also lose the DDR contents.

Altera_Forum

It's not the DDR I care about; it isn't connected to the Nios. I only mentioned it because it supplies the clock to the Nios processor.

It's the on-chip memory from which the Nios is running that is getting corrupted.

Altera_Forum

Is the on-chip RAM shared with any other masters, such as the PCIe core?

If you perform a cold boot, does it work again?

The end of the RAM is usually where the Nios stack is, so you would see what looks like random data there if the instruction and data masters share the same RAM.

However, I noticed that you have separate RAMs for data and instructions. Can you check the linker settings to see where the stack goes? You can do this with the nios2-bsp-editor. The tool tends to prefer the larger memory as its data memory, even when you have a separate data memory.
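The check being suggested can be sketched roughly as follows. The Nios II stack grows downward from the top of the memory region the BSP assigns to it, so its first words land at the very end of that memory. This Python sketch uses a hypothetical memory map (the base addresses and spans are made up; take the real ones from Qsys and the BSP settings) and reports which RAM the top of the stack falls in. Treating the first push as one word below the region end is a simplification.

```python
"""Which memory does the top of the Nios II stack land in? (sketch)"""
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Region:
    name: str
    base: int
    span: int  # size in bytes

    @property
    def end(self) -> int:
        return self.base + self.span  # first address past the region

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.end


def region_of(addr: int, regions: Iterable[Region]) -> Optional[str]:
    """Name of the region containing addr, or None if it is unmapped."""
    for region in regions:
        if region.contains(addr):
            return region.name
    return None


def first_stack_word(stack_region: Region, word_size: int = 4) -> int:
    """The stack grows downward, so (roughly) the first word pushed sits
    just below the end of the region that holds the stack."""
    return stack_region.end - word_size


# Hypothetical memory map; replace with the addresses from your Qsys system.
INSTR_RAM = Region("instruction_ram", 0x0000_0000, 0x0002_0000)
DATA_RAM = Region("data_ram", 0x0002_0000, 0x0001_0000)
MEMORY_MAP = [INSTR_RAM, DATA_RAM]

if __name__ == "__main__":
    for stack_home in MEMORY_MAP:
        top = first_stack_word(stack_home)
        print(f"stack in {stack_home.name}: first push at 0x{top:08x} "
              f"({region_of(top, MEMORY_MAP)})")
```

If the stack turned out to live in the instruction RAM, corruption near the end of that RAM would be ordinary stack traffic rather than a hardware fault.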

 

RAM corruption of this kind has a very low probability (unless you are especially unlucky). Is the issue reproducible every time the computer restarts?

Altera_Forum

The RAM was definitely getting corrupted: I did a dump and the contents had changed. The issue was completely repeatable. Every restart of the PC corrupted the RAM, and only reconfiguring the FPGA fixed it.

The linker script is set up correctly. The data memory is used for the stack and variables, and the instruction memory for the code. Each RAM is connected only to the Nios (both are single-port RAMs not shared with anything else).

---

I've since moved to booting from flash with the Nios boot copier, and the problem has gone away. Every time the Nios processor is reset, it copies the instruction memory contents from flash into the on-chip RAM and then boots.
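The reason this sidesteps the problem is that the instruction RAM no longer has to survive a reset: its contents are rewritten from flash on every boot. A minimal Python model of that idea follows. It is not the Altera boot copier, which uses its own image format in flash; it assumes a plain image plus a stored CRC-32, the function name is hypothetical, and the CRC check is an addition for illustration.

```python
"""Model of a boot copier: refresh on-chip RAM from flash on every reset."""
import zlib


def boot_copy(flash_image: bytes, expected_crc: int,
              ram: bytearray, dest: int = 0) -> bool:
    """Copy flash_image into ram at offset dest and verify it.

    Returns True if the RAM now holds an intact image (safe to jump to),
    False if the flash image itself fails its CRC check.
    """
    if dest < 0 or dest + len(flash_image) > len(ram):
        raise ValueError("image does not fit in the target RAM")
    if zlib.crc32(flash_image) != expected_crc:
        return False  # corrupt flash image: do not boot from it
    # Overwrite whatever is in RAM, including any words corrupted by a reset.
    ram[dest:dest + len(flash_image)] = flash_image
    return zlib.crc32(bytes(ram[dest:dest + len(flash_image)])) == expected_crc


if __name__ == "__main__":
    image = bytes(range(32))                    # stand-in for the application code
    crc = zlib.crc32(image)
    ram = bytearray(b"\xde\xad\xbe\xef" * 16)   # RAM with stale or corrupted contents
    print("boot OK" if boot_copy(image, crc, ram) else "boot aborted")
```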
Altera_Forum

Just to re-confirm: the RAM corruption changes the reset vector contents?

Are you using the Nios II/f core? If so, maybe you can enable the ECC feature.

Does the PCIe core send interrupts to the Nios?

You stated that the DDR controller supplies the clock (afi_clk) to the Nios and the user system (including the RAM). Do you also use the afi_reset it provides? That would ensure afi_clk is stable and locked before the Nios and the user system start running. From your first post, the soft reset from the PCIe core is connected to the DDR controller and the user system, and the user system is up and running once the PCIe core is ready, but it is the DDR controller that supplies the user system's clock. Is that right?

One thing you can do is remove the DDR3 controller, leaving just the PCIe core and the Nios/RAM, and repeat the PC restart sequence. That would rule out anything related to the DDR3. The Nios/RAM could be clocked from an on-board crystal oscillator instead.

I also assume that your design is meeting timing.

I have heard of RAM corruption with asynchronous resets, but the probability of it occurring is around 1 in 500 resets. You are getting it every time you restart, which is a 100% hit rate (you must have been either very lucky or very unlucky).

Altera_Forum

Can you add a SignalTap instance with a power-up trigger that triggers on the write signal of your RAM? This would tell you whether the corruption is a problem in the RAM module itself or whether the CPU is doing something odd at some point and overwriting the RAM. The advantage of power-up triggers is that the sampled data is held in SignalTap's buffer until you read it, so you can restart the PC and use SignalTap afterwards to recover the data.
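Once the capture has been read back and exported (for example as CSV), listing every sample where the write strobe was high is a quick way to tell the two cases apart. Since the instruction RAM should never be written during normal operation, any hit is suspicious. A sketch, assuming one row per sample and columns named `sample`, `write` and `address` (hypothetical names; the real headers depend on the nodes tapped and on the export), with the address in hexadecimal.

```python
"""List RAM writes in an exported SignalTap capture (sketch)."""
import csv
import io
from typing import Iterable, List, Mapping, Tuple


def find_writes(rows: Iterable[Mapping[str, str]], write_col: str = "write",
                addr_col: str = "address",
                sample_col: str = "sample") -> List[Tuple[int, int]]:
    """Return (sample, address) for every sample with the write strobe high.
    Column names are hypothetical; match them to your exported file."""
    hits = []
    for row in rows:
        if int(row[write_col]) == 1:
            hits.append((int(row[sample_col]), int(row[addr_col], 16)))
    return hits


if __name__ == "__main__":
    # Tiny inline capture standing in for the exported file.
    capture = io.StringIO(
        "sample,write,address\n"
        "0,0,0000\n"
        "1,1,0000\n"
        "2,0,0004\n"
        "3,1,7ffc\n"
    )
    for sample, addr in find_writes(csv.DictReader(capture)):
        print(f"write at sample {sample}: address 0x{addr:04x}")
```

An empty result alongside a corrupted dump would suggest the contents changed without any write on the RAM's port, that is, inside the memory block itself, at least within the window the trigger captured.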

Altera_Forum

The reset vector contents did change: roughly the first 10 addresses at the bottom of the memory, and a similar number at the top (the exact number of lines varies). It's also not unique to one board; we have a system of 8 cards, and they all exhibited similar behaviour.

I am using the fast Nios core (without ECC, though). No interrupts are sent from the PCIe core to the Nios; they communicate via a letterbox scheme using a completely separate dual-port memory.
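A letterbox scheme like this passes messages through shared memory using an ownership flag rather than interrupts. The Python model below uses a `bytearray` to stand in for the dual-port RAM; the three-word layout, flag values and function names are hypothetical, not the poster's actual design.

```python
"""Polled letterbox over a shared dual-port RAM (illustrative model)."""
import struct
from typing import Callable, Optional

# Hypothetical layout: three 32-bit little-endian words.
FLAG, CMD, RESP = 0, 4, 8
EMPTY, CMD_PENDING, RESP_READY = 0, 1, 2


def rd(mem: bytearray, off: int) -> int:
    return struct.unpack_from("<I", mem, off)[0]


def wr(mem: bytearray, off: int, val: int) -> None:
    struct.pack_into("<I", mem, off, val)


def host_post(mem: bytearray, cmd: int) -> bool:
    """Host side: post a command if the letterbox is free."""
    if rd(mem, FLAG) != EMPTY:
        return False              # previous transaction still in flight
    wr(mem, CMD, cmd)             # write the payload first...
    wr(mem, FLAG, CMD_PENDING)    # ...then hand ownership to the Nios
    return True


def nios_poll(mem: bytearray, handler: Callable[[int], int]) -> bool:
    """Nios side: service a pending command, if any."""
    if rd(mem, FLAG) != CMD_PENDING:
        return False
    wr(mem, RESP, handler(rd(mem, CMD)))
    wr(mem, FLAG, RESP_READY)
    return True


def host_collect(mem: bytearray) -> Optional[int]:
    """Host side: read the response and free the letterbox."""
    if rd(mem, FLAG) != RESP_READY:
        return None
    resp = rd(mem, RESP)
    wr(mem, FLAG, EMPTY)
    return resp


if __name__ == "__main__":
    shared = bytearray(12)
    host_post(shared, 21)
    nios_poll(shared, lambda c: c * 2)
    print(host_collect(shared))
```

On real hardware the ordering matters: the payload must be visible before the flag, which on a Nios II/f with a data cache means accessing the shared RAM uncached (for example with the `IORD`/`IOWR` macros from `io.h`).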

 

The DDR controller supplies both the clock and the reset to the user system, and the PCIe core is in a separate clock domain. The user system reset (including the Nios) is therefore driven only by the afi_reset output. The DDR controller's reset input (soft_reset) is driven by the PCIe core; this way the PLL is stable (not reset by global_reset) by the time the PCIe core has been enumerated and Windows has loaded. The soft_reset isn't released until the driver writes to a register in the PCIe BAR.
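The reset chain described above can be modelled as a small tree and walked to see which blocks a PC restart ultimately holds in reset. The node names and edges below are a simplified, hypothetical reading of this post (in particular, it assumes the instruction and data RAMs take their reset from afi_reset). What it illustrates is that a PCIe reset propagates all the way to the on-chip RAMs' reset inputs, which is the path to examine in light of the knowledge base article on asynchronous resets and M20K contents, while the PLL is not touched.

```python
"""Walk a simplified reset tree to find what a given reset source holds in reset."""
from collections import deque
from typing import Dict, List, Set

# Hypothetical, simplified model: each key holds the listed blocks in reset
# while it is asserted. global_reset (and hence the PLL) is deliberately
# absent because the PCIe core drives soft_reset, not global_reset.
RESET_TREE: Dict[str, List[str]] = {
    "pcie_reset": ["soft_reset"],   # a PC restart resets the PCIe core
    "soft_reset": ["afi_reset"],    # in this model, afi_reset follows soft_reset
    "afi_reset": ["nios", "instruction_ram", "data_ram", "user_logic"],
}


def held_in_reset(source: str, tree: Dict[str, List[str]] = RESET_TREE) -> Set[str]:
    """Breadth-first walk from source, collecting every downstream block."""
    seen: Set[str] = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in tree.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    print(sorted(held_in_reset("pcie_reset")))
```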

 

The design fully met timing: the worst-case Fmax was about 240 MHz, and the user system clock runs at only 200 MHz. No violations are reported at any corner.
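For a sense of the margin, an Fmax figure can be converted into approximate setup slack by comparing clock periods. The sketch below does that arithmetic. It is a simplification: the Fmax summary reflects setup analysis only and says nothing about hold or the recovery/removal checks that TimeQuest reports separately for asynchronous reset paths.

```python
"""Convert an Fmax figure into approximate setup slack (arithmetic sketch)."""


def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz


def slack_from_fmax_ns(fmax_mhz: float, clock_mhz: float) -> float:
    """Required clock period minus the shortest period the worst path meets."""
    return period_ns(clock_mhz) - period_ns(fmax_mhz)


if __name__ == "__main__":
    # Figures from the post: worst-case Fmax ~240 MHz, clock 200 MHz.
    print(f"{slack_from_fmax_ns(240.0, 200.0):.3f} ns")  # prints "0.833 ns"
```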

 

 

At this point, though, the firmware has grown quite a bit over the last 6 months, and the boot copier from flash is taking care of the problem: any corrupted contents are simply overwritten when the Nios processor boots, which seems a more sensible approach. Even if corruption happened only once in 500 resets, that's still quite a high chance of something going wrong.
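To put a number on "quite a high chance": if each reset independently corrupts the RAM with probability p, the chance of at least one corruption in n resets is 1 - (1 - p)^n. The sketch evaluates that for the 1-in-500 figure quoted earlier in the thread; the independence assumption and the example of 8 cards restarted once a day for 30 days are illustrative, not measurements.

```python
"""Chance of at least one corrupting reset over many resets."""


def p_at_least_one(p_single: float, n_resets: int) -> float:
    """Probability that at least one of n independent resets corrupts the RAM."""
    return 1.0 - (1.0 - p_single) ** n_resets


if __name__ == "__main__":
    p = 1 / 500                 # per-reset figure quoted in the thread
    n = 8 * 30                  # hypothetical: 8 cards, one restart a day, 30 days
    print(f"{n} resets: {p_at_least_one(p, n):.1%}")
    print(f"500 resets: {p_at_least_one(p, 500):.1%}")
```

With these inputs the result is roughly 38% for 240 resets and 63% for 500, which supports refreshing the RAM from flash on every boot rather than relying on it surviving resets.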

 

I may have a look with SignalTap later, but at the moment I am focused on other areas of the project.

Altera_Forum

Noted, TCWORLD.

Good luck with the debugging. SignalTap should give you more visibility into what is happening during the PCIe soft reset.