uncontrolled SDRAM content change

Altera_Forum · 07-27-2012

Hi,

I use code that checks the external SDRAM by writing 0xAAAA and 0x5555 patterns, but it always returns an error code!

When I step through the code and watch the SDRAM content in the Nios II IDE memory monitor, I can see that the cell at the target address changes correctly, but many other cells around that address change at the same time. Those other cells take an arbitrary value, but they all change to the same value.

I don't understand why, because I only run a simple loop with IOWR and an offset incremented by 1 (one 32-bit word per step).

- All the code resides in on-chip memory (.text, .data, heap and stack), so nothing should access the SDRAM

- The reset and exception vectors point to on-chip memory

- The PLL is correctly configured for the SDRAM

- There are no compilation errors or warnings

I'm sure the hardware is OK, because with the "normal" firmware and .sof (where the code runs from SDRAM) everything works.

I use an EP3C40F484I7 Cyclone III FPGA with a Micron MT48LC4M32B2 SDRAM, and Quartus and Nios II 9.0 SP2.

Altera_Forum · 07-30-2012

The random values could come from data cache flushes. IOWR bypasses the cache, so if for some reason those addresses were cached, the old values would be written back to memory when the cache is flushed.

Did you try using pointers instead of IOWR? That way you will access the memory through the data cache.

Altera_Forum · 07-30-2012

If this is actually a cache problem, you should be able to get around it by using uncached addresses: sdram_address + 0x80000000

Altera_Forum · 07-30-2012

I don't think it is a cache issue, because I only access the SDRAM with the IOWR and IORD macros. The code is very simple and never accesses the SDRAM any other way.

And even when there is no SDRAM access, the contents of many memory cells change at every step (stepping with F5 in the debugger), which makes no sense to me!

I will try using pointers.

Altera_Forum · 07-30-2012

I tried reading and writing through pointers, but it changes nothing:

During the sequential write loop, each cell is written correctly, but a few steps later the same cell is overwritten (why???) with a random value!

Here is the original simple code:

for (offset = 0; offset < nWords; offset++) {
  IOWR(baseAddress, offset, pattern1);
}

Altera_Forum · 07-30-2012

Are you sure the parameters of the SDRAM controller match your MT48LC4M32 device? I mean CAS latency, refresh period, tRCD, tRAS and so on.

Is your Quartus project fully timing-constrained, and does it meet timing?

What's the Nios/SDRAM clock frequency?

Altera_Forum · 07-30-2012

Don't just add 0x8000_0000 to a pointer to bypass the cache; use the cache remapping macros, since flushes are involved as well. They are documented in the Nios II Software Developer's Handbook. You might not see a change in behavior, since the flush is only necessary in some corner cases, but you should get into the habit of using the APIs rather than flipping bit 31 of your pointers.

Altera_Forum · 07-31-2012

Yes, I'm sure the SDRAM settings are correct, the Quartus timing constraints are in place, and there are no errors or warnings after compilation.

The external clock is 50 MHz and the internal clocks (PLL outputs) are 80 MHz.

But I now remember that I use the Nios II/e CPU, so it can't be a cache issue: that core has no cache at all!

I'm still investigating.

If anybody has another idea, you're welcome to share it.

Altera_Forum · 07-31-2012

I don't know how the memory monitor works, but it might access memory through the data cache and therefore trigger cache flushes and loads. Maybe someone knows how it is implemented.

Anyway, you can always set the cache size to 0 in the Nios parameters to disable the cache and see whether the problem persists.

The only two other explanations I can find are a problem in the SDRAM access parameters (especially the refresh rate, although in my experience it takes several seconds to lose content when the DRAM refresh is misconfigured) or that something else, such as a DMA or a software interrupt routine, is overwriting your memory buffer. It could be a good idea to put some SignalTap probes around the SDRAM controller to see whether anything else is happening.

Altera_Forum · 07-31-2012

Have you tried the memtest or mini_memtest software examples that come with the ACDS? They are written to look for specific problems such as stuck, shorted or open data or address bits. Being able to run code out of a memory doesn't necessarily mean the memory is working; all kinds of corner cases can go unnoticed, so I would test that memory with code *and* with a DMA to make sure it is behaving properly.

Altera_Forum · 08-03-2012

With SignalTap I can see that the SDRAM controller is running. I also attached scope probes, and the data, address and control signal voltages toggle on every SDRAM access.

My code is already based on Altera's memtest. I changed it so that the test runs to the end of the SDRAM even when a step fails.

Here is one test:

for (pattern1 = 1, offset = 0; offset < nWords; pattern1++, offset++)
  {
    IOWR(baseAddress, offset, pattern1);
  }
for (pattern1 = 1, offset = 0; offset < nWords; pattern1++, offset++)
  {
    if (IORD(baseAddress, offset) != pattern1) {
      retCode = baseAddress + (offset << 2);  /* byte address: IORD offsets count 32-bit words */
      errCounter++;
    }
  }

errCounter tells me that every location fails! I can't believe the entire SDRAM is dead!

Altera_Forum · 08-03-2012

This may sound stupid, but are you sure you are using the correct base address? Be careful if you have a bridge that shifts the address range.

With SignalTap you can also check the signals on the Avalon side of the controller and see exactly what is happening, and whether it corresponds to what your software does or whether there are other accesses.

Altera_Forum · 08-05-2012

I recommend capturing the data in SignalTap at the slave port of the SDRAM controller. Put a trigger on the base address of the test being written, and set a breakpoint between the two loops in your last post. When you run the code, you should see whatever value 'pattern1' has being written to the slave port. With the CPU stopped at the breakpoint, set SignalTap to trigger on the base address of the test being read; once that's ready, hit the advance button in the debugger, which should let the second loop run to completion. Verify that whatever data 'pattern1' held was read out correctly, since this can help you figure out whether you are looking at a hardware or software issue.

Note: To trigger on both of those, I would set up the trigger for the address to be the base address of your test (remember this is a word offset if you tap it at the SDRAM slave port) and ((rising edge write) or (rising edge read)). Capturing on either rising edge should ensure the same trigger works for both loops (writes and reads).

Simulation couldn't hurt either; it's just that simulation will not find hardware issues such as improperly constrained or miswired pins.

Altera_Forum · 08-08-2012

OK, I did a check with SignalTap, but a little differently from what you suggested.

Here is the test code I used:


#include "system.h"     /* SDRAM_BASE, SDRAM_SPAN */
#include <io.h>         /* IORD, IOWR */
#include "alt_types.h"  /* alt_u32 */

void test (void)
{
  alt_u32 offset;
  alt_u32 nWords = SDRAM_SPAN>>2;   /* number of 32-bit words */
  alt_u32 pattern1 = 0xAAAAAAAA;
  alt_u32 pattern2 = 0x55555555;
  alt_u32 errCounter = 0;
  alt_u32 firstErrorAddress = 0;
  alt_u32 lastErrorAddress = 0;
  alt_u32 lastError = 0;

  // Test 1
  // write 0xAAAAAAAA pattern to all SDRAM cells
  for (offset = 0; offset < nWords; offset++) {
    IOWR(SDRAM_BASE, offset, pattern1);
  }
  // read back all SDRAM cells, counting cells that read 0x55555555
  for (offset = 0; offset < nWords; offset++) {
    if (IORD(SDRAM_BASE, offset) == pattern2) {
      errCounter++;
    }
  }

  errCounter = 0;
  // Test 2
  // write 0x55555555 pattern to all SDRAM cells
  for (offset = 0; offset < nWords; offset++) {
    IOWR(SDRAM_BASE, offset, pattern2);
  }
  // read back all SDRAM cells, counting cells that read 0xAAAAAAAA
  for (offset = 0; offset < nWords; offset++) {
    if (IORD(SDRAM_BASE, offset) == pattern1) {
      lastError = offset;
      if (errCounter == 0) {
        firstErrorAddress = offset;   // word offset of the first error
      }
      errCounter++;
    }
    else if ((offset - lastError) == 1) {
      lastErrorAddress = offset - 1;  // word offset where the latest error run ended
    }
  }
}

At debug startup I check the SDRAM content in the memory monitor to see what values are present, and I see pretty much anything (which is normal, because no data has been written yet).

step 1:

I placed a breakpoint at the end of the first loop (writing 0xAAAAAAAA into the entire SDRAM) and set the SignalTap trigger on the SDRAM data bus for the 0x55555555 pattern. I ran the code to the breakpoint and nothing triggered in SignalTap (which is expected, but now I can be sure that the 0x55555555 pattern has never been sent to the SDRAM).

step 2:

I placed a breakpoint at the end of the second loop (reading 0xAAAAAAAA from the entire SDRAM) and kept the SignalTap trigger on the SDRAM data bus for the 0x55555555 pattern. I ran the code to the breakpoint and nothing triggered in SignalTap (still expected, but now I can be sure that no 0x55555555 pattern is present in the SDRAM).

step 3:

I placed a breakpoint at the end of the third loop (writing 0x55555555 into the entire SDRAM) and set the SignalTap trigger on the SDRAM data bus for the 0xAAAAAAAA pattern. I ran the code to the breakpoint and nothing triggered in SignalTap (which is expected, but now I can be sure that the 0xAAAAAAAA pattern has never been sent to the SDRAM).

step 4:

I placed a breakpoint at the end of the fourth loop (reading 0x55555555 from the entire SDRAM) and kept the SignalTap trigger on the SDRAM data bus for the 0xAAAAAAAA pattern. I ran the code to the breakpoint, and this time SignalTap triggered many times (1,256,917)! So the contents of many cells (or the address lines) seem corrupted or defective! That is about 30% of the entire SDRAM, between offsets 0x0000000E and 0x003FFFF4 (as 32-bit word offsets), so the errors are spread over almost the whole range rather than concentrated in one region.
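The "about 30%" figure checks out if SDRAM_SPAN covers the whole MT48LC4M32B2 (4M words of 32 bits, i.e. 16 MiB; an assumption about the system configuration) and each trigger is counted as one failing word, as the post does. The word count also matches the highest failing offset, 0x3FFFF4, lying just below 0x400000:

```latex
$$
n_{\text{words}} = \frac{\texttt{SDRAM\_SPAN}}{4\ \text{B}} = \frac{16 \cdot 2^{20}\ \text{B}}{4\ \text{B}} = 2^{22} = 4\,194\,304 = \texttt{0x400000},
\qquad
\frac{1\,256\,917}{4\,194\,304} \approx 0.2997 \approx 30\,\%
$$
```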

I think it is a hardware issue, but it is a little strange, because I have spent many hours working with this hardware, with the main firmware running from SDRAM, without any errors, crashes or strange behavior!

Do you have an idea for another test I could run?

Altera_Forum · 08-08-2012

Test #1 looks wrong: if you write pattern1 to the memory, you should be validating that pattern1 is read back correctly, not checking for pattern2 (I wouldn't expect a memory problem to magically turn pattern1 into pattern2). I was suggesting triggering on the writes to memory and not on the data, because it's the data you are seeing a problem with, so you should first determine whether the writes and reads are even reaching the memory.

Given your triggers, I would assume the memory is really corrupted early in the test. If you trigger on the conditions I suggested, you'll at least be able to verify whether the corruption occurs between the processor and the SDRAM controller, or whether the problem is in the SDRAM controller/interface/device. I suspect it'll be the latter, which could point to a wiring or timing problem. I also recommend performing the standard tests for things like stuck address/data lines; the memtest example software includes such tests. For example, to look for stuck data lines you would write walking ones/zeros to a fixed address to ensure that only a single 1/0 walks across the word.

Altera_Forum · 03-24-2015

Was this issue resolved? I am having a similar problem, but with a different SDRAM and an MCU.