
Nios behaving erratically?

Altera_Forum
Honored Contributor II
1,704 Views

I was wondering if anyone has encountered this strange problem. I have a working code build for my product; however, when I add some simple code, it breaks other parts of the code that are completely unrelated. The code I'm adding is NOT the problem, because I can add some dummy printf() calls and the same thing happens. To resolve it, I add dummy lines of code (reassigning a variable, for example), and after a few reassignments the problem goes away. 

 

This is very weird, and extremely hard to debug... it sounds like shifting around the code in certain memory locations breaks it. I am using SRAM to run code, so it is relatively low speed. Any ideas?
10 Replies
Altera_Forum
Honored Contributor II
953 Views

Yes, I am working on a project that has the problem you describe. I have some "magic" printfs which make some code "work". Luckily, this code is part of a feasibility test and is not required for the product, so I have abandoned that part of the project. However, this problem is extremely annoying and frustrating. My project is running completely out of on-chip RAM. 

 

I have had similar problems when making minor changes in the VHDL code and re-fitting, or when updating my QII software. The second problem suggests a timing problem within the part. My project uses on-chip RAM for shallow LPM FIFOs to pass data across clock boundaries. 

 

Perhaps if we can find some commonality in our problems, we can get Altera to look at them. 

 

My project: 

Cyclone I : EP1C20F400C7 

Total LE: 11,468/20,060 (57%) 

Total pins: 228/301 (76%) 

Total Memory Bits: 209,920/294,912 (71%) 

 

40MHz system clock 

 

If anyone finds a cause and/or solution, please post it to the forum. 

 

Thanks. 

Steve
Altera_Forum
Honored Contributor II
953 Views

That's also what I found; when I recompile the FPGA project making relatively minor and unrelated changes, running the exact same .elf file from memory works with the old image, but not the new one. 

 

It's ridiculous how each time I have to insert 'dummy' variable assignments just to get the code working. I hate using hacks like this... but I have no clue how to even begin troubleshooting this problem.
Altera_Forum
Honored Contributor II
953 Views

Here are some things to check: 

 

1) Do you have a data cache? If so, are you bypassing it when you talk to peripherals or share data with other processors in your system? 

 

2) Does Nios II talk to your FIFOs? If you communicate with them through a pointer, do you declare the pointer volatile so that the compiler doesn't optimize the accesses away? (If the compiler sees you reading from the same location over and over, it sometimes optimizes the other reads away, and vice versa for writes.) 

 

3) Are you low on memory? Could be a heap + stack collision (if adding significant amounts of code fixes the problem then this is unlikely). These are hard to debug but you can try minimizing the code or filling a few locations with known data and seeing if it's getting clobbered by connecting to the system with the debugger. 

 

4) Are you recompiling hardware? If so are you absolutely sure you have all your pins assigned? If not then Quartus II will move pins around to improve timing but this could cause functional problems on your board. 

 

5) If you haven't already done so, have you tried building with -O0 (software optimization turned off)? 

 

6) If you use DCFIFO to go across clock domains, have you set up the synchronized clock setting properly? (If your answer is "huh?", go with unsynchronized clocks so that an extra register is inserted.) 

 

7) Try stepping through code. Wherever in the code the CPU goes off into the weeds will help isolate the problem in most cases. 

 

I put these in the order of the most likely culprits. This is not a complete list but it's a good starting place. Also I really doubt everyone is running into the same problem because there are many things that can cause issues like you have described.
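
Item 3 above suggests filling a few locations with known data and checking later whether they were clobbered. A minimal sketch of that idea follows, written as portable C so it can be tried on a host first. The pattern byte and both function names are assumptions made for illustration; they are not part of the Nios II HAL, and on a real target you would point the functions at the unused part of your stack or heap region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAINT_BYTE 0xA5u  /* arbitrary fill pattern (an assumption) */

/* Fill a memory region with the known pattern before the code under test runs. */
void paint_region(uint8_t *region, size_t len)
{
    assert(region != NULL);
    memset(region, PAINT_BYTE, len);
}

/*
 * Count how many bytes, starting from the low end of the region, still hold
 * the pattern. On a stack that grows toward lower addresses, the untouched
 * bytes sit at the low end, so this is a rough estimate of remaining headroom.
 */
size_t untouched_bytes(const uint8_t *region, size_t len)
{
    size_t n = 0;

    assert(region != NULL);
    while (n < len && region[n] == PAINT_BYTE) {
        n++;
    }
    return n;
}
```

If the untouched count drops to zero after running the failing build, a stack/heap collision becomes a much stronger suspect. Note the estimate can read slightly high if live data happens to contain the pattern byte.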
Altera_Forum
Honored Contributor II
953 Views

BadOmen, 

Thanks for your input. To answer your questions: 

>1) Do you have a data cache? 

no. I use the small footprint CPU which has no cache. In the past, I have tried changing to the other two cached processors and have had "working" code stop working. By "working" I mean with previously mentioned superfluous printfs and variable assignments. No amount of printf's or variable assignment shell-games could get them working to the level of the non-cache version. 

 

>2) Does Nios II talk to your FIFOs?  

I used the IP on the projects section of the forum. Nice interface. I have hardware/state machines and the NiosII talking to various FIFOs. 

 

Optimization is for size or speed. I also use the "small C library" and "reduced device drivers" options in the system library. I cannot get the code to fit without these settings. 

 

>If you communicate to it with a pointer do you declare the pointer volatile so that the compiler doesn't optimize it away (if the compiler sees you reading from the same location over and over it sometimes optimizes the other reads away and vice versa for writes). 

 

No pointers. Instead, I use simple loops with integer indexes to fill arrays. IIRC, the manuals said that "volatile" only affects cached processors, which mine is not. 

 

>3) Are you low on memory? 

Somewhat. 2400 bytes free for stack + heap. I'll try turning on the "run time stack checking" syslib option to check for a collision. Hmmm, I just rebuilt the project and with stack checking enabled I now show 6660 bytes free. Odd. I have tried optimizing for size and for various speeds, each with its own unique "behavior". 

 

>4) Are you recompiling hardware? If so are you absolutely sure you have all your  

>pins assigned?  

Yes and yes. I compared the .pin files of past builds and they are identical, i.e., no signal names moving between pins. 

 

>5) If you haven't already done so have you enabled -O0 software optimization? 

can't. 

 

>6) If you use DCFIFO to go across clock domains have you setup the synchronized 

>clock setting properly?  

Yes. That's why I'm using the FIFOs. I've double-checked, and the LPMs are definitely set for non-synchronized clocks. I've also tried using LEs instead of the RAM implementation to improve memory usage and speed. 

 

>7) Try stepping through code. Wherever in the code the CPU goes off into the  

>weeds will help isolate the problem in most cases. 

 

When I revisit this software problem, I will set breakpoints and single step around the problematic code. Good idea! thanks. 

 

>I put these in the order of the most likely culprits. This is not a complete list but it's 

>a good starting place. Also I really doubt everyone is running into the same  

>problem because there are many things that can cause issues like you have  

>described. 

 

I agree that not everyone is seeing this problem. I've taken many Altera Quartus and Nios classes and have never seen this problem in the canned examples, I haven't talked to anyone having these issues, and I have seen many people on this forum using Nios on real-world projects. However, it seems at least two of us are seeing it. Perhaps we're making the same mistakes. Fine. I'm just trying to find them. I REALLY appreciate your taking the time to think about the problem and make suggestions, all of which are good and based on sound engineering practice. 

 

On a final note, I believe I'm chasing two separate problems. One is affected by the printf's, etc. and is probably NiosII/memory related. The second problem is a state machine/FIFO communication problem and is probably timing related. On a whim, yesterday I recompiled the hardware turning off the Tsu, Tco, and Tpd constraints, i.e., only clocks set in timing analyzer. The reported delays are marginally slower than when constrained; however, now the misbehaving FIFO/state machine is fine. Need to revisit the memory problem. 

 

thanks. 

steve
Altera_Forum
Honored Contributor II
953 Views

 


 

 

Definitely some good suggestions. I don't have clock-domain crossing, so that shouldn't be an issue for me. Also, I have plenty of SRAM available, so no chance of heap/stack collision. I am using alt_uncached_mallocs(), and I'm not sure how proven they are. I think the best suggestion for my case is turning off optimization; I haven't tried this yet, but I recall from past experiences with other processors that optimization always introduces headaches! 

 

thanks again.
Altera_Forum
Honored Contributor II
953 Views

"No pointers. Instead, I use simple loops with integer indexes to fill arrays. IIRC, the manuals said that 'volatile' only affects cached processors, which mine is not." 

 

Actually that's from Nios classic. Back then volatile meant no optimization and cache bypassing. With Nios II, volatile simply means the compiler won't optimize the accesses away (that's the behavior defined by the C standard); it no longer implies cache bypassing. To talk to your FIFOs do something like this: 

#include <stdio.h>   // printf
#include "io.h"      // for IORD and IOWR
#include "system.h"  // so that you have the FIFO base addresses

int main(void)
{
    int i;

    for (i = 0; i < 1024; i++)
    {
        IOWR(MY_WRITE_FIFO_BASE_ADDRESS, 0, i); // stuff 0->1023 into the FIFO
        printf("Data back is %d\n", IORD(MY_READ_FIFO_BASE_ADDRESS, 0)); // read back what you stuffed into the FIFO
    }

    return 1;
}

 

Assuming I didn't miss anything and that compiles :) that should give you cache-bypassing reads and writes to a FIFO component that has its write and read ports mapped to the system interconnect fabric.
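
To illustrate the volatile point in isolation, here is a minimal sketch of a polling loop that reads through volatile pointers, written as plain C so it can be compiled on a host. The status bit value and the function name are assumptions for this example only and do not come from any real register map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bit, chosen only for this sketch. */
#define STATUS_NOT_EMPTY 0x02u

/*
 * Copy words out of a FIFO-style data register while the status register
 * reports "not empty", up to max_words. Both registers are read through
 * volatile pointers, so the compiler must perform every read in the loop
 * instead of reusing a value it loaded earlier.
 */
size_t drain_fifo(volatile const uint32_t *status,
                  volatile const uint32_t *data,
                  uint32_t *out, size_t max_words)
{
    size_t n = 0;

    assert(status != NULL && data != NULL && out != NULL);
    while (n < max_words && (*status & STATUS_NOT_EMPTY) != 0u) {
        out[n++] = *data;
    }
    return n;
}
```

Keep in mind that volatile only constrains the compiler. On a Nios II core with a data cache it does not bypass the cache, which is why the IORD and IOWR macros shown above remain the safer way to reach peripherals.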
Altera_Forum
Honored Contributor II
953 Views

Thanks BadOmen. 

 

You are correct about the volatile functionality. My recollection was wrong as I hadn't looked at that topic in over a year. I guess my memory is going :D 

As I said, the closest I've gotten to "working" software has been with the non-cached processor. I'm presently trying to cut out code to test a non-optimized version of code. 

 

Your loop looks very similar to what I'm doing. My proof of concept involved a Cypress-based USB interface for our product. I got as far as doing a simple loopback test and stalled out there due to my mysterious printf/variable problem. Below is the simple section of code. It reads until the USB chip's status shows empty. IIRC (it has been a long time since I worked on this problem), removing the no_activity assignment, or incrementing through the data array rather than using only data[0], would break the loopback. Furthermore, I believe that if I separated the loop into two loops (read and write), using the variable j to keep track of the number of points read and making that the terminal count for the writeback, the code would no longer work unless I put a printf statement in there somewhere. I could never get the two loops running. My hope was to eventually get a DMA transfer running so I could test throughput. 

 

Sorry for my vague recollections. Like I said, I abandoned the USB interface feature to finish other system features upon which I encountered the FIFO/state machine problem. 

 

Here's my loop: 

 

void USB_data_loopback(void)
{
    volatile alt_u8 i, j;
    alt_u16 data[128];
    alt_u16 len;

    no_activity = 1; // clear flag
    if (got_out_data) // The FLAGS int tells us we have out data
    {
        got_out_data = 0;
        if (IORD_ALTERA_AVALON_PIO_DATA(USB_STATUS_PIO_BASE) & 0x02)
        {
            j = 0;
            while (IORD_ALTERA_AVALON_PIO_DATA(USB_STATUS_PIO_BASE) & 0x02)
            {
                data[0] = IORD(USB_SX2_BASE, 0x00);
                // j++;
                IOWR(USB_SX2_BASE, 0x02, (alt_u16)data[0]);
            }
        }
    }
}

 

 

I am resuming my quest for a functional USB interface today. While it is not essential for our product, I need to understand this behavior as it may reappear in the essential code. I have learned a lot about the Nios tools over the past year and plan to use them, including the debugger and breakpoints. Hopefully I will find more clues. 

 

thanks. 

steve
Altera_Forum
Honored Contributor II
953 Views

It's not plain as day to me what could be the problem except for that "got_out_data". I don't see anything but an assignment of 0 to it. 

 

Also this line: 

 

data[0] = IORD(USB_SX2_BASE, 0x00); 

 

Since it's a hard-coded access, you may want to declare "data" as volatile. The compiler may see that and say, "Why should I keep stuffing data into data[0]? It should have it from the first time." 

 

If this is indeed the problem I bet this will work (not pretty but the compiler can't break it): 

 

IOWR(USB_SX2_BASE, 0x02,IORD(USB_SX2_BASE, 0x00));
Altera_Forum
Honored Contributor II
953 Views

Thanks Bad Omen. The code snippet shown actually works. If I change the index for data from data[0] to data[j] and uncomment the j++ line, i.e., filling an array, the code stops working. 
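
One thing that may be worth ruling out in the data[j] variant: in the posted loop, j is an alt_u8, so it can count up to 255, while data holds only 128 entries. If the chip ever has more than 128 words queued, data[j] would write past the end of the array into whatever sits next to it on the stack, which could produce this kind of unrelated breakage. This is only a possibility, not a confirmed diagnosis. The sketch below shows a bounded fill against a simulated FIFO; the fake_fifo type and all function names are hypothetical stand-ins for the IORD accesses, so that the example runs on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_WORDS 128u  /* same capacity as data[128] in the post */

/* Simulated FIFO standing in for the USB chip: purely hypothetical. */
struct fake_fifo {
    size_t   remaining;  /* words still waiting to be read */
    uint16_t next;       /* value the next read will return */
};

int fake_has_data(const struct fake_fifo *f)
{
    return f->remaining > 0;
}

uint16_t fake_read_word(struct fake_fifo *f)
{
    assert(f->remaining > 0);
    f->remaining--;
    return f->next++;
}

/*
 * Read words into buf until the FIFO is empty or buf is full, whichever
 * comes first. The index is a size_t checked against cap on every pass,
 * so it can never run past the end of the array.
 */
size_t fill_bounded(struct fake_fifo *f, uint16_t *buf, size_t cap)
{
    size_t n = 0;

    while (n < cap && fake_has_data(f)) {
        buf[n++] = fake_read_word(f);
    }
    return n;
}
```

The returned count can then serve as the terminal count for a separate write-back loop, which matches the two-loop arrangement described earlier in the thread.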

 

got_out_data is a flag set somewhere else, either in main or in an interrupt. 

 

I don't want to waste your time or anyone else's by beating a dead horse. I appreciate the input from you and at least the confirmation that there's nothing glaringly wrong with the bit of code. Perhaps the problem lies somewhere else... 

 

I've managed to get the debugger/breakpoints running, sw breakpoint (http://forum.niosforum.com/forum/index.php?showtopic=6677), and can hopefully do some detailed debugging rather than changing code and seeing what happens. If I find the cause, I'll gladly post it here. 

 

Thanks. 

steve
Altera_Forum
Honored Contributor II
953 Views

Oops, I misread your other post. I'm glad you have something working. If I manage to find my dumb USB controller and host software I'll post it one of these days. I was communicating with an FTDI chip. My controller doesn't have FIFOs and I just had Nios polling in a loopback mode. I think I was able to send a 1MB document back and forth in around 7 seconds, so with some effort 1-3MB/s should be easy.
