Different results on hardware compared to simulator

Altera_Forum · ‎08-01-2012

I'm having trouble when I run my code on the actual Cyclone III FPGA. In the simulator it works fine. I have set up state machines to make sure that clock signals to registers occur only after plenty of data setup time has passed. I'm running the chip with just a 14 MHz clock, so it's not very fast.

In ModelSim everything works perfectly. My code is modular so I am building up the final code one module at a time. I can put in several modules, test, and everything works as expected. But at some point when I add in another module everything starts to go wrong. Modules that worked earlier no longer work.

I added some ports on the modules to bring the data out to some external LEDs so I can see what the signals are doing. Often that causes the code to just start working properly again. The results are very inconsistent as I make changes to the code.

What am I missing here that makes the code work incorrectly on the actual hardware?

Altera_Forum · ‎08-01-2012

--- Quote Start ---

What am I missing here that makes the code work incorrectly on the actual hardware?

--- Quote End ---

Missing synchronization logic between clock domains? Missing reset synchronizer logic? Missing timing constraints?

Does Quartus warn you about anything? Make sure all synthesis warning messages are resolved.

Does TimeQuest indicate all timing is met?

Rather than using LEDs to probe your design, use SignalTap II - this logic analyzer allows you to see much more of the internals of the design.

Cheers,

Dave

Altera_Forum · ‎08-01-2012

It's a single clock domain. I am using a reset synchronizer design that was recommended by Altera (reset does not appear to be the problem). As far as warnings, I have warnings about objects that are assigned but never read (features not yet implemented).

I have some "inferring latches for variable ... which holds its previous value in one or more paths through the always construct".

I have some bits that I'm not using yet which have no driver (using default initial value of 0)

I have some Tri-state nodes which do not directly drive top level pins. In my internal modules there are some bi-directional busses, which is why there are tri-states that don't drive the outside world.

Clock multiplexers found and protected.

All the warnings seem non-consequential to me.

I have not done timing. Since the chip is running at such a slow clock rate (14 MHz, about 70 ns period) and I am setting up data and then waiting at least one clock cycle before I clock it into the next register I thought there would be no problem with that. Could it be that I need to provide more setup time? Delay it another clock cycle?

At this point SignalTap would be difficult because I don't have access to the JTAG port without hacking the board (a flaw which I will correct on the board in the future).

Altera_Forum · ‎08-01-2012

--- Quote Start ---

I have not done timing. Since the chip is running at such a slow clock rate (14 MHz, about 70 ns period)

--- Quote End ---

Try including a minimal .sdc file. Here's one I've copied and edited to reflect your clock. Edit the port name for the clock, then edit the names of the reset and of the outputs that you do not care about (with regard to meeting clock-to-output delays):

# -----------------------------------------------------------------
# Clock
# -----------------------------------------------------------------
#
# 14MHz clock (70ns period)
set clk_period 70
# External clock (internal logic clock)
set clk clkin_14MHz
create_clock -period $clk_period -name $clk
# -----------------------------------------------------------------
# Cut timing paths
# -----------------------------------------------------------------
#
# The timing for the I/Os in this design is arbitrary, so cut all
# paths to the I/Os, even the ones that are used in the design,
# i.e., reset and the LEDs.
#
# External asynchronous reset
set_false_path -from  -to *
# LED output path
set_false_path -from * -to

The latch warning is also a worry. Fix your logic so that registers are used, not latches.
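
For what it's worth, the "inferring latches" warning usually comes from a combinational block that leaves a variable unassigned on some path. Below is a minimal sketch of the usual cure, not a fix for the actual design: the module and all its names (`no_latch_example`, `sel`, `a`, `b`, `y`, `y_reg`) are made up for illustration. Every combinational output gets a default value at the top of the block, and anything that must hold its value lives in a clocked register.

```systemverilog
// Hypothetical example module; all names are illustrative assumptions.
module no_latch_example (
    input  logic       clk,
    input  logic       reset_n,   // active-low reset, assumed already synchronized
    input  logic [1:0] sel,
    input  logic [7:0] a,
    input  logic [7:0] b,
    output logic [7:0] y_reg
);
    logic [7:0] y;

    // Combinational part: the default assignment covers every path,
    // so no "holds its previous value" latch is inferred.
    always_comb begin
        y = 8'h00;
        case (sel)
            2'b01: y = a;
            2'b10: y = b;
            default: ;            // keep the default value
        endcase
    end

    // Storage lives in a real register clocked by the one system clock.
    always_ff @(posedge clk or negedge reset_n) begin
        if (!reset_n) y_reg <= 8'h00;
        else          y_reg <= y;
    end
endmodule
```

If a value really does need to be remembered between cycles, it belongs in the always_ff block, not in the always_comb block.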

--- Quote Start ---

At this point SignalTap would be difficult because I don't have access to the JTAG port without hacking the board (a flaw which I will correct on the board in the future).

--- Quote End ---

Lesson learned then, eh :)

Cheers,

Dave

Altera_Forum · ‎08-01-2012

I did try setting up timing yesterday and used the wizard to generate an sdc file. It looks much like what you sent me, but with a couple of different set_false_path statements that I don't fully understand:


set_false_path -from  -to 
set_false_path -from  -to

TimeQuest does not indicate any failures.

I will work on the latches.

Also I see there is a new version 12.0 SP1 for Quartus. I'm going to download and try that.

Altera_Forum · ‎08-02-2012

I was able to eliminate the latch warnings. Code running on the hardware is still working illogically (or rather not working).

Is this warning a real concern:

Warning (13004): Presettable and clearable registers converted to equivalent circuits with latches. Registers power-up to an undefined state, and DEVCLRn places the registers in an undefined state.

I eliminated this warning on a couple of registers at a lower level. Should I pursue this more?

I also get a bunch of warnings in the timing analysis of this type:

Warning (332060): Node: uart:rs485_port|baud_clk[0] was determined to be a clock but was found without an associated clock assignment.

It seems to occur for every signal that is used as an edge trigger to an always_ff block, like this:

always_ff @ (posedge reset_n, posedge baud_clk[0])
begin
  blah,....
end

In that particular example the baud_clk is a 4 bit counter that is just being clocked by the main system clock (after it goes through another counter I use for deriving the desired baud rate * 16).

Is there a quick and easy way to set up the timing analysis so that it checks things properly?

Altera_Forum · ‎08-02-2012

--- Quote Start ---

I was able to eliminate the latch warnings. Code running on the hardware is still working illogically (or rather not working).

Is this warning a real concern:

Warning (13004): Presettable and clearable registers converted to equivalent circuits with latches. Registers power-up to an undefined state, and DEVCLRn places the registers in an undefined state.

I eliminated this warning on a couple of registers at a lower level. Should I pursue this more?

--- Quote End ---

I've never seen this one, so I suspect your code is just not particularly synthesizable. Create a minimal code example that generates this warning and post it, or just post the section of code that triggers it, and members of the forum can review it.

--- Quote Start ---

I also get a bunch of warnings in the timing analysis of this type:

Warning (332060): Node: uart:rs485_port|baud_clk[0] was determined to be a clock but was found without an associated clock assignment.

It seems to occur for every signal that is used as an edge trigger to an always_ff block, like this:

always_ff @ (posedge reset_n, posedge baud_clk[0])
begin
  blah,....
end

--- Quote End ---

Right, all 'clocks' need clock constraints. What you want for this one is the create_generated_clock constraint. Alternatively, you could code your UART to operate at the FPGA clock frequency, and use the baud-rate divider as an enable to that logic. The code ultimately works the same; however, there's one less clock ... but the logic has to operate at a faster clock rate ... Nothing comes for free :)

Older generation FPGAs did not have PLLs and could not route logic element outputs back into the clock network, so your only coding option was to use enable pulses. In your case, creating a slow clock for the UART is fine, you just need to use create_generated_clock to tell TimeQuest the characteristics of that clock.
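
A rough sketch of the enable-pulse alternative, in case it helps. This is not the UART from the design in question: the module names, the `baud_x16_en` and `baud_phase` signals, and the `DIVISOR` value are all assumptions for illustration. The divider emits a one-clock-wide pulse, and the 4-bit counter that was previously clocked as baud_clk stays on the system clock and only advances when that pulse is high.

```systemverilog
// Hypothetical sketch: a divider that produces a one-clock-wide enable
// pulse instead of a derived clock. Names and DIVISOR are assumptions.
module baud_tick_gen #(
    parameter int DIVISOR = 8      // system clocks per 16x baud tick (assumed, >= 2)
) (
    input  logic clk,
    input  logic reset_n,
    output logic baud_x16_en       // high for exactly one clk cycle per tick
);
    logic [$clog2(DIVISOR)-1:0] count;

    always_ff @(posedge clk or negedge reset_n) begin
        if (!reset_n) begin
            count       <= '0;
            baud_x16_en <= 1'b0;
        end else if (count == DIVISOR - 1) begin
            count       <= '0;
            baud_x16_en <= 1'b1;
        end else begin
            count       <= count + 1'b1;
            baud_x16_en <= 1'b0;
        end
    end
endmodule

// The UART logic then stays on clk and only advances when enabled.
module baud_phase_counter (
    input  logic       clk,
    input  logic       reset_n,
    input  logic       baud_x16_en,
    output logic [3:0] baud_phase  // plays the role of the 4-bit baud_clk counter
);
    always_ff @(posedge clk or negedge reset_n) begin
        if (!reset_n)         baud_phase <= 4'd0;
        else if (baud_x16_en) baud_phase <= baud_phase + 4'd1;
    end
endmodule
```

With a 14.7456 MHz system clock, a DIVISOR of 8 would give a 16x tick for 115200 baud (14745600 / 8 / 16 = 115200). Because every register is clocked by clk, the single create_clock constraint covers all of this logic.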

--- Quote Start ---

Is there a quick and easy way to set up the timing analysis so that it checks things properly?

--- Quote End ---

Not so much 'quick and easy', but you can set it up to properly analyze the timing. The way you are doing it is fine, i.e., iteratively ...

synthesize -> look at warnings -> add constraints -> repeat

Repeat until there are no warnings, or at least until you understand why the remaining warnings occur and that they can be ignored, e.g., a signal assigned a value that is never read, and warnings like that.

Cheers,

Dave

Altera_Forum · ‎08-02-2012

This will be because the registers in the Cyclone III have an asynchronous clear but no asynchronous preset, unlike earlier chips. So a register that needs both has to be emulated.

I suspect the problem is not timing as such, but poor design practice in the source code. If latches are being created, then the problem will be timing issues that occur with such designs and that cannot be analyzed with TimeQuest. Latches and asynchronous logic will also make the design unreliable and susceptible to temperature variations.

Altera_Forum · ‎08-02-2012

The set_false_path constraints look like they're for a DCFIFO.

Altera_Forum · ‎08-02-2012

--- Quote Start ---

I have some "inferring latches for variable ... which holds its previous value in one or more paths through the always construct".

--- Quote End ---

Don't use latches.

--- Quote Start ---

I have some Tri-state nodes which do not directly drive top level pins.

In my internal modules there are some bi-directional busses,

which is why there are tri-states that don't drive the outside world.

--- Quote End ---

There's no internal tri-state functionality in the FPGA, so why code as if there is and rely on the tools 'fixing' things for you?

Convert to multiple busses, one for each Tx point.

--- Quote Start ---

Clock multiplexers found and protected.

--- Quote End ---

If you're using a single clock domain why is it being multiplexed?

Use clock enables instead!

--- Quote Start ---

I am setting up data and then waiting at least one clock cycle before I

clock it into the next register

--- Quote End ---

That isn't the usual approach for synchronous design. If you're using the same clock to launch and capture signals, and the tools know how fast the clock is, things _should_ work. (You can have multi-cycle paths, but that's the exception rather than the rule.) You're not doing board design; let the tools worry about timing while you worry about functionality.

Can you get some input from an experienced FPGA designer? When you've seen how things should be done (no latches, proper synchronous process templates, clock enables etc) it's much easier to get things working.

As long as you're getting signals into and out of the FPGA OK your simulation results should match real world results.

I hope this helps a bit,

Nial.

Altera_Forum · ‎08-02-2012

I have eliminated the latches.

I'm a bit puzzled about your comment about not using tristate nodes inside the design. I am trying to implement bi-directional data busses. Of course, at the ports of the device that is required to interface with a multiplexed address/data bus. But inside, when communicating between modules, are you suggesting that it would be better to implement it as a data_input bus and a data_output bus, keeping them separate (which doubles the interface pins for the data between modules)? I can do that if you believe that is the recommended practice.

Clock multiplexers are created by the synthesis; I did not code them. It's probably because I have so many different registers being clocked by the main input clock, but I don't really know.

When I talked about setting up data and clocking it one cycle later, I was not referring to coding it down at the lowest registers. I meant that I have data coming in that is enabled by an external write signal. The rising edge of that write signal should clock the data into an input register. Then internally I need to decode that data (it can be an address). So, to give the decode logic time to operate, I wait one clock cycle and then clock it into the internal register where I need to put it. I assume that you cannot expect to clock data, decode it with comb logic, and then clock it into a final register all at the same external clock edge. So that's why I created state machines to sequence the logic in a well-defined manner. Am I wrong in this?

I wish I had an experienced FPGA designer around. We are a very small startup company and I do all the hardware engineering, and I just got my first FPGA boards back about 3 weeks ago. I have a design that needs to ship by the end of this weekend, so I'm under real time pressure to figure this out. This is actually the second FPGA design. The first one I did a week ago and got it working, but I cannot predict what happens when I make a small change in the logic. I know now that it's because I did not do the timing, and that affects the fitter results tremendously. So, here I am learning timing.

Altera_Forum · ‎08-02-2012

I have gone through the TimeQuest online course and I understand the basics. I am working through the main Quartus Interactive Tutorial on timing. But still have some puzzles.

My design uses a multiplexed address/data bus (data_io) with an address latch enable (ale), wr and rd signals. There is also a 14.7456 MHz clock coming into the FPGA which I use to run the internal processes.

A cycle of writing an address to the FPGA starts with placing the address on data_io, then setting ale high, then low again. The high time is about 1 us, so it's much slower than the clock. The data_io is valid about 1 us before the rising edge of ale. A write cycle is virtually the same but uses the wr signal going high for 1 us.

In TimeQuest I created the 14 MHz clock just fine. Now it thinks that ale, wr, and rd are also clocks and that I need to constrain them. Of course, they do not have a regular repeating cycle. There are only those 1 us pulses occurring when we want to communicate with the FPGA. So how does one define that in terms of a clock? Also, I cannot control the relationship of the fast clock signal to the ale, wr, or rd signals. I created a "clock" that is the ale input port with a period of 2000 ns (1 us high time). So my clocks look like this:

create_clock -name {clk_14Mhz} -period 67.817 -waveform { 0.000 33.908 } [get_ports {clk}]

create_clock -name {ale} -period 2000.000 -waveform { 0.000 1000.000 } [get_ports {ale}]

Should I just lengthen the period of the ale and set the fall time to 1000 to simulate closer to what the ale would be doing?

Altera_Forum · ‎08-02-2012

The most important issue is that a signal is viewed as a clock if it is used to edge trigger any register, since it will be connected to the clk port of the flip-flops.

The safest thing for a beginner is one clock for all registers.

As to tristating inside the FPGA: you don't need that, as it is going to be implemented as muxes and will only complicate your coding.

Altera_Forum · ‎08-02-2012

--- Quote Start ---

The most important issue is that a signal is viewed as a clock if it is used to edge trigger any register, since it will be connected to the clk port of the flip-flops.

The safest thing for a beginner is one clock for all registers.

As to tristating inside the FPGA: you don't need that, as it is going to be implemented as muxes and will only complicate your coding.

--- Quote End ---

Basically I have used the main clock to clock all the registers. But then the data in registers is decoded and enables lower level logic.

So, how can I implement a bi-directional bus?

Altera_Forum · ‎08-02-2012

As I understood it, you have used wr and rd as clocks. It does not matter to the compiler whether they are regular pulses or not. They will be treated as clocks if they are used in a clock edge assignment.

For the bidirectional bus: you don't have to think about bidirectionality. Just use two buses, one for each direction, up to the point where you interface them to the actual bidir pins.
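
To illustrate the two-bus idea, here is a small sketch. It is only an assumption of how the boundary might look; the module names and the `drive_en`, `data_out`, `data_in`, `sel_uart`, `uart_rdata`, and `gpio_rdata` signals are hypothetical and not taken from the actual design. The only tri-state buffer sits at the device pins, and internal read data from several sources is combined with an ordinary mux.

```systemverilog
// Hypothetical top-level sketch: the only tri-state is at the device pins.
// Port and signal names are assumptions, not taken from the actual design.
module bidir_pins_example (
    inout  wire  [7:0] data_io,    // bidirectional pins to the CPU board
    input  logic       drive_en,   // 1 = FPGA drives the pins (read cycle)
    input  logic [7:0] data_out,   // from internal modules, towards the pins
    output logic [7:0] data_in     // from the pins, towards internal modules
);
    // Tri-state buffer at the pin boundary only.
    assign data_io = drive_en ? data_out : 8'bz;

    // Inbound data is a plain input path into the fabric.
    assign data_in = data_io;
endmodule

// Inside the FPGA, several sources share a read path through a mux
// rather than a shared tri-state bus.
module read_mux_example (
    input  logic       sel_uart,
    input  logic [7:0] uart_rdata,
    input  logic [7:0] gpio_rdata,
    output logic [7:0] data_out
);
    assign data_out = sel_uart ? uart_rdata : gpio_rdata;
endmodule
```

Each internal module then has a plain input bus and a plain output bus, which is what the synthesis tool would have built out of the internal tri-states anyway.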

Altera_Forum · ‎08-02-2012

I understand that any signal used as the edge to clock a register is considered a clock. I am getting warnings that the launch and latch times between source clock: ale and destination clock: clk are outside legal time range. Ale is basically asynchronous with the main clk. How can I tell it that? I used create_clock not create_generated_clock since ale has nothing to do with the main clk. I guess that is considered another clock domain.

Bidirectional Busses: OK, Got it. Don't use bidirectional busses inside the FPGA to communicate between modules. Just use a data_in and a data_out bus.

I will rewrite my code to do this.

I am beginning to see improvement with even partial timing constraints set. So far the hardware is working correctly. I will see if it behaves properly as more design changes are made.

Altera_Forum · ‎08-02-2012

As I said before the safest method is one clock system with various rate control adjusted through clk enable. If you have to use more than one clk then you need to cross clk domains which is tricky for a beginner.

Altera_Forum · ‎08-02-2012

I'm trying to get rid of this warning:


Warning (13004): Presettable and clearable registers converted to equivalent circuits with latches. Registers power-up to an undefined state, and DEVCLRn places the registers in an undefined state.
	Warning (13310): Register "addr_register" is converted into an equivalent circuit using register "addr_register~_emulated" and latch "addr_register~latch"

All 8 bits in my addr_register have this warning. If I make the address register very simple like this it works and that warning goes away:


	always_ff @ (posedge ale_clk)
	begin
		addr_register <= data_in;
	end

However, what I want is to automatically increment the addr_register after every rd or wr. There are other conditions on the increment which I have combined into a single bit called addr_incr. So here is the code with the increment operation added in:


	always_ff @ (posedge ale_clk, negedge end_rd_or_wr)
	begin
		if (ale_clk) addr_register <= data_in;
		else addr_register <= addr_register + addr_incr;
	end

Now it creates that warning and converts this into a latch. This is happening in several areas of my code wherever I implement any kind of increment or decrement on a register.

Is this OK? If not, how do you increment or decrement a register?

Altera_Forum · ‎08-03-2012

The implementation for your design depends on the timing relationship between signals.

Even though ALE is called "Address Latch Enable", it does not have to be used as a latch enable, rather it can be used as a register enable.

The same goes for the read and write signals, if they are synchronous to the clock signal, you can use them combinatorially to create the address increment signal.

If the external bus is completely asynchronous to any clock you have access to, then the logic becomes quite different.

If you are using an FPGA that contains a PLL, then you can generally create an internal clock that can be used to oversample and synchronize external signals. If the processor interface has a wait-state control, then you can extend bus read/write cycles to meet timing.

So what bus are you trying to interface to?

Cheers,

Dave

Altera_Forum · ‎08-03-2012

You really want to keep everything driven off _one_ clock in the system. This means oversampling ale etc, detecting edges and doing the appropriate thing.

Normally I'd use one of the PLLs to generate a clock 4 or 5 (or more) * the rate of what I'm sampling, but if your data is stable for 1us and ALE is 1us wide then the 14MHz clock should be sufficient.

A simple example follows of how I'd do your address latching and incrementing in VHDL....

 
-- Assumes ieee.std_logic_1164 and ieee.numeric_std are in scope.
signal addr_in_d, addr_in_2d : std_logic_vector(7 downto 0);
signal ale_d, ale_2d, ale_3d : std_logic;
signal address : std_logic_vector(7 downto 0);  -- same width as the 8-bit bus
 
  :
 
process(clk, rst)
begin
  if (rst = '1') then
    addr_in_d  <= (others => '0');
    addr_in_2d <= (others => '0');
    address    <= (others => '0');

    ale_d  <= '0';
    ale_2d <= '0';
    ale_3d <= '0';

  elsif (rising_edge(clk)) then

    -- Register the asynchronous inputs twice
    addr_in_d  <= cpu_addr;
    addr_in_2d <= addr_in_d;

    ale_d  <= ale;
    ale_2d <= ale_d;
    ale_3d <= ale_2d;

    if (ale_2d = '1' and ale_3d = '0') then  -- Detect the rising edge of Ale and..
      address <= addr_in_2d;                 -- ..register in the address..
    elsif (inc_address = '1') then           -- ..or if inc address is active
      address <= std_logic_vector(unsigned(address) + 1);  -- increment it.
    end if;

  end if;
end process;

It's good practice to register asynchronous signals in twice to guard against metastability. I then register ale in again to compare it with the first 'stable' version.

If the address is stable long before Ale goes active you probably don't have to register it in but it's a habit. In other applications you could adjust the number of times it's sampled depending on set up and hold times wrt Ale.
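
Since the original code in this thread is SystemVerilog, here is a rough translation of the VHDL above. It is a sketch under assumptions: the module name `addr_capture`, the 8-bit widths, and the one-clock-wide `inc_address` input are guesses for illustration, not a drop-in replacement for the real address register.

```systemverilog
// SystemVerilog sketch of the same idea; names mirror the VHDL above where
// possible, but widths and the inc_address input are assumptions.
module addr_capture (
    input  logic       clk,
    input  logic       rst,          // active-high reset, as in the VHDL
    input  logic       ale,          // asynchronous to clk
    input  logic [7:0] cpu_addr,     // multiplexed addr/data pins
    input  logic       inc_address,  // one-clock-wide increment request
    output logic [7:0] address
);
    logic [7:0] addr_in_d, addr_in_2d;
    logic       ale_d, ale_2d, ale_3d;

    always_ff @(posedge clk or posedge rst) begin
        if (rst) begin
            addr_in_d  <= '0;
            addr_in_2d <= '0;
            ale_d      <= 1'b0;
            ale_2d     <= 1'b0;
            ale_3d     <= 1'b0;
            address    <= '0;
        end else begin
            // Two-stage registering of the asynchronous inputs.
            addr_in_d  <= cpu_addr;
            addr_in_2d <= addr_in_d;
            ale_d      <= ale;
            ale_2d     <= ale_d;
            ale_3d     <= ale_2d;    // extra stage for edge detection

            if (ale_2d && !ale_3d)   // rising edge of the synchronized ale
                address <= addr_in_2d;
            else if (inc_address)
                address <= address + 8'd1;
        end
    end
endmodule
```

Note that the only signal in the sensitivity list besides the reset is clk, so neither ale nor the rd/wr strobes need a clock constraint.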

If your design is done correctly you should have exactly the same hardware behaviour from each re-build. Differing behaviour means the design is incorrect or something isn't constrained properly.

Again I hope this helps.

Nial.

Altera_Forum · ‎08-03-2012

--- Quote Start ---

The implementation for your design depends on the timing relationship between signals.

Even though ALE is called "Address Latch Enable", it does not have to be used as a latch enable, rather it can be used as a register enable.

The same goes for the read and write signals, if they are synchronous to the clock signal, you can use them combinatorially to create the address increment signal.

If the external bus is completely asynchronous to any clock you have access to, then the logic becomes quite different.

If you are using an FPGA that contains a PLL, then you can generally create an internal clock that can be used to oversample and synchronize external signals. If the processor interface has a wait-state control, then you can extend bus read/write cycles to meet timing.

So what bus are you trying to interface to?

Cheers,

Dave

--- Quote End ---

We are using a TS-7400 CPU board, which basically has a 20-bit data IO bus that comes off a CPLD on their board. They use an EP9302 processor chip, which runs from the 14.7456 MHz crystal. However, I do not know what's happening inside that processor. The 9302 talks to their CPLD and manipulates the data IO bus. They also pass the 14 MHz clock signal through their CPLD (which ends up being the FPGA clk), and they can turn it off or on in the CPLD (which means there must be delays as it passes through their CPLD). We have decided to use 8 bits as a multiplexed addr/data bus and 3 other lines as ale, rd, and wr. Because of the number of IO lines we need from the FPGA, it's not practical to use more lines to interface to the TS7400.

So our code that runs on the TS7400 basically goes through this process:

Set the Data Direction Register (DDR) for DIO[7:0] to output.

Place the data on the data bus.

Set the ale (or wr) bit high.

Set the ale (or wr) bit low.

Read cycles would be like this:

Set the DDR for DIO[7:0] to input.

Set the rd bit high.

Read the DIO[7:0] data

Set the rd bit low.

By scope measurements I have determined that each step of this process takes about 1 us (depending on what instructions we need to execute). That means when writing (addr or data) the data will be on the DIO bus about 1 us before the rising edge of ale or wr and will be held there until about 1 us after the falling edge of ale or wr. We would read data in about 1 us after asserting rd. That seems like plenty of setup and hold time to me, though ale, wr, and rd are not necessarily synchronous with the clk signal.

The ale_clk in the code that I showed earlier is just a gated version of clk made like this:

assign ale_clk = (clk & ale);

In my inexperience I have been coding all register transfers using edge triggered blocks like this:


always_ff @ (posedge ale, negedge end_rd_or_wr)
begin
  if (ale) addr_reg <= data_in;
  else addr_reg <= addr_reg + addr_incr;
end

Would it be better in some cases to use logic levels, not edges? And does that necessarily mean that it's combinatorial logic rather than flip-flops?