State machine transitioning into illegal state

Altera_Forum · ‎11-24-2008

I am working with a simple state machine with 4 states. The state machine is properly detected by Altera's state machine optimizer, and the state diagram it produces is correct.

When I probe this state machine with SignalTap, I find that it's entering an illegal state (all 4 state signals are low). A SignalTap image is attached, triggered when all states == 0. The problem is intermittent.

This state machine does have asynchronous inputs (transmit_req), but they are all synchronized with code similar to that below.

always @(posedge Clk, negedge Rst_)
	if (~Rst_)
		safe_transmit_req <= 2'b0;
	else begin
		safe_transmit_req[0]	<= transmit_req;		// first stage: may go metastable
		safe_transmit_req[1]	<= safe_transmit_req[0];	// second stage: the copy the FSM reads
	end

The transition that is failing is conditional on ~safe_transmit_req[1]. It leaves the state, but instead of going to the correct state (st_fabric_Idle), it enters some unknown state.

st_fabric_Ack: begin
	fabric_transmit_ack	= 1'b1;
	// Leave the Ack state once the synchronized request has dropped
	if (~safe_fabric_transmit_req[1])	begin
		sm_tx_fabric_next	= st_fabric_Idle;
	end
end

The resets aren't shown, but even if it were being reset, it should transition to the idle state. Does anyone see any glaring errors that would cause it to enter an illegal state?

Altera_Forum · ‎11-24-2008

Do you have timing violations for recovery or removal? TimeQuest runs those analyses by default. You have to request them with the Classic Timing Analyzer.

Altera_Forum · ‎11-24-2008

Thanks for the reply; I'm using the Classic timing analyzer, so I'll rerun it and post the results.

Altera_Forum · ‎11-24-2008

The smallest slack time in the Recovery sections is 0.466 ns, the smallest in Removal is 1.292 ns - positive, so passing.

Altera_Forum · ‎11-24-2008

What is the timing relationship between the FSM clock and async inputs?

Is it possible that one of the async inputs is a pulse shorter than 1 FSM clock period?

If so, then it may not be correctly registered by your retime logic.

Just a thought.

Altera_Forum · ‎11-25-2008

I assume the state machine uses the same clock that synchronizes the asynchronous inputs, so recovery and removal analysis should have caught any timing problem for these signals if they are asynchronous to the state machine (as opposed to just asynchronous to Clk coming into the FPGA). And I wonder now if I got the wrong idea and those "asynchronous inputs" are actually synchronous to the state machine. If they're not in the always block sensitivity list for the state machine registers, then they should be covered by setup and hold analysis rather than recovery and removal analysis.

Maybe someone will have another idea.

Altera_Forum · ‎11-25-2008

The "asynchronous" inputs are actually from another state machine inside the FPGA running on a different clock domain. The clocks are nominally the same frequency, but because they come from external sources this is not guaranteed. I wanted to avoid the exact situation you described, so I put in a request/ack system between the two. The initiating FSM raises a Req signal and waits for the receiving FSM to raise the Ack signal. When the initiating FSM sees the Ack, it lowers the Req signal. When the receiving system sees the Req go low, it lowers the Ack. When the initiating system sees the Ack go low, it continues on and is allowed to initiate a new Req. I also double registered the signals between the two as described above to prevent metastability, but this might be overkill. The delay caused by these registers causes the Req/Ack signals to be held longer than necessary.


Req   __------______
Ack   ______------__
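
A minimal sketch of the receiving side of such a Req/Ack handshake, assuming a two-stage synchronizer on the incoming request. The module and port names (req_ack_receiver, req_async, ack) are hypothetical and not taken from the original design; the initiating side would need to synchronize Ack into its own clock domain in the same way.

```verilog
// Hypothetical receiver for a four-phase Req/Ack handshake.
// Names are illustrative only.
module req_ack_receiver (
	input  wire Clk,        // receiving clock domain
	input  wire Rst_,       // active-low reset, assumed synchronized to Clk
	input  wire req_async,  // Req driven from the other clock domain
	output reg  ack         // Ack, registered in this domain
);
	reg [1:0] req_sync;     // two-stage synchronizer

	always @(posedge Clk, negedge Rst_)
		if (~Rst_) begin
			req_sync <= 2'b00;
			ack      <= 1'b0;
		end else begin
			req_sync[0] <= req_async;    // first stage: may go metastable
			req_sync[1] <= req_sync[0];  // second stage: safe to use in logic
			ack         <= req_sync[1];  // Ack rises and falls after the synchronized Req
		end
endmodule
```

Only req_sync[1] feeds any logic. If the first stage fanned out to more than one register, different registers could resolve a marginal edge differently, which is one way a one-hot state register can end up with no bit set.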

In the SignalTap capture I posted previously, you can see the high request line, the raising of the Ack, the request subsequently falling (gated through safe_fabric_transmit_req[1:0]), and the receiving FSM transitioning based on the request line going low (this is where it fails the state change). The Ack goes low as it should, but that is probably just because that signal is based on being in the Ack state, and we have moved into some bad state.

I'll look into how I generate this Req signal in the other FSM, and see if I'm doing something that might cause glitching or some other funny business.

Altera_Forum · ‎11-25-2008

--- Quote Start ---

I assume the state machine uses the same clock that synchronizes the asynchronous inputs, so recovery and removal analysis should have caught any timing problem for these signals if they are asynchronous to the state machine (as opposed to just asynchronous to Clk coming into the FPGA).

--- Quote End ---

The safe_fabric_transmit_req[1:0] inputs are registered with the same clock as the FSM reading these signals.

--- Quote Start ---

If they're not in the always block sensitivity list for the state machine registers, then they should be covered by setup and hold analysis rather than recovery and removal analysis.

--- Quote End ---

The registered signals (I believe this makes them synchronous now) are in fact in the sensitivity list (the state machine transition and output logic is in an always @ * block).

I am not clear what purpose the recovery and removal analysis plays in this - I'll read up and maybe it will give me some ideas.

Altera_Forum · ‎11-25-2008

I was talking about the sensitivity list for the registers, not the one for the combinational logic in the always @ * block. The sensitivity list for the registers should have only the clock and signals that drive the "asynchronous" inputs to the physical registers (clear, load, etc.--what's available I think varies by device family). If you use one of the register's asynchronous inputs, the timing on that input is covered by recovery and removal, which are similar to setup and hold respectively for the truly synchronous inputs like D and clock enable.

I would expect the only asynchronous input to the state machine registers to be a reset that is itself synchronized to the clock. That is what recovery and removal analysis is typically used for. Positive slack ensures that all registers creating the state code see the reset deassert in the same clock cycle, so they all leave the reset state in the same clock cycle. If more than one state bit can toggle at the exit from the reset state, recovery and removal analysis is critical to make sure the correct combination of bits toggles.
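
As an illustration of a reset that is "itself synchronized to the clock", here is a minimal sketch of a common reset synchronizer: it asserts asynchronously and deasserts synchronously to Clk. The module name reset_sync and the input rst_async_n are assumptions for the example; only Clk and Rst_ appear in the thread.

```verilog
// Hypothetical reset synchronizer: asynchronous assert, synchronous deassert.
module reset_sync (
	input  wire Clk,
	input  wire rst_async_n,  // raw asynchronous reset, active low (assumed name)
	output wire Rst_          // reset to feed the state machine registers
);
	reg [1:0] sync;

	always @(posedge Clk, negedge rst_async_n)
		if (~rst_async_n)
			sync <= 2'b00;            // assert immediately
		else
			sync <= {sync[0], 1'b1};  // release two clock edges after the input is released

	assign Rst_ = sync[1];
endmodule
```

Because Rst_ is released by a register clocked on Clk, recovery and removal analysis can check that every state bit sees the release in the same clock cycle.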

Altera_Forum · ‎11-25-2008

I see - you are correct, the sensitivity list for all state machines and registers is

always @(posedge Clk, negedge Rst_)

Do you know what would be required to prevent glitches in the state machine outputs? Would forcing the state machine into 1-hot or Gray encoding guarantee this?

Altera_Forum · ‎11-25-2008

Register the state machine outputs if you can. You might need to assert them one state early to compensate for the additional latency of the output registers.
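
One way to register an output without adding latency is to decode it from the next-state value rather than the current state, which is equivalent to asserting it one state early. The sketch below assumes a two-state machine with hypothetical names (fsm_registered_out, req, ack_r); it is not the original poster's FSM.

```verilog
// Hypothetical two-state FSM with a registered, glitch-free output.
module fsm_registered_out (
	input  wire Clk,
	input  wire Rst_,
	input  wire req,    // assumed already synchronized to Clk
	output reg  ack_r   // registered output
);
	localparam st_Idle = 1'b0, st_Ack = 1'b1;
	reg state, next;

	// Next-state logic
	always @* begin
		next = state;
		case (state)
			st_Idle: if (req)  next = st_Ack;
			st_Ack:  if (~req) next = st_Idle;
		endcase
	end

	// State and output registers share the same clock edge
	always @(posedge Clk, negedge Rst_)
		if (~Rst_) begin
			state <= st_Idle;
			ack_r <= 1'b0;
		end else begin
			state <= next;
			ack_r <= (next == st_Ack);  // decoded from next state, so ack_r is high exactly while state == st_Ack
		end
endmodule
```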

If the state machine outputs are combinational, they will not glitch if their logic has at most one input toggling at a time. If the state code bits are the only inputs to the logic for state machine outputs, then Gray encoding will avoid glitches. 1-hot will have 2 bits toggling at each state change, which can cause a glitch on any LUT fed by both of those bits.

Altera_Forum · ‎11-25-2008

I made this a Moore FSM, and the output is only active during a single state - however thinking about it I realized that it doesn't matter if these are glitching, as they are already registered at the input of the receiving FSM module.

Altera_Forum · ‎11-25-2008

I didn't read all the posts, but one-hot state-machines will have an all 0s state when looked at in SignalTap or in a timing simulation. The reason is that the registers power up to all 0s. In your state-diagram, the idle state is shown as being all 0s. In essence, it's just like a normal one-hot except the lsb is inverted:

0000
0011
0101
1001

Synthesis still decodes off a single bit, but for the idle state we just decode if the LSB is a 0 instead of if it's a 1.
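
A rough sketch of what that single-bit decode amounts to for the four codes listed above. The module name and the output names in_a, in_b, and in_c are hypothetical; the thread does not say which code corresponds to which named state other than idle being all 0s.

```verilog
// Hypothetical decode of a one-hot state register whose LSB is inverted
// so that the idle state is 0000.
module onehot_decode (
	input  wire [3:0] state,  // state bits as seen in SignalTap
	output wire in_idle, in_a, in_b, in_c
);
	assign in_idle = ~state[0];  // idle: LSB is 0 (code 0000)
	assign in_a    =  state[1];  // code 0011
	assign in_b    =  state[2];  // code 0101
	assign in_c    =  state[3];  // code 1001
endmodule
```

With this encoding, all 0s is a legal code (idle), so what matters is whether the FSM then behaves as if it were in idle.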

Also, if bringing asynchronous signals into a SM, I would recommend double-registering them in the new domain to shake out metastability.

Altera_Forum · ‎11-26-2008

Hi wcalkins (http://www.alteraforum.com/forum/member.php?u=7636),

I had the same problem as you. My state machine jumped to an unwanted state from time to time. However, my SM was not recognized by the Quartus II software.

It seems I have solved my problem. One of my state machine inputs, which comes from an input pin, was not synchronized.

Altera_Forum · ‎11-27-2008

Problem fixed.

It turns out that not registering/synchronizing signals across clock domains was indeed the problem, and in more than one location. In one FSM I was only single registering the input, and in another I was reading the count from the wrong side of a DCFIFO.

Lesson learned: Be very rigorous about clock domain crossings and follow Altera's recommendations as laid out in Application Note 473.