Behavior of the safe state machine option

Altera_Forum · ‎12-06-2007

In a design, I have to make sure that illegal states caused by a malfunction of the FPGA (possibly due to outside influence) are detected.

Can I use the safe state machine option for this?

I would like to add stability if possible, but I am not sure how "gracefully" the FSM can "recover from an illegal state".

How is this achieved? What happens to the outputs of the FSM in the meantime, when an illegal state occurs?

Altera_Forum · ‎12-06-2007

The safe state machine option detects states that aren't specifically defined and sends the SM back to reset (I believe). So if you have a binary-encoded state machine with 4 bits and your SM has 13 states, then it looks for those 3 extra states. What your outputs do is unknown (what would be considered safe?). If you need to control those, then you need to create your state machines so there are no unencoded states, and define the outputs for each one.
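To make the 13-states-in-4-bits example concrete, here is a small behavioural sketch in Python. It is only a model of the idea, not what the synthesis tool actually generates; the sequencer behaviour, the reset state being code 0, and all function names are assumptions made for illustration.

```python
# Behavioural model only: 13 legal states in a 4-bit binary encoding
# leave 16 - 13 = 3 unused codes (13, 14, 15).
STATE_BITS = 4
NUM_STATES = 13      # legal codes are 0..12
RESET_STATE = 0      # assumption: the reset state is encoded as 0


def illegal_codes():
    """List the encodings that do not correspond to any defined state."""
    return [code for code in range(2 ** STATE_BITS) if code >= NUM_STATES]


def next_state(current):
    """Hypothetical sequencer that steps through the legal states in order."""
    if current >= NUM_STATES:
        # Recovery logic in the spirit of the safe option:
        # an undefined code forces a transition to the reset state.
        return RESET_STATE
    return (current + 1) % NUM_STATES


def flip_bit(state, bit):
    """Model a single-bit upset in the state register."""
    return state ^ (1 << bit)


if __name__ == "__main__":
    print(illegal_codes())              # the three unused codes
    print(next_state(flip_bit(5, 3)))   # 5 -> 13 (illegal) -> reset
    print(next_state(flip_bit(5, 0)))   # 5 -> 4 (legal but wrong) -> 5
```

Note the last case: an upset that lands on another legal code is not seen as illegal by this kind of check, so the machine simply carries on from the wrong state.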

I've twice done a fuller write-up on this topic, and then deleted it because it was probably more than you were asking for. But if you want to give more information on why you think your SM will go to an illegal state, and what kind of result that would have, I might be able to give more info.

Altera_Forum · ‎12-06-2007

I am interested in any amount of info :)

What I basically need to know is: will the safe option send the FSM back into the state it came from? Obviously not.

What I basically need to do is:

Detect any misbehaviour of the FSM. The design is fully coded, so there is only one way to end up in a wrong state: physical errors, possibly caused by EMI, radiation or whatever. (Unlike with most designs, this is very likely with my current design.)

Therefore, I am doing both: detecting misbehaviour and creating a stable design. Typically (and in the past) I have done this by creating protection FSMs, which control the main FSM and act as a watchdog. Only when safe transitions have taken place and have been confirmed is the new state used to change the dependent signals and registers.

I had thought that there might be a ready-to-use protection mechanism to include in designs.

But I think this is more something for a MegaCore ...

The FSM system in my current approach runs at half the system speed and has a latency of 6 clocks after valid transitions are reported. This must be optimized.

Altera_Forum · ‎12-06-2007

- I think you've got the most important issue covered, which is that many users want safe state machines when their device isn't susceptible. You recognize that you're in an environment where this can happen.

- Safe state machines are not always safe just because they return to a known state. You already get this. But take the example where the SM is just a sequencer and it goes from s3 -> unknown -> s0. The inputs might be at a point that tells it to go to s4, and they will sit at those values while the SM won't advance because it's in s0. I think what you're doing is the best, whereby you're purposefully looking at the error and deciding what to do. This is also the most difficult.

- Polling (majority voting) is often the most robust, whereby a SM is implemented in triplicate, and the outputs look at all three states and compare them. If they ever don't match, the output takes the value that hopefully two of them agree on, and the third SM is reset or somehow set to get back in line with the other two. I've seen FPGAs put down on a board this way (three in parallel). Naturally, this is extremely costly and often not feasible, but it's really one of the more foolproof, easy-to-understand methods to handle these errors. (Plus, the outputs should never go to an unprepared state, i.e. they get their data from the two circuits that are working while the third one is fixed.)

- I don't know your system, but how much of a failure it can handle plays a lot into this. For example, lots of systems require the error to be recognized, but can handle being off-line for a little bit. For these, the user just puts in a lot of flags and checks for error conditions all over the place (state machine going to states it shouldn't, counters at count values they shouldn't reach, data that looks corrupted like CRC failures, etc.). These flags can be software interrupts or something. Of course, if you're controlling the ejector seat on a jet, that might not be feasible.

- I hope there's good reading material on this subject, but I've never gone down that path. The bottom line is that you're trading area/cost and performance to gain reliability under your conditions. It really requires looking at as much of the control logic in your system as possible, trying to figure out how it would deal with random changes, and coding in how it would recover. This usually isn't simulatable (not because you can't flip a SM to another state, but because you can't do every permutation at every time in your sim).

- One last thing is you might want to use a family that has the internal configuration bit checker:

http://www.altera.com/literature/wp/wp-01012.pdf
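The triplicated scheme described above can be sketched behaviourally in Python. This is only a model under assumptions: the function names are made up for illustration, the vote is a simple bitwise 2-of-3 majority, and the "repair" is modelled by copying the voted value back into all three copies. It does not describe any particular vendor feature.

```python
def majority(a, b, c):
    """Bitwise 2-of-3 majority vote over three state codes."""
    return (a & b) | (a & c) | (b & c)


def vote_and_repair(states):
    """Vote over three redundant state registers.

    Returns the voted state, the three copies resynchronised to it,
    and the indices of the copies that disagreed with the vote.
    """
    a, b, c = states
    voted = majority(a, b, c)
    mismatched = [i for i, s in enumerate(states) if s != voted]
    return voted, [voted, voted, voted], mismatched


if __name__ == "__main__":
    # Copy 2 has suffered a single-bit upset (6 -> 14).
    voted, repaired, bad = vote_and_repair([6, 6, 14])
    print(voted, repaired, bad)
```

As long as two copies agree, the voted value equals their value and the outputs never see the corrupted copy. If all three copies differ, the bitwise vote may produce a code that none of them held, so a real design would still need to decide what to do in that case.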

Altera_Forum · ‎12-07-2007

The safe state machine option for Quartus II synthesis is documented in the Altera docs: http://www.altera.com/literature/hb/qts/qts_qii51008.pdf. I believe other EDA tools give the same behavior for this type of option.

Safe State Machines

The Safe State Machine option and corresponding syn_encoding attribute value safe specify that the software should insert extra logic to detect an illegal state and force the state machine’s transition to the reset state.

...

It is important to note that the safe state machine value does not use any user-defined default logic from your HDL code that corresponds to unreachable states. Verilog HDL and VHDL allow you to explicitly specify a behavior for all states in the state machine, including unreachable states. However, synthesis tools detect if state machine logic is unreachable and minimize or remove the logic. Any flag signals or logic used in the design to indicate such an illegal state are also removed. If the state machine is implemented as safe, the recovery logic forces its transition from an illegal state to the reset state.

...

Safe state machine implementation can result in a noticeable area increase for the design...