
Shift registers can be dangerous!

Altera_Forum
Honored Contributor II
3,351 Views

(Note: re-posted from my blog: 

http://xiaoleicestustc.blogspot.sg/2013/10/shift-registers-can-be-dangerous.html  

The drawing mentioned in my original post is in the attachment.) 

 

During the last two days, I have been debugging an issue on an FPGA, and in the end found that the root cause was a 4-bit shift register.  

 

See the drawing below, which I made to illustrate. It is a right-shift register with inverters at the D inputs of bit 3 and bit 2. The four bits are reset to "0000" and, on consecutive rising clock edges, become "1100", then "1010", then "1001", then "0000" again, and the pattern repeats. Looks like a normal and safe design, right? 

 

What went wrong in my case was that these four bits got stuck at "1000" after the FPGA had been running some FW code for about 20 minutes. I am still not sure whether they became "1000" by latching a glitch. But once they are "1000", they stay there forever. You can work through the logic yourself. 
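
For anyone who wants to check the stuck state, the register as described can be modeled roughly like this. It is only a sketch: the entity and signal names are made up, and the wiring (bit 3 fed from inverted bit 0, bit 2 fed from inverted bit 3) is an assumption, chosen because it reproduces the sequence described above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical gate-level view of the 4-bit register (names are illustrative).
entity gate_level_shift is
  port (
    clk   : in  std_logic;
    rst_n : in  std_logic;                      -- asynchronous, active low
    q     : out std_logic_vector(3 downto 0)
  );
end entity gate_level_shift;

architecture rtl of gate_level_shift is
  signal r : std_logic_vector(3 downto 0);
begin
  process (clk, rst_n)
  begin
    if rst_n = '0' then
      r <= "0000";                  -- all four registers clear to '0'
    elsif rising_edge(clk) then
      r(3) <= not r(0);             -- inverter at D input of bit 3 (assumed source: bit 0)
      r(2) <= not r(3);             -- inverter at D input of bit 2 (assumed source: bit 3)
      r(1) <= r(2);
      r(0) <= r(1);
    end if;
  end process;

  q <= r;

  -- Normal cycle: 0000 -> 1100 -> 1010 -> 1001 -> 0000 -> ...
  -- From 1000:    r(3) <= not '0' = '1', r(2) <= not '1' = '0',
  --               r(1) <= '0', r(0) <= '0', so the next state is 1000 again.
end architecture rtl;
```

With this wiring, "1000" is a fixed point outside the normal four-state cycle, which matches the stuck behavior seen on the hardware.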

 

 

As a result of this stuck state, one important clock signal in my hardware design disappeared, which in turn caused the FW to hang in a particular FOR loop. (Of course, this is hindsight. In fact it took my colleague David and me almost a full working day of debugging to trace the FW hang back to this shift register.) 

 

 

As this incident shows, shift registers can be dangerous!
13 Replies
Altera_Forum

 

--- Quote Start ---  

(Note: re-posted from my blog: 

http://xiaoleicestustc.blogspot.sg/2013/10/shift-registers-can-be-dangerous.html  

The drawing mentioned in my original post is in the attachment.) 

During the last two days, I have been debugging an issue on an FPGA, and in the end found that the root cause was a 4-bit shift register.  

See the drawing below, which I made to illustrate. It is a right-shift register with inverters at the D inputs of bit 3 and bit 2. The four bits are reset to "0000" and, on consecutive rising clock edges, become "1100", then "1010", then "1001", then "0000" again, and the pattern repeats. Looks like a normal and safe design, right? 

What went wrong in my case was that these four bits got stuck at "1000" after the FPGA had been running some FW code for about 20 minutes. I am still not sure whether they became "1000" by latching a glitch. But once they are "1000", they stay there forever. You can work through the logic yourself. 

As a result of this stuck state, one important clock signal in my hardware design disappeared, which in turn caused the FW to hang in a particular FOR loop. (Of course, this is hindsight. In fact it took my colleague David and me almost a full working day of debugging to trace the FW hang back to this shift register.) 

As this incident shows, shift registers can be dangerous! 

--- Quote End ---  

 

 

Why are the clock inputs of the 2nd and 3rd bits not connected to the clock signal? Apart from that, I don't see why it would get stuck, other than through bad timing.
Altera_Forum

Instead of the unclear schematic, you should post the code and explain how you tested it.

Altera_Forum

Logic inside FPGAs does not have for loops. You need to post the code.

Altera_Forum

Hi All, 

 

My original drawing had some errors. Yes, the clock ports of the 2nd and 3rd registers should be connected to the clk signal. Sorry for the confusion. I have just updated my drawing. 

 

I was not asking you guys to repeat what I did. I am just sharing my experiences debugging this particular issue :) 

 

By the FOR loop, I mean a for loop in my firmware, not a for loop in my RTL code.
Altera_Forum

You apparently managed to delete your original post instead of editing it. 

 

In my opinion, it would be better to post a new drawing and leave the old as is. Without it, the other contributions can't be understood at all.  

 

I also think that editing or even deleting old posts should be blocked after a certain amount of time.
Altera_Forum

Here I attach my new drawing. 

 

And below is the VHDL code that got synthesized into the 4-bit shift-register. I used Synplify Pro for the logic synthesis, and Quartus for place & route. 

 

process (ADC_CLK, rst)
begin
  if rst = '0' then
    cnt <= "1000";
  elsif rising_edge(ADC_CLK) then
    cnt(2 downto 0) <= cnt(3 downto 1);
    cnt(3) <= cnt(0);
  end if;
end process;
Altera_Forum

The code is O.K. so far. You don't report a problem in the present post. 

 

It's not clear how you tested the design. 

 

P.S.: kaz fortunately copied your original post. I presume that the said "1000" refers to the register state at gate level, not RTL. It corresponds to state "0000" in RTL. 

 

There are two possible explanations for how the initially seeded "1000" can be lost: 

- timing violation caused by releasing the reset asynchronously 

- timing violation caused by clock glitches, too high a clock frequency, or too narrow a pulse width 

 

If you are observing the design in real hardware and it operates correctly at first but fails some time later, clock timing violations are the likely explanation.
Altera_Forum

 

--- Quote Start ---  

You apparently managed to delete your original post instead of editing it. 

--- Quote End ---  

Actually no, it's the forum's new antispam system that is too aggressive. As the edited post included a link, it automatically moderated it down. The moderators aren't warned of this, so we only see it if we look carefully at the thread.
Altera_Forum

 

--- Quote Start ---  

Here I attach my new drawing. 

 

And below is the VHDL code that got synthesized into the 4-bit shift-register. I used Synplify Pro for the logic synthesis, and Quartus for place & route. 

 

process (ADC_CLK, rst)
begin
  if rst = '0' then
    cnt <= "1000";
  elsif rising_edge(ADC_CLK) then
    cnt(2 downto 0) <= cnt(3 downto 1);
    cnt(3) <= cnt(0);
  end if;
end process;

--- Quote End ---  

 

 

Now your code does not have inverters and resets to 1000. Your first post has two inverters and resets to 0000.
Altera_Forum

See my replies/comments inline, marked [Xiaoleic: ...]. 

 

 

 

--- Quote Start ---  

The code is O.K. so far. You don't report a problem in the present post. 

 

It's not clear how you tested the design. 

 

P.S.: kaz fortunately copied your original post. I presume that the said "1000" refers to the register state at gate level, not RTL. It corresponds to state "0000" in RTL. 

 

[Xiaoleic: Exactly! ] 

 

There are two possible explanations for how the initially seeded "1000" can be lost: 

- timing violation caused by releasing the reset asynchronously [Xiaoleic: Very likely, this is the reason. rst in my code is an asynchronous reset signal. My colleague suggests that I should synchronize it, so that its release is synchronized with the clock. I have attached another drawing, showing such a synchronizer.] 

- timing violation caused by clock glitches, too high a clock frequency, or too narrow a pulse width 

 

If you are observing the design in real hardware and it operates correctly at first but fails some time later, clock timing violations are the likely explanation. 

 

[Xiaoleic: You are right. I was observing this in real hardware. I ran a stress test, in which I believe the asynchronous reset is repeatedly applied and released. The whole test runs for about 2 hours, and I usually observed this failure about 20 minutes into the test.] 

 

 

--- Quote End ---  

Altera_Forum

Hi, Kaz, 

 

Yes, my RTL code resets to 1000 and does not have any inverters. But in the synthesized netlist, I see the two inverters.  

 

And I think that in the Stratix III FPGA I use, registers cannot be reset to '1'. So I believe all 4 registers reset to '0'. 

 

 

--- Quote Start ---  

Now your code does not have inverters and resets to 1000. Your first post have two inverters and resets to 0000 

--- Quote End ---  

Altera_Forum

 

--- Quote Start ---  

Yes, my RTL code resets to 1000 and does not have any inverters. But in the synthesized netlist, I see the two inverters.  

 

And I think that in the Stratix III FPGA I use, registers cannot be reset to '1'. So I believe all 4 registers reset to '0'. 

--- Quote End ---  

 

Yes, the same with all newer Altera FPGAs. That's why I assumed you are showing a gate level netlist. 

 

If the clock source itself is stable, I'm sure that the reset synchronizer will solve the problem.  

 

I think it's a simple and convincing example of why reset synchronizers are necessary.
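
The synchronizer drawing attached earlier is not reproduced in this thread, so the following is only a generic sketch of the common two-flop scheme (asynchronous assertion, synchronous release), not necessarily the circuit that was drawn. The entity name `reset_sync` and the port names `arst_n` and `rst_n_sync` are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Generic two-flop reset synchronizer (illustrative names, active-low reset).
entity reset_sync is
  port (
    clk        : in  std_logic;   -- the clock of the logic being reset, e.g. ADC_CLK
    arst_n     : in  std_logic;   -- raw asynchronous reset
    rst_n_sync : out std_logic    -- asserts immediately, releases on a clock edge
  );
end entity reset_sync;

architecture rtl of reset_sync is
  signal ff : std_logic_vector(1 downto 0);
begin
  process (clk, arst_n)
  begin
    if arst_n = '0' then
      ff <= "00";                 -- assertion passes straight through
    elsif rising_edge(clk) then
      ff <= ff(0) & '1';          -- release ripples through two flip-flops
    end if;
  end process;

  rst_n_sync <= ff(1);
end architecture rtl;
```

In this sketch, `rst_n_sync` would replace `rst` in the shift-register process, so that all four bits leave reset on the same clock edge instead of at an arbitrary moment relative to ADC_CLK.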
Altera_Forum

Hi All, 

 

For this problem I encountered, I tried two ways to solve it: 

 

  1. I don't use a "shift register" at all. I use a two-bit counter instead. This counter resets to 0 and, at each rising clock edge, increments to 1, 2, 3, and back to 0. See the RTL code below. Note that here the RST_N signal is NOT synchronized with ADC_CLK. I found that this code solves my problem, but I think it is still better to use a synchronized reset signal. 

     

    In my original RTL, I used the 4-bit shift register as a kind of "one-hot" encoding, to generate a timing signal. Now I use this 2-bit counter as "binary" encoding, to generate the same timing signal.  

     

    I chose to replace the 4-bit shift register with the 2-bit counter because I think the counter is safe: each of its four states leads to the next one, so it cannot get stuck at a particular value. 

     

     

     

    process (ADC_CLK, RST_N)
    begin
      if RST_N = '0' then
        cnt2 <= 0;
      elsif rising_edge(ADC_CLK) then
        if (cnt2 = 3) then
          cnt2 <= 0;
        else
          cnt2 <= cnt2 + 1;
        end if;
      end if;
    end process; 

     

     

     

  2. The second method I tried is to keep the 4-bit shift register and use a synchronized reset signal. I explained how the reset is synchronized in my previous reply. 
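
On the point in method 1 that a counter cannot get stuck: a one-hot ring can be given the same property by computing the bit that is shifted in instead of recirculating bit 0. This is a third option that was not tried or tested in this thread, shown only as a sketch; the entity name `ring4_self_correcting` and its port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Self-correcting 4-bit one-hot ring (illustrative, not the design from this thread).
entity ring4_self_correcting is
  port (
    clk   : in  std_logic;
    rst_n : in  std_logic;                      -- active-low reset
    cnt   : out std_logic_vector(3 downto 0)
  );
end entity ring4_self_correcting;

architecture rtl of ring4_self_correcting is
  signal r : std_logic_vector(3 downto 0);
begin
  process (clk, rst_n)
  begin
    if rst_n = '0' then
      r <= "1000";
    elsif rising_edge(clk) then
      r(2 downto 0) <= r(3 downto 1);           -- same right shift as the original
      -- Shift in a '1' only when the three upper bits are all '0'.
      -- Normal cycle is unchanged: 1000 -> 0100 -> 0010 -> 0001 -> 1000.
      -- The all-zero state, which locks up a plain ring, now goes to 1000.
      if r(3 downto 1) = "000" then
        r(3) <= '1';
      else
        r(3) <= '0';
      end if;
    end if;
  end process;

  cnt <= r;
end architecture rtl;
```

With this feedback, any corrupted state falls back into the one-hot cycle within a few clock edges. It does not remove the timing violation at reset release, so it would complement a synchronized reset rather than replace it.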

 

 

I generated one bitstream with each of the two methods, and tested both with the same test FW that exposed this problem in the first place. Both bitstreams pass my test. 

 

 

 

Thank you for all your replies :)