
Help with metastability problem

Altera_Forum
Honored Contributor II

Hi, 

I'm working on a vision-processing project at uni using a custom board built around a Cyclone III, and I have to modify some VHDL code written by previous years' students.

 

A method for transmitting data from the board had been previously implemented but is very static and hard to modify. So just to test my code I decided to 'Hijack' an area that already writes data out. 

 

In this area, data is written to an instantiated RAM block (of type altsyncram) acting as a buffer. When the buffer has been filled, a ready signal is activated and the contents of the RAM block are transmitted via an FTDI interface.

 

So I set up a block that, for the time being, fills the RAM block with hardcoded values (2 values that alternate every clock cycle) while a valid frame is being read from the camera. When the frame is over, I set the ready signal high and trigger the writing process.

 

The data I am sending is 48 bits wide and has the form:

8-bits : for a color label 

10-bits : for x1 coordinate 

10-bits : for y1 coordinate 

10-bits : for x2 coordinate 

10-bits : for y2 coordinate 

 

So I send the following hardcoded alternating data (color_label, x1, y1, x2, y2):

 

data1 : (4, 1, 1, 1, 1) 

data2 : (4, 1, 1, 1, 2) 

 

On receiving the data I get random values of (4,1,1,1,0), (4,1,1,1,1), (4,1,1,1,2) or (4,1,1,1,3). This leads me to think that I have a metastability problem, and I believe it has something to do with the RAM block (altsyncram): if I pass the values continuously to the uploader (bypassing the RAM) I get the expected values. However, bypassing the RAM is not a viable solution outside of test conditions.

 

I have attached a picture of my block that is setting the hardcoded values and the RAM block I am writing to. 

 

The code of my block is as follows: 

 

-- INPUTS 

FVAL : indicates a valid frame from the camera 

DVAL : indicates valid data from the camera 

VALID_IN : indicates valid data into this block (currently unused) 

buffer_lock : indicates the data in the RAM is being uploaded, so can't write to RAM 

LINE_OBJ : the data to write out (currently unused, values are hardcoded for testing) 

-- OUTPUTS 

buffer_lock_out : used to block the data that used to be written to the RAM (ignore this)

buffer_rdy : the ready signal that starts the upload process 

wren : write enable signal to the RAM block

wr_addr : the address to write to RAM 

obj_count : the data count written to RAM (ignore this, for external purposes) 

wr_data : the data to write to RAM 

 

 

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.obj_extraction_pkg.all;

entity OBJ_RamWriter is
  port(
    -- Clock Input
    CLK : in std_logic;
    -- Inputs
    buffer_lock : in std_logic := '0';
    VALID_IN : in std_logic := '0';
    DVAL_IN : in std_logic := '0';
    FVAL_IN : in std_logic := '0';
    LINE_OBJ : in std_logic_vector(obj_wd-addr_wd-1 downto 0);
    -- Outputs
    buffer_lock_out : out std_logic := '0';
    buffer_rdy : out std_logic := '0';
    wren : out std_logic := '0';
    wr_addr : out unsigned(6 downto 0);
    obj_count : out unsigned(6 downto 0);
    wr_data : out std_logic_vector(obj_wd-addr_wd-1 downto 0)
  );
end entity;

architecture rt1 of OBJ_RamWriter is
  -- Data Registers
  signal lock_reg : std_logic := '0';
  signal v_reg : std_logic := '0';
  signal fvalid_reg : std_logic := '0';
  signal line_reg : std_logic_vector(obj_wd-addr_wd-1 downto 0);
  -- Internal States
  signal count : unsigned(6 downto 0) := to_unsigned(0,7);
  signal addr : unsigned(6 downto 0) := to_unsigned(0,7);
  signal rdy_reg : std_logic := '0';
  signal rdy_reg_temp : std_logic := '0';
  signal odd : std_logic := '0';
  -- Output registers
  signal lock_out_reg : std_logic := '0';
  signal valid_out_reg : std_logic := '0';
BEGIN
  process (CLK)
  BEGIN
    if (rising_edge(CLK)) then
      -- Store inputs
      lock_reg <= buffer_lock;
      v_reg <= '1';--VALID_IN;
      if (DVAL_IN = '1' and FVAL_IN = '1') then
        fvalid_reg <= '1';
      else
        fvalid_reg <= '0';
      end if;
      if (odd = '0') then
        odd <= '1';
        line_reg <= std_logic_vector(to_unsigned(4, 8) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10));--LINE_OBJ;
        --wr_data <= std_logic_vector(to_unsigned(4, 8) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10));--LINE_OBJ;
      else
        odd <= '0';
        line_reg <= std_logic_vector(to_unsigned(4, 8) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(2,10));
        --wr_data <= std_logic_vector(to_unsigned(4, 8) & to_unsigned(2,10) & to_unsigned(2,10) & to_unsigned(2,10) & to_unsigned(2,10));--LINE_OBJ;
      end if;
    end if;
  end process;

  process (lock_reg, v_reg, fvalid_reg, line_reg)
  BEGIN
    if (lock_reg = '0') then
      -- Buffer not being read by uploader
      -- Prevents any other output but lines
      lock_out_reg <= '1';
      if (fvalid_reg = '1') then
        -- Frame Data to be processed exists
        rdy_reg_temp <= '0';
        if (rdy_reg = '1') then
          -- Buffer upload complete, reset
          addr <= to_unsigned(0,7);
          if (v_reg = '1') then
            -- Valid object ready to be written to buffer
            valid_out_reg <= '1';
            count <= to_unsigned(1,7);
          else
            -- No valid object
            valid_out_reg <= '0';
            count <= to_unsigned(0,7);
          end if;
        else
          -- Normal writting state
          if (v_reg = '1' and count < to_unsigned(127,7)) then
            -- Valid object ready to be written to buffer
            addr <= addr + 1;
            count <= count + 1;
            valid_out_reg <= '1';
          else
            -- No valid object
            addr <= addr;
            count <= count;
            valid_out_reg <= '0';
          end if;
        end if;
      else
        -- No valid frame data left, start upload
        rdy_reg_temp <= '1';
        addr <= addr;
        count <= count;
        valid_out_reg <= '0';
      end if;
    else
      -- Buffer being read by uploader
      lock_out_reg <= '1';
      count <= count;
      addr <= addr;
      rdy_reg_temp <= '1';
      valid_out_reg <= '0';
    end if;
  end process;

  rdy_reg <= rdy_reg_temp;
  buffer_lock_out <= lock_out_reg;
  buffer_rdy <= rdy_reg_temp;
  wren <= valid_out_reg;
  wr_addr <= addr;
  obj_count <= count;
  wr_data <= line_reg;
end rt1;

 

I have a feeling that I may be violating the setup or hold times of the RAM block, but I do not know how to verify this or how to fix it. Any ideas or suggestions would be greatly appreciated, and I would be happy to provide any additional information.

 

Thanks, 

Mat.
15 Replies
Altera_Forum

Timing analysis is the way to find out about violations. 

 

I see that you are trying to operate counters based on latches (in an asynchronous process). This can't work, and Quartus will surely issue a number of warnings about the construct:

addr <= addr + 1; count <= count + 1;  

 

So there are apparently more basic problems than possible timing violations.
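
For comparison, here is a minimal sketch of a counter that lives entirely in a clocked process, so it synthesizes to registers rather than latches. The entity name `sync_counter` and the ports `clear` and `en` are hypothetical and not taken from the posted design; only the 7-bit width and the saturation at 127 mirror the original code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical stand-alone example, not the poster's block.
entity sync_counter is
  port (
    clk   : in  std_logic;
    clear : in  std_logic;              -- synchronous clear
    en    : in  std_logic;              -- count enable
    count : out unsigned(6 downto 0)
  );
end entity;

architecture rtl of sync_counter is
  signal count_reg : unsigned(6 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if clear = '1' then
        count_reg <= (others => '0');
      elsif en = '1' and count_reg < 127 then
        count_reg <= count_reg + 1;     -- stops counting at 127
      end if;
      -- No else branch is needed: inside a clocked process a signal
      -- that is not assigned simply keeps its registered value.
    end if;
  end process;

  count <= count_reg;
end architecture;
```

Because `count_reg + 1` is only evaluated on a clock edge, there is no combinational feedback loop of the kind created by `addr <= addr + 1` in an unclocked process.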
Altera_Forum

Hi, 

Thanks for the reply. I know there are possibly many things wrong with my code;

this is the first time I have ever used VHDL, so it has been a bit of an experience to say the least. Quartus is spitting out warnings about inferred latches, which I've read are a bad thing, but I'm not sure how to design it any other way. Would you mind walking me through how you would go about designing this block? (It's a fairly simple block anyway.)

 

Basically what I am trying to make the block do is 

 

- Check if the frame data is valid by checking DVAL and FVAL 

(If the frame data is valid I am writing to the RAM block otherwise I am waiting while the data is uploaded) 

 

- Check if I am blocked (buffer_lock) and if so do nothing 

 

Writing state 

- if writing state just started (was previously uploading) reset the address and object count 

- Check if the obj data coming in is valid by checking VALID_IN 

- If data is valid write it to RAM and increment the address and object count 

- If the RAM is full (the object count equals the RAM size), do nothing

 

Upload state 

- send out the object count and ready signal 

 

I'm also looking into timing analysis, as the output does seem like metastability,

but I do agree there are other problems with my code.

 

Thanks, 

Mat.
Altera_Forum

Before you go back to synthesis or timing analysis, I suggest you write a testbench and run the design through ModelSim, and eliminate all of the latches (i.e. make sure things like counters are inside a synchronous process, not the asynchronous one).

Altera_Forum

So I can remove all latches by moving the signals that store values into the process that checks for the rising clock edge? I thought latches were inferred any time a signal holds its current value.

 

I have written a testbench, and when I simulate it in ModelSim it works as expected, but I will try to move the counters and the signals that hold a value out of the asynchronous process.

 

Any other comments on my code to help me improve would be greatly appreciated. I am reading books on VHDL at the same time, but I have to learn as fast as possible because the project is time limited.

 

Thanks, 

Mat.
Altera_Forum

Latches are created when a signal has to hold its value without using a clock. These are bad because they are prone to metastability, sensitive to temperature, and difficult to analyze in timing analysis.
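
A minimal sketch of the difference, using a hypothetical entity `latch_vs_reg` with made-up port names: the same "hold the value when not enabled" behaviour infers a latch in an unclocked process and a flip-flop with clock enable in a clocked one.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example for illustration only.
entity latch_vs_reg is
  port (
    clk   : in  std_logic;
    en    : in  std_logic;
    d     : in  std_logic;
    q_lat : out std_logic;
    q_reg : out std_logic
  );
end entity;

architecture rtl of latch_vs_reg is
begin
  -- Unclocked process with an incomplete assignment: when en = '0'
  -- q_lat must remember its old value, so synthesis infers a latch.
  process (en, d)
  begin
    if en = '1' then
      q_lat <= d;
    end if;
  end process;

  -- Clocked process: the missing else branch is harmless here.
  -- The value is held in a flip-flop and only changes on a clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        q_reg <= d;
      end if;
    end if;
  end process;
end architecture;
```

So holding a value is not the problem in itself; holding it outside a `rising_edge` branch is what produces the inferred-latch warning.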

Altera_Forum

Another issue is the signals DVAL_IN, FVAL_IN and buffer_lock. If they are not synchronous to the clock, you must synchronize them. Otherwise you will run into metastability problems with your fvalid_reg and lock_reg signals.

Altera_Forum

Thanks, 

DVAL and FVAL are external signals and are synchronized to the clock once at the input by passing through a clocked register. Is that enough, or should they be resynchronized at certain points if they are used across large portions of the design? As for buffer_lock, I'm not 100% sure, but I'll look into that. Thanks for the advice about inferred latches as well.

 

My current design was partly due to the concurrent and delayed nature of signal assignments. I have been looking into using variables instead of some of the signals so that I can assign some things sequentially. Is this a good or bad idea?

 

Thanks, 

Mat.
Altera_Forum

State-of-the-art is double-registering signals from unrelated clock domains. If you don't mind rare metastable events (the actual probability needs to be calculated), single registering can be O.K.  
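
As a rough sketch of double-registering, the following two-flip-flop synchronizer could be placed on each asynchronous input. The entity name `sync_2ff` and its port names are assumptions for illustration, not part of the posted design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-stage synchronizer for a single-bit signal.
entity sync_2ff is
  port (
    clk      : in  std_logic;
    async_in : in  std_logic;   -- signal from an unrelated clock domain
    sync_out : out std_logic    -- safe to use in the clk domain
  );
end entity;

architecture rtl of sync_2ff is
  signal meta   : std_logic := '0';  -- first stage, may go metastable
  signal stable : std_logic := '0';  -- second stage, gives it a cycle to settle
begin
  process (clk)
  begin
    if rising_edge(clk) then
      meta   <= async_in;
      stable <= meta;
    end if;
  end process;

  sync_out <= stable;
end architecture;
```

This only suits single-bit control signals; a multi-bit bus crossing clock domains needs a different scheme (for example a handshake or a dual-clock FIFO), because the individual bits can be captured on different clock edges.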

 

Using variables for intermediate results in a synchronous process tends to chain more logic elements in series and so reduces the maximum design speed. As long as you have sufficient timing margin, this is not a problem.
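
To illustrate the point about chained logic, here is a small sketch with a hypothetical entity `var_chain`. The variable `sum` is read in the same clock cycle it is written, so both adders sit in series between two register stages.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: a variable used as an intermediate result.
entity var_chain is
  port (
    clk     : in  std_logic;
    a, b, c : in  unsigned(7 downto 0);
    result  : out unsigned(7 downto 0)
  );
end entity;

architecture rtl of var_chain is
begin
  process (clk)
    variable sum : unsigned(7 downto 0);
  begin
    if rising_edge(clk) then
      sum    := a + b;      -- updated immediately, no register of its own here
      result <= sum + c;    -- sees the new sum: two adders in one clock period
    end if;
  end process;
end architecture;
```

Had `sum` been a signal, `result` would use the value of `sum` from the previous clock edge, which adds a cycle of latency but shortens the combinational path.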
Altera_Forum

Hey, I've rewritten my block and I don't get any more warnings from Quartus.

I've simulated the block in ModelSim and it all seems to work as expected, although I'll double check the simulation because now no data is coming out of the board.

 

Can anyone take a quick look at my new code and see if they can spot anything wrong? 

 

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.obj_extraction_pkg.all;

entity OBJ_RamWriter is
  port(
    -- Clock Input
    CLK : in std_logic;
    -- Inputs
    buffer_lock : in std_logic := '0';
    VALID_IN : in std_logic := '0';
    DVAL_IN : in std_logic := '0';
    FVAL_IN : in std_logic := '0';
    LINE_OBJ : in std_logic_vector(obj_wd-addr_wd-1 downto 0);
    -- Outputs
    buffer_lock_out : out std_logic := '0';
    buffer_rdy : out std_logic := '0';
    wren : out std_logic := '0';
    wr_addr : out unsigned(6 downto 0);
    obj_count : out unsigned(6 downto 0);
    wr_data : out std_logic_vector(obj_wd-addr_wd-1 downto 0)
  );
end entity;

architecture rt1 of OBJ_RamWriter is
  -- Data Registers
  signal line_reg : std_logic_vector(obj_wd-addr_wd-1 downto 0);
  -- Internal States
  signal odd : std_logic := '0';
  -- Output registers
  signal lock_out_reg : std_logic := '0';
  signal valid_out_reg : std_logic := '0';
BEGIN
  process (CLK)
    variable addr : unsigned(6 downto 0) := to_unsigned(0,7);
    variable count : unsigned(6 downto 0) := to_unsigned(0,7);
    variable ready : std_logic := '0';
  BEGIN
    if (rising_edge(CLK)) then
      -- Store inputs
      lock_out_reg <= buffer_lock;
      if (buffer_lock = '0') then
        valid_out_reg <= '1'; --VALID_IN;
        if (DVAL_IN = '1' and FVAL_IN = '1') then
          -- Frame valid (writing state)
          if (ready = '1') then
            -- Was previously in upload state, reset
            ready := '0';
            if (VALID_IN = '1') then
              -- will be writing this cycle so initialise as such
              addr := to_unsigned(0,7);
              count := to_unsigned(1,7);
            else
              -- nothing to write this cycle initialise as such
              addr := b"1111111";
              count := to_unsigned(0,7);
            end if;
          else
            -- not reset case
            ready := '0';
            if (VALID_IN = '1') then
              -- writing this cycle
              addr := addr + 1;
              count := count + 1;
            else
              -- nothing to do
              addr := addr;
              count := count;
            end if;
          end if;
        else
          -- Uploading State
          ready := '1';
          valid_out_reg <= '0';
          addr := addr;
          count := count;
        end if;
        if (odd = '0') then
          odd <= '1';
          line_reg <= std_logic_vector(to_unsigned(4, 8) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10));--LINE_OBJ;
        else
          odd <= '0';
          line_reg <= std_logic_vector(to_unsigned(4, 8) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(2,10));
        end if;
      else
        -- Locked by buffer_lock keep same state
        valid_out_reg <= '0';
        ready := ready;
        addr := addr;
        count := count;
        line_reg <= line_reg;
        odd <= odd;
      end if;
    end if;
    buffer_rdy <= ready;
    wr_addr <= addr;
    obj_count <= count;
  end process;

  buffer_lock_out <= lock_out_reg;
  wren <= valid_out_reg;
  wr_data <= line_reg;
end rt1;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.obj_extraction_pkg.all;

entity OBJRam_test is
end OBJRam_test;

architecture bench of OBJRam_test is
  component OBJ_RamWriter
    port (
      -- Clock Input
      CLK : in std_logic;
      -- Inputs
      buffer_lock : in std_logic := '0';
      VALID_IN : in std_logic := '0';
      DVAL_IN : in std_logic := '0';
      FVAL_IN : in std_logic := '0';
      LINE_OBJ : in std_logic_vector(obj_wd-addr_wd-1 downto 0);
      -- Outputs
      buffer_lock_out : out std_logic := '0';
      buffer_rdy : out std_logic := '0';
      wren : out std_logic := '0';
      wr_addr : out unsigned(6 downto 0);
      obj_count : out unsigned(6 downto 0);
      wr_data : out std_logic_vector(obj_wd-addr_wd-1 downto 0)
    );
  end component;

  signal CLK, buffer_lock, VALID_IN, DVAL_IN, FVAL_IN, buffer_lock_out, buffer_rdy, wren : std_logic;
  signal wr_addr, obj_count : unsigned(6 downto 0);
  signal LINE_OBJ, wr_data : std_logic_vector(obj_wd-addr_wd-1 downto 0);
BEGIN
  clk_process : process
  begin
    CLK <= '0';
    wait for 100 PS;
    CLK <= '1';
    wait for 100 PS;
  end process;

  stim_process : process
  begin
    VALID_IN <= '0'; FVAL_IN <= '1'; DVAL_IN <= '1'; buffer_lock <= '1';
    LINE_OBJ <= b"000000000000000000000000000000000000000000000000";
    wait for 150 ps;
    VALID_IN <= '0'; FVAL_IN <= '1'; DVAL_IN <= '1'; buffer_lock <= '0';
    LINE_OBJ <= b"100000000000000000000000000000000000000000000000";
    wait for 200 ps;
    VALID_IN <= '1'; FVAL_IN <= '1'; DVAL_IN <= '1'; buffer_lock <= '0';
    LINE_OBJ <= b"010101010101010101010101010101010101010101010101";
    wait for 200 ps;
    VALID_IN <= '1'; FVAL_IN <= '1'; DVAL_IN <= '1'; buffer_lock <= '0';
    LINE_OBJ <= b"110101010101010101010101010101010101010101010101";
    wait for 200 ps;
    VALID_IN <= '0'; FVAL_IN <= '1'; DVAL_IN <= '1'; buffer_lock <= '0';
    LINE_OBJ <= b"111101010101010101010101010101010101010101010101";
    wait for 200 ps;
    VALID_IN <= '1'; FVAL_IN <= '1'; DVAL_IN <= '1'; buffer_lock <= '0';
    LINE_OBJ <= b"111111010101010101010101010101010101010101010101";
    wait for 200 ps;
    VALID_IN <= '1'; FVAL_IN <= '0'; DVAL_IN <= '0'; buffer_lock <= '0';
    LINE_OBJ <= b"111111110101010101010101010101010101010101010101";
    wait for 200 ps;
    VALID_IN <= '0'; FVAL_IN <= '1'; DVAL_IN <= '1'; buffer_lock <= '0';
    LINE_OBJ <= b"111111110101010101010101010101010101010101010101";
    -- wait for 200 ps;
    wait;
  end process;

  M: OBJ_RamWriter port map (CLK, buffer_lock, VALID_IN, DVAL_IN, FVAL_IN, LINE_OBJ, buffer_lock_out, buffer_rdy, wren, wr_addr, obj_count, wr_data);
end bench;

 

Thanks, everyone's been really helpful tonight, 

Mat.
Altera_Forum

Although not strictly a problem in RTL simulation, you would want to operate the testbench at a speed that can be processed by a real FPGA. 

 

I'm also not sure whether the design can keep up with input changes that arrive this quickly relative to the design clock. But this is easier to trace in simulation than in a code review.
Altera_Forum

So I should run the simulation with my clock at the frequency used on the FPGA, and that should tell me whether it will run at that speed? Or will I have to do some other timing analysis as well?

 

# Edit# 

 

I tried the simulation. The system clock is 36.15 MHz because it is synchronized with the camera clock, which runs at that frequency. My previous simulations were running at much faster clock speeds, so I adjusted the clock used in simulation to be

 

(1/36150000) / 2 ~= 14 ns between every edge (rising and falling) 

 

and everything works as expected in simulation. Is this enough for timing analysis or do I need to do more?
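
For reference, the half-period arithmetic above can be written out as a testbench clock generator: 1 / 36.15 MHz is about 27.66 ns per cycle, so about 13.83 ns between edges. This is only a sketch; the entity name `clk_gen_tb` and the constant `HALF_PERIOD` are made up, and it assumes the simulator time resolution is 1 ps or finer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical testbench fragment: free-running 36.15 MHz clock.
entity clk_gen_tb is
end entity;

architecture bench of clk_gen_tb is
  -- 1 / 36.15 MHz = 27.6625 ns, so one half period is about 13.831 ns.
  constant HALF_PERIOD : time := 13831 ps;
  signal clk : std_logic := '0';
begin
  -- Runs for as long as the simulation is allowed to run.
  clk_process : process
  begin
    clk <= '0';
    wait for HALF_PERIOD;
    clk <= '1';
    wait for HALF_PERIOD;
  end process;
end architecture;
```

Stimulus `wait for` statements would need scaling to the same period so that inputs still change between clock edges rather than on them.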

 

Thanks, 

Mat.
Altera_Forum

If you are doing a functional simulation (i.e. just testing your code to make sure it works), clock speeds are mostly unimportant. I will generally just use a 100 MHz clock regardless of my final clock because it's easier to work out how many clock cycles have occurred between two points when I put two cursors up.

 

BUT: if you have more than one clock in the system, it is very important to get the ratio of the clocks as close to the real ratio as possible, to ensure data rates are correct and FIFOs etc. don't overfill.

 

If you are doing a gate level simulation, then yes, you need to use the real clock speeds, as this should point out any timing problems. But usually most problems are picked up at the functional stage after which you move into synthesis and timing analysis.
Altera_Forum

So what would my next best step be? 

Should I try gate-level simulation, or should I move on to synthesis and timing analysis with my new design? I have never done either gate-level simulation or timing analysis before, so a pointer in the right direction would also be appreciated :)

 

Thanks, 

Mat.
Altera_Forum

I honestly have never done a post-P&R simulation. With good design practice, a good testbench and good timing analysis specs, you shouldn't need to do one with a fully synchronous design.

 

The gate-level sim is only really needed when you need to test external interfaces or where you have asynchronous logic. A fully synchronous design shouldn't normally need a gate-level sim.
Altera_Forum

Thanks for all the help. I haven't been able to get anything out of the board, but I'm meeting with a lecturer who knows VHDL, so hopefully he'll be able to help as he can take a look at the actual system.

Thanks again to everyone for helping me out. 

Mat.