Hi,
I'm working on a vision-processing project at uni using a custom board built around a Cyclone III, and I have to modify some VHDL code written by previous years' students. A method for transmitting data from the board had already been implemented, but it is very static and hard to modify. So, just to test my code, I decided to 'hijack' an area that already writes data out. In this area, data is written to an instantiated RAM block (of type altsyncram) acting as a buffer. When the buffer has been filled, a ready signal is activated and the contents of the RAM block are transmitted via an FTDI interface.
So I set up a block that, for the time being, fills the RAM block with hardcoded values (2 values that alternate at each clock cycle) while a valid frame is being read from the camera. When the frame is over, I set the ready signal high and trigger the writing process.
The data I am sending is 48 bits and has the form:
8 bits : color label
10 bits : x1 coordinate
10 bits : y1 coordinate
10 bits : x2 coordinate
10 bits : y2 coordinate
So I send the following hardcoded alternating data (color_label, x1, y1, x2, y2):
data1 : (4, 1, 1, 1, 1)
data2 : (4, 1, 1, 1, 2)
On receiving the data I get random values of either (4,1,1,1,0), (4,1,1,1,1), (4,1,1,1,2) or (4,1,1,1,3). This leads me to think that I am having a problem with metastability, and I believe it has something to do with the RAM block (altsyncram): if I just pass the values continuously to the uploader (bypassing the RAM) I get the values I expect. However, this is not a viable solution outside of test conditions. I have attached a picture of my block that is setting the hardcoded values and the RAM block I am writing to.
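As a small illustration of the 48-bit record layout described above, here is a hedged sketch of a packing function. The package name `obj_pack_pkg` and the function `pack_obj` are made up for this example and are not part of the original project; the field order (label in the top bits, y2 in the bottom bits) follows the concatenation order used in the posted code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper package (names are assumptions, not from the project)
package obj_pack_pkg is
  function pack_obj(color, x1, y1, x2, y2 : natural) return std_logic_vector;
end package;

package body obj_pack_pkg is
  -- Packs one object record into 48 bits: 8-bit label, then four 10-bit coordinates
  function pack_obj(color, x1, y1, x2, y2 : natural) return std_logic_vector is
    variable word : std_logic_vector(47 downto 0);
  begin
    word(47 downto 40) := std_logic_vector(to_unsigned(color, 8));
    word(39 downto 30) := std_logic_vector(to_unsigned(x1, 10));
    word(29 downto 20) := std_logic_vector(to_unsigned(y1, 10));
    word(19 downto 10) := std_logic_vector(to_unsigned(x2, 10));
    word(9 downto 0)   := std_logic_vector(to_unsigned(y2, 10));
    return word;
  end function;
end package body;
```

With this layout, data1 and data2 differ only in the two least significant bits of the word (y2 = 1 ends in 01, y2 = 2 ends in 10), and those are exactly the bits that vary in the received values 0 to 3.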
The code of my block is as follows:
-- INPUTS
-- FVAL : indicates a valid frame from the camera
-- DVAL : indicates valid data from the camera
-- VALID_IN : indicates valid data into this block (currently unused)
-- buffer_lock : indicates the data in the RAM is being uploaded, so can't write to RAM
-- LINE_OBJ : the data to write out (currently unused, values are hardcoded for testing)
-- OUTPUTS
-- buffer_lock_out : used to block the data that used to be written to the RAM (ignore this)
-- buffer_rdy : the ready signal that starts the upload process
-- wren : write enable signal to the RAM block
-- wr_addr : the address to write to RAM
-- obj_count : the data count written to RAM (ignore this, for external purposes)
-- wr_data : the data to write to RAM
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.obj_extraction_pkg.all;
entity OBJ_RamWriter is
  port(
    -- Clock Input
    CLK : in std_logic;
    -- Inputs
    buffer_lock : in std_logic := '0';
    VALID_IN : in std_logic := '0';
    DVAL_IN : in std_logic := '0';
    FVAL_IN : in std_logic := '0';
    LINE_OBJ : in std_logic_vector(obj_wd-addr_wd-1 downto 0);
    -- Outputs
    buffer_lock_out : out std_logic := '0';
    buffer_rdy : out std_logic := '0';
    wren : out std_logic := '0';
    wr_addr : out unsigned(6 downto 0);
    obj_count : out unsigned(6 downto 0);
    wr_data : out std_logic_vector(obj_wd-addr_wd-1 downto 0)
  );
end entity;
architecture rt1 of OBJ_RamWriter is
  -- Data Registers
  signal lock_reg : std_logic := '0';
  signal v_reg : std_logic := '0';
  signal fvalid_reg : std_logic := '0';
  signal line_reg : std_logic_vector(obj_wd-addr_wd-1 downto 0);
  -- Internal States
  signal count : unsigned(6 downto 0) := to_unsigned(0,7);
  signal addr : unsigned(6 downto 0) := to_unsigned(0,7);
  signal rdy_reg : std_logic := '0';
  signal rdy_reg_temp : std_logic := '0';
  signal odd : std_logic := '0';
  -- Output registers
  signal lock_out_reg : std_logic := '0';
  signal valid_out_reg : std_logic := '0';
BEGIN
  process (CLK)
  BEGIN
    if (rising_edge(CLK)) then
      -- Store inputs
      lock_reg <= buffer_lock;
      v_reg <= '1';--VALID_IN;
      if (DVAL_IN = '1' and FVAL_IN = '1') then
        fvalid_reg <= '1';
      else
        fvalid_reg <= '0';
      end if;
      if (odd = '0') then
        odd <= '1';
        line_reg <= std_logic_vector(to_unsigned(4, 8) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10));--LINE_OBJ;
        --wr_data <= std_logic_vector(to_unsigned(4, 8) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10));--LINE_OBJ;
      else
        odd <= '0';
        line_reg <= std_logic_vector(to_unsigned(4, 8) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(2,10));
        --wr_data <= std_logic_vector(to_unsigned(4, 8) & to_unsigned(2,10) & to_unsigned(2,10) & to_unsigned(2,10) & to_unsigned(2,10));--LINE_OBJ;
      end if;
    end if;
  end process;
  process (lock_reg, v_reg, fvalid_reg, line_reg)
  BEGIN
    if (lock_reg = '0') then -- Buffer not being read by uploader
      -- Prevents any other output but lines
      lock_out_reg <= '1';
      if (fvalid_reg = '1') then -- Frame Data to be processed exists
        rdy_reg_temp <= '0';
        if (rdy_reg = '1') then -- Buffer upload complete, reset
          addr <= to_unsigned(0,7);
          if (v_reg = '1') then -- Valid object ready to be written to buffer
            valid_out_reg <= '1';
            count <= to_unsigned(1,7);
          else -- No valid object
            valid_out_reg <= '0';
            count <= to_unsigned(0,7);
          end if;
        else -- Normal writing state
          if (v_reg = '1' and count < to_unsigned(127,7)) then -- Valid object ready to be written to buffer
            addr <= addr + 1;
            count <= count + 1;
            valid_out_reg <= '1';
          else -- No valid object
            addr <= addr;
            count <= count;
            valid_out_reg <= '0';
          end if;
        end if;
      else -- No valid frame data left, start upload
        rdy_reg_temp <= '1';
        addr <= addr;
        count <= count;
        valid_out_reg <= '0';
      end if;
    else -- Buffer being read by uploader
      lock_out_reg <= '1';
      count <= count;
      addr <= addr;
      rdy_reg_temp <= '1';
      valid_out_reg <= '0';
    end if;
  end process;
  rdy_reg <= rdy_reg_temp;
  buffer_lock_out <= lock_out_reg;
  buffer_rdy <= rdy_reg_temp;
  wren <= valid_out_reg;
  wr_addr <= addr;
  obj_count <= count;
  wr_data <= line_reg;
end rt1;
I have a feeling that I may be violating the setup or hold times of the RAM block, but I do not know how to verify this or how to fix it. Any ideas/suggestions would be greatly appreciated, and I would be happy to provide any additional information. Thanks, Mat.
Timing analysis is the way to find out about violations.
I see that you are trying to operate counters based on latches (in an asynchronous process). This can't work, and Quartus will surely be issuing a number of warnings related to the construct:
addr <= addr + 1;
count <= count + 1;
So there are apparently more basic problems than possible timing violations.
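For comparison, a counter that increments only on the clock edge could look like the minimal sketch below. The entity name `counter_sketch` and the `enable`/`clear` ports are assumptions chosen for illustration, not part of the posted design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity (names and widths are assumptions)
entity counter_sketch is
  port (
    clk    : in  std_logic;
    clear  : in  std_logic;              -- synchronous reset to zero
    enable : in  std_logic;              -- count only when high
    count  : out unsigned(6 downto 0)
  );
end entity;

architecture rtl of counter_sketch is
  signal count_reg : unsigned(6 downto 0) := (others => '0');
begin
  -- Only the clock is in the sensitivity list; the increment happens
  -- once per rising edge, so there is no combinational feedback loop.
  process (clk)
  begin
    if rising_edge(clk) then
      if clear = '1' then
        count_reg <= (others => '0');
      elsif enable = '1' then
        count_reg <= count_reg + 1;
      end if;                            -- otherwise the register holds its value
    end if;
  end process;

  count <= count_reg;
end architecture;
```

Because `count_reg + 1` is sampled once per clock edge, the register holds a well-defined value between edges, unlike `addr <= addr + 1` in a process without a clock.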
Hi,
Thanks for the reply. I know there are possibly many things wrong with my code; this is the first time I have ever used VHDL, so it has been a bit of an experience, to say the least. Quartus is spitting out warnings about inferred latches, which I've read are a bad thing, but I'm not sure how to design it any other way. Would you mind walking me through how you would go about designing this block? (It's a fairly simple block anyway.)
Basically, what I am trying to make the block do is:
- Check if the frame data is valid by checking DVAL and FVAL (if the frame data is valid I am writing to the RAM block, otherwise I am waiting while the data is uploaded)
- Check if I am blocked (buffer_lock) and if so do nothing
Writing state
- If the writing state just started (was previously uploading), reset the address and object count
- Check if the obj data coming in is valid by checking VALID_IN
- If the data is valid, write it to RAM and increment the address and object count
- If RAM is full (the object count is equal to the RAM size), do nothing
Upload state
- Send out the object count and ready signal
I'm also looking into timing analysis, as the output does seem like metastability, but I do agree there are other problems with my code.
Thanks, Mat.
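The behaviour listed above can be sketched as a single clocked process, which is one possible way to avoid inferred latches. This is only an illustration under assumptions: the entity name `ram_writer_sketch`, the combined `frame_valid` input (DVAL and FVAL), the 48-bit data width and the 7-bit address are all chosen for the example, and the real timing requirements of the uploader are not known here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch; names and widths are assumptions, not the project's
entity ram_writer_sketch is
  port (
    clk         : in  std_logic;
    buffer_lock : in  std_logic;
    frame_valid : in  std_logic;                       -- DVAL and FVAL combined
    valid_in    : in  std_logic;
    data_in     : in  std_logic_vector(47 downto 0);
    buffer_rdy  : out std_logic;
    wren        : out std_logic;
    wr_addr     : out unsigned(6 downto 0);
    obj_count   : out unsigned(6 downto 0);
    wr_data     : out std_logic_vector(47 downto 0)
  );
end entity;

architecture rtl of ram_writer_sketch is
  signal count     : unsigned(6 downto 0) := (others => '0');
  signal uploading : std_logic := '0';
begin
  process (clk)
    variable n : unsigned(6 downto 0);   -- always assigned before it is read
  begin
    if rising_edge(clk) then
      wren <= '0';                       -- default: no write this cycle
      if buffer_lock = '0' then          -- when locked, do nothing
        if frame_valid = '1' then        -- writing state
          uploading <= '0';
          n := count;
          if uploading = '1' then        -- writing state just started: reset
            n := (others => '0');
          end if;
          if valid_in = '1' and n < 127 then
            wren    <= '1';
            wr_addr <= n;                -- write to the next free slot
            wr_data <= data_in;
            n := n + 1;
          end if;
          count <= n;
        else                             -- upload state
          uploading <= '1';
        end if;
      end if;
    end if;
  end process;

  buffer_rdy <= uploading;
  obj_count  <= count;
end architecture;
```

Every signal that has to remember a value (`count`, `uploading`, the outputs) is assigned only inside `rising_edge(clk)`, so it becomes a flip-flop rather than a latch. Stopping at 127 keeps the 7-bit count from wrapping, at the cost of leaving the last RAM location unused.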
I suggest that before you go back to synthesis or timing analysis, you get a testbench written and run the design through ModelSim, and eliminate all of the latches (i.e. make sure things like counters are inside a synchronous process, not the asynchronous one).
So I can remove all latches by moving the signals that store values inside the process that checks for the rising clock edge? I thought latches were inferred anytime a signal holds its current value.
I have written a testbench, and when simulating it in ModelSim it works as expected, but I will try to remove the counters and the signals that maintain a value from the asynchronous process. Any other comments on my code to help me improve would be greatly appreciated. I am reading books on VHDL concurrently as well, but I have to learn as fast as possible as the project is time limited. Thanks, Mat.
Latches are created when a signal stores its value without using a clock. These are bad because they are prone to metastability, sensitive to temperature, and difficult to study in timing analysis.
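To make the distinction concrete, here is a small hedged sketch with three outputs driven from the same inputs. The entity `latch_vs_reg` and its port names are invented for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity (names are assumptions)
entity latch_vs_reg is
  port (
    clk     : in  std_logic;
    sel     : in  std_logic;
    d       : in  std_logic;
    q_latch : out std_logic;
    q_comb  : out std_logic;
    q_reg   : out std_logic
  );
end entity;

architecture rtl of latch_vs_reg is
begin
  -- Latch: no clock, and q_latch is not assigned when sel = '0',
  -- so the tool must build storage that holds the old value.
  process (sel, d)
  begin
    if sel = '1' then
      q_latch <= d;
    end if;
  end process;

  -- Pure combinational logic: every path assigns q_comb,
  -- so nothing has to be remembered.
  process (sel, d)
  begin
    q_comb <= '0';                 -- default assignment covers sel = '0'
    if sel = '1' then
      q_comb <= d;
    end if;
  end process;

  -- Register: the value is also held when sel = '0', but only the
  -- clock edge updates it, so this is a flip-flop with an enable.
  process (clk)
  begin
    if rising_edge(clk) then
      if sel = '1' then
        q_reg <= d;
      end if;
    end if;
  end process;
end architecture;
```

Holding a value is therefore not a problem in itself; it only infers a latch when the holding happens outside a clocked process.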
Another issue is the signals DVAL_IN, FVAL_IN and buffer_lock. If they are not synchronous to the clock, you must synchronize them. Otherwise you will run into metastability problems with your fvalid_reg and lock_reg signals.
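One common way to synchronize a single-bit input is a chain of two registers clocked by the receiving clock. The sketch below assumes a one-bit signal; the entity name `sync_2ff` and its ports are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-stage synchronizer (names are assumptions)
entity sync_2ff is
  port (
    clk      : in  std_logic;
    async_in : in  std_logic;   -- signal from outside the clk domain
    sync_out : out std_logic    -- safe to use in the clk domain
  );
end entity;

architecture rtl of sync_2ff is
  signal stage1 : std_logic := '0';
  signal stage2 : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      stage1 <= async_in;   -- may go metastable if async_in changes near the edge
      stage2 <= stage1;     -- gives stage1 a full clock period to settle
    end if;
  end process;

  sync_out <= stage2;
end architecture;
```

The output is delayed by up to two clock cycles relative to the input. This pattern is only suitable for single-bit signals; a multi-bit bus needs a different technique, because the bits may be captured on different cycles.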
Thanks,
DVAL and FVAL are external signals and are synchronized to the clock once at the input by passing through a clocked register. Is that enough, or should they be re-synchronized at certain points if they are used across large portions of the design? As for buffer_lock, I'm not 100% sure, but I'll look into that. Thanks for the advice about inferred latches as well. My current design was partly due to the concurrent and delayed nature of signal assignments. I have been looking into using variables instead of some of the signals so that I can assign some things sequentially. Is this a good or bad idea? Thanks, Mat.
State-of-the-art is double-registering signals from unrelated clock domains. If you don't mind rare metastable events (the actual probability needs to be calculated), single registering can be O.K.
Using variables for intermediate results in a synchronous process means chaining more logic elements within one clock cycle, which reduces the maximum design speed. As long as you have sufficient timing margin, there's no problem involved.
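The trade-off can be seen in a small sketch that computes a + b + c in two ways inside one clocked process. The entity `var_vs_sig` and its ports are made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity (names and widths are assumptions)
entity var_vs_sig is
  port (
    clk     : in  std_logic;
    a, b, c : in  unsigned(7 downto 0);
    y_var   : out unsigned(7 downto 0);
    y_sig   : out unsigned(7 downto 0)
  );
end entity;

architecture rtl of var_vs_sig is
  signal sum_sig : unsigned(7 downto 0) := (others => '0');
begin
  process (clk)
    variable sum_var : unsigned(7 downto 0);
  begin
    if rising_edge(clk) then
      -- Variable: updated immediately, so both adders sit in series
      -- between two registers (longer path, result after one edge).
      sum_var := a + b;
      y_var   <= sum_var + c;

      -- Signal: sum_sig still holds the value from the previous edge,
      -- so each adder has its own register (shorter paths, but the
      -- result arrives one clock later and uses the current c).
      sum_sig <= a + b;
      y_sig   <= sum_sig + c;
    end if;
  end process;
end architecture;
```

At a modest clock rate such as the 36.15 MHz mentioned in this thread, two chained 8-bit adders are unlikely to be a problem, but the timing analyzer report is what actually confirms the margin.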
Hey, I've re-written my block and I don't get any more warnings from Quartus.
I've simulated the block in ModelSim and it all seems to work as expected, although I'll double check the simulation, as now no data is coming out from the board. Can anyone take a quick look at my new code and see if they can spot anything wrong?
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.obj_extraction_pkg.all;
entity OBJ_RamWriter is
  port(
    -- Clock Input
    CLK : in std_logic;
    -- Inputs
    buffer_lock : in std_logic := '0';
    VALID_IN : in std_logic := '0';
    DVAL_IN : in std_logic := '0';
    FVAL_IN : in std_logic := '0';
    LINE_OBJ : in std_logic_vector(obj_wd-addr_wd-1 downto 0);
    -- Outputs
    buffer_lock_out : out std_logic := '0';
    buffer_rdy : out std_logic := '0';
    wren : out std_logic := '0';
    wr_addr : out unsigned(6 downto 0);
    obj_count : out unsigned(6 downto 0);
    wr_data : out std_logic_vector(obj_wd-addr_wd-1 downto 0)
  );
end entity;
architecture rt1 of OBJ_RamWriter is
  -- Data Registers
  signal line_reg : std_logic_vector(obj_wd-addr_wd-1 downto 0);
  -- Internal States
  signal odd : std_logic := '0';
  -- Output registers
  signal lock_out_reg : std_logic := '0';
  signal valid_out_reg : std_logic := '0';
BEGIN
  process (CLK)
    variable addr : unsigned(6 downto 0) := to_unsigned(0,7);
    variable count : unsigned(6 downto 0) := to_unsigned(0,7);
    variable ready : std_logic := '0';
  BEGIN
    if (rising_edge(CLK)) then
      -- Store inputs
      lock_out_reg <= buffer_lock;
      if (buffer_lock = '0') then
        valid_out_reg <= '1'; --VALID_IN;
        if (DVAL_IN = '1' and FVAL_IN = '1') then -- Frame valid (writing state)
          if (ready = '1') then -- Was previously in upload state, reset
            ready := '0';
            if (VALID_IN = '1') then -- will be writing this cycle so initialise as such
              addr := to_unsigned(0,7);
              count := to_unsigned(1,7);
            else -- nothing to write this cycle initialise as such
              addr := b"1111111";
              count := to_unsigned(0,7);
            end if;
          else -- not reset case
            ready := '0';
            if (VALID_IN = '1') then -- writing this cycle
              addr := addr + 1;
              count := count + 1;
            else -- nothing to do
              addr := addr;
              count := count;
            end if;
          end if;
        else -- Uploading State
          ready := '1';
          valid_out_reg <= '0';
          addr := addr;
          count := count;
        end if;
        if (odd = '0') then
          odd <= '1';
          line_reg <= std_logic_vector(to_unsigned(4, 8) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10));--LINE_OBJ;
        else
          odd <= '0';
          line_reg <= std_logic_vector(to_unsigned(4, 8) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(1,10) & to_unsigned(2,10));
        end if;
      else -- Locked by buffer_lock keep same state
        valid_out_reg <= '0';
        ready := ready;
        addr := addr;
        count := count;
        line_reg <= line_reg;
        odd <= odd;
      end if;
    end if;
    buffer_rdy <= ready;
    wr_addr <= addr;
    obj_count <= count;
  end process;
  buffer_lock_out <= lock_out_reg;
  wren <= valid_out_reg;
  wr_data <= line_reg;
end rt1;
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.obj_extraction_pkg.all;
entity OBJRam_test is
end OBJRam_test;
architecture bench of OBJRam_test is
  component OBJ_RamWriter
    port (
      -- Clock Input
      CLK : in std_logic;
      -- Inputs
      buffer_lock : in std_logic := '0';
      VALID_IN : in std_logic := '0';
      DVAL_IN : in std_logic := '0';
      FVAL_IN : in std_logic := '0';
      LINE_OBJ : in std_logic_vector(obj_wd-addr_wd-1 downto 0);
      -- Outputs
      buffer_lock_out : out std_logic := '0';
      buffer_rdy : out std_logic := '0';
      wren : out std_logic := '0';
      wr_addr : out unsigned(6 downto 0);
      obj_count : out unsigned(6 downto 0);
      wr_data : out std_logic_vector(obj_wd-addr_wd-1 downto 0)
    );
  end component;
  signal CLK, buffer_lock, VALID_IN, DVAL_IN, FVAL_IN, buffer_lock_out, buffer_rdy, wren : std_logic;
  signal wr_addr, obj_count : unsigned(6 downto 0);
  signal LINE_OBJ, wr_data : std_logic_vector(obj_wd-addr_wd-1 downto 0);
BEGIN
  clk_process :process
  begin
    CLK <= '0';
    wait for 100 PS;
    CLK <= '1';
    wait for 100 PS;
  end process;
  stim_process :process
  begin
    VALID_IN <= '0';
    FVAL_IN <= '1';
    DVAL_IN <= '1';
    buffer_lock <= '1';
    LINE_OBJ <= b"000000000000000000000000000000000000000000000000";
    wait for 150 ps;
    VALID_IN <= '0';
    FVAL_IN <= '1';
    DVAL_IN <= '1';
    buffer_lock <= '0';
    LINE_OBJ <= b"100000000000000000000000000000000000000000000000";
    wait for 200 ps;
    VALID_IN <= '1';
    FVAL_IN <= '1';
    DVAL_IN <= '1';
    buffer_lock <= '0';
    LINE_OBJ <= b"010101010101010101010101010101010101010101010101";
    wait for 200 ps;
    VALID_IN <= '1';
    FVAL_IN <= '1';
    DVAL_IN <= '1';
    buffer_lock <= '0';
    LINE_OBJ <= b"110101010101010101010101010101010101010101010101";
    wait for 200 ps;
    VALID_IN <= '0';
    FVAL_IN <= '1';
    DVAL_IN <= '1';
    buffer_lock <= '0';
    LINE_OBJ <= b"111101010101010101010101010101010101010101010101";
    wait for 200 ps;
    VALID_IN <= '1';
    FVAL_IN <= '1';
    DVAL_IN <= '1';
    buffer_lock <= '0';
    LINE_OBJ <= b"111111010101010101010101010101010101010101010101";
    wait for 200 ps;
    VALID_IN <= '1';
    FVAL_IN <= '0';
    DVAL_IN <= '0';
    buffer_lock <= '0';
    LINE_OBJ <= b"111111110101010101010101010101010101010101010101";
    wait for 200 ps;
    VALID_IN <= '0';
    FVAL_IN <= '1';
    DVAL_IN <= '1';
    buffer_lock <= '0';
    LINE_OBJ <= b"111111110101010101010101010101010101010101010101";
    -- wait for 200 ps;
    wait;
  end process;
  M: OBJ_RamWriter port map (CLK, buffer_lock, VALID_IN, DVAL_IN, FVAL_IN, LINE_OBJ, buffer_lock_out, buffer_rdy, wren, wr_addr, obj_count, wr_data);
end bench;
Thanks, everyone's been really helpful tonight, Mat.
Although not strictly a problem in RTL simulation, you would want to operate the testbench at a speed that can be processed by a real FPGA.
I'm also not sure whether the design will keep up with the rapid input changes in terms of design clock cycles. But this can be more easily traced in simulation than in a code review.
So I should run the simulation with my clock at the frequency of the FPGA, and that should let me know if it will run at those speeds? Or will I have to do some other timing analysis as well?
Edit: I tried the simulation. The current clock of the system is 36.15 MHz, because it is synchronized with the clock from the camera. My previous simulations were running at much faster clock speeds, so I adjusted the simulation clock to (1/36150000) / 2 ~= 14 ns between every edge (rising and falling), and everything works as expected in simulation. Is this enough for timing analysis or do I need to do more? Thanks, Mat.
If you are doing a functional simulation (i.e. just testing your code to make sure it works), clock speeds are mostly unimportant. I will generally just use a 100 MHz clock regardless of my final clock, because it's easier to work out how many clock cycles have occurred between two points when I put two cursors up.
BUT: if you have more than one clock in the system, it is very important to get the ratio of the clocks as close to the real ratio as possible, to ensure data rates are correct and FIFOs etc. don't overfill. If you are doing a gate-level simulation, then yes, you need to use the real clock speeds, as this should point out any timing problems. But usually most problems are picked up at the functional stage, after which you move into synthesis and timing analysis.
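A testbench clock with a named period makes it easy to switch between a convenient 100 MHz clock and a realistic one. The following is a minimal sketch; the entity `clk_gen_tb`, the `done` flag and the 20-cycle run length are assumptions for illustration only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical self-contained testbench skeleton (names are assumptions)
entity clk_gen_tb is
end entity;

architecture bench of clk_gen_tb is
  -- 100 MHz for functional simulation. For the 36.15 MHz camera clock
  -- mentioned in this thread, use roughly 27.66 ns instead (this needs
  -- a simulator time resolution of ps or finer).
  constant CLK_PERIOD : time := 10 ns;

  signal clk  : std_logic := '0';
  signal done : boolean   := false;
begin
  -- Free-running clock that stops when the stimulus is finished,
  -- so the simulation ends on its own.
  clk_process : process
  begin
    while not done loop
      clk <= '0';
      wait for CLK_PERIOD / 2;
      clk <= '1';
      wait for CLK_PERIOD / 2;
    end loop;
    wait;
  end process;

  -- Placeholder stimulus: run for 20 clock cycles, then stop the clock.
  stim_process : process
  begin
    wait for 20 * CLK_PERIOD;
    done <= true;
    wait;
  end process;
end architecture;
```

With more than one clock, a second constant and a second clock process of the same shape would keep the ratio between the periods explicit in one place.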
So what would my next best step be?
Should I try gate-level simulation, or should I move on to synthesis and timing analysis with my new design? I have never done either gate-level simulation or timing analysis before, so a pointer in the right direction would also be appreciated :) Thanks, Mat.
I honestly have never done a post-P&R simulation. With good design practice, a good testbench and good timing analysis specs, you shouldn't need to do one with a fully synchronous design.
The gate-level sim is only really needed when you need to test external interfaces or where you have asynchronous logic. A fully synchronous design shouldn't normally need a gate-level sim.
Thanks for all the help. I haven't been able to get anything out from the board, but I'm meeting with a lecturer who knows VHDL, so hopefully he'll be able to help, as he can take a look at the actual system.
Thanks again to everyone for helping me out. Mat.