Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)

Design is too slow

Altera_Forum
Honored Contributor II
1,467 Views

I'm trying to build a frame alignment component for an OTN (Optical Transport Network) with Quartus II on a Cyclone II FPGA and my problem is that I cannot reach the desired frequency. 

Obviously I have to do some pipelining but I do not know what and where to pipeline. The code follows in VHDL. As you will see the code consists of two processes, a clocked process and a combinatorical one. Any kind of help would be mostly appreciated. 

 

library ieee;use IEEE.std_logic_1164.all; use IEEE.std_logic_unsigned.all; entity framer is generic (DATA_WIDTH :integer := 256); port (clk : in std_logic; reset : in std_logic; input : in std_logic_vector(255 downto 0); output : out std_logic_vector(255 downto 0)); end framer; architecture mixed of framer is type state_type is (Frame_Search, Prove_In_Sync, In_Frame_Sync, Prove_Out_Sync); signal next_state, current_state : state_type; --signal cyclecounter : std_logic_vector(8 downto 0); signal cyclecounter : integer range 510 downto 0; signal cycle_inc : std_logic; signal error_inc : std_logic; --signal errorcount : std_logic_vector(3 downto 0); signal errorcount : integer range 4 downto 0; signal w_en : std_logic; signal FAOOF : std_logic; signal index, prev_index : natural; signal input_reg1,input_reg2 : std_logic_vector(DATA_WIDTH-1 downto 0); signal output_reg : std_logic_vector(DATA_WIDTH-1 downto 0); signal joint : std_logic_vector(2*DATA_WIDTH-1 downto 0); signal sync_regB : std_logic_vector(31 downto 0); signal sync_regS : std_logic_vector(23 downto 0); constant Fr_find : std_logic_vector(31 downto 0) := X"F6F6_2828"; constant Fr_lose : std_logic_vector(23 downto 0) := X"F62828"; begin state_update : process (clk,reset) begin if reset = '1' then current_state <= Frame_Search after 1 ns; --default state upon reset cyclecounter <= 0 after 1 ns; errorcount <= 0 after 1 ns; elsif rising_edge(clk) then current_state <= next_state; --update state input_reg1 <= input after 1 ns; input_reg2 <= input_reg1 after 1 ns; prev_index <= index; if w_en = '1' then for i in joint'left-8 downto DATA_WIDTH loop if i = index then output_reg <= joint(i + 8 downto i + 8 - DATA_WIDTH + 1) after 1 ns; end if; end loop; end if; if FAOOF = '0' then output <= output_reg; else output <= (others=>'0'); end if; if cycle_inc = '1' then cyclecounter <= cyclecounter + 1 after 1 ns; else cyclecounter <= 0 after 1 ns; end if; if cyclecounter = 509 then if error_inc = '1' then errorcount <= errorcount + 1 after 1 ns; else errorcount <= 0 after 1 ns; end if; end if; end if; end process; joint <= input_reg2 & input_reg1; state_machine : process (current_state, joint, output_reg, cycle_inc, error_inc, index, prev_index, cyclecounter, errorcount) begin case current_state is when Frame_Search => FAOOF <= '1'; index <= 0; cycle_inc <= '0';--counter reset error_inc <= '0';--counter reset w_en <= '0'; next_state <= current_state; for j in joint'left-8 downto DATA_WIDTH loop if joint(j downto j - (Fr_find'length-1)) = Fr_find then w_en <= '1'; index <= j; next_state <= Prove_In_Sync; end if; end loop; when Prove_In_Sync => w_en <= '1'; FAOOF <= '1'; index <= prev_index; cycle_inc <= '1'; error_inc <= '0'; next_state <= current_state; if cyclecounter = 510 then cycle_inc <= '0';--counter reset --for k in joint'left-8 downto DATA_WIDTH loop --if k = index then --if joint(k downto k - (Fr_find'length-1)) = Fr_find then if output_reg(output_reg'left - 8 downto output_reg'left - 8 - (Fr_find'length-1)) = Fr_find then next_state <= In_Frame_Sync; FAOOF <= '0'; --cycle_inc <= '0';--counter reset --error_inc <= '0';--counter reset else next_state <= Frame_Search; end if; --end if; --end loop; else cycle_inc <= '1'; end if; when In_Frame_Sync => w_en <= '1'; FAOOF <= '0'; index <= prev_index; cycle_inc <= '1'; error_inc <= '0'; next_state <= current_state; if cyclecounter = 509 then cycle_inc <= '0';--counter reset --for l in joint'left-8 downto DATA_WIDTH loop --if l = index then --if joint(l - 8 downto (l - 8) - (Fr_lose'length - 1)) = Fr_lose then if output_reg(output_reg'left - 16 downto output_reg'left - 16 - (Fr_lose'length-1)) = Fr_lose then next_state <= current_state; else error_inc <= '1'; next_state <= Prove_Out_Sync; end if; --end if; --end loop; else cycle_inc <= '1'; end if; when Prove_Out_Sync=> w_en <= '1'; FAOOF <= '0'; index <= prev_index; cycle_inc <= '1'; error_inc <= '1'; next_state <= current_state; if cyclecounter = 509 then cycle_inc <= '0';--counter reset --for m in joint'left-8 downto DATA_WIDTH loop --if m = index then --if joint(m - 8 downto (m - 8) - (Fr_lose'length - 1)) = Fr_lose then if output_reg(output_reg'left - 16 downto output_reg'left- 16 - (Fr_lose'length-1)) = Fr_lose then next_state <= In_Frame_Sync; --cycle_inc <= '0';--counter reset error_inc <= '0';--counter reset else error_inc <= '1'; if errorcount = 4 then error_inc <= '0';--counter reset next_state <= Frame_Search; else next_state <= current_state; end if; end if; --end if; --end loop; else cycle_inc <= '1'; end if; end case; end process; end mixed;
0 Kudos
6 Replies
Altera_Forum
Honored Contributor II
598 Views

Your second process is heavily combinatorial and is unlikely to pass timing. I personally use one process state machine always. That way I declare one state signal of type say s0,s1,s2,s3,s4, ...etc. and assign next state as follows: if...state <= s0, if... state <= s1 and so on. The change of state is always on the clock edge so are any conditions and assignments. To keep functionality you will need to redesign for latency effect.

0 Kudos
Altera_Forum
Honored Contributor II
598 Views

 

--- Quote Start ---  

Your second process is heavily combinatorial and is unlikely to pass timing. I personally use one process state machine always. That way I declare one state signal of type say s0,s1,s2,s3,s4, ...etc. and assign next state as follows: if...state <= s0, if... state <= s1 and so on. The change of state is always on the clock edge so are any conditions and assignments. To keep functionality you will need to redesign for latency effect. 

--- Quote End ---  

 

 

Using Timing Analysis I found that the critical path is inside the clock process where i store the result in the output register. 

if w_en = '1' then for i in joint'left-8 downto DATA_WIDTH loop if i = index then output_reg <= joint(i + 8 downto i + 8 - DATA_WIDTH + 1) after 1 ns; end if; end loop; end if;  

 

This design requires to have 2 processes at least to work properly and comply with the ITU-T G.709, G798 OTN specifications.
0 Kudos
Altera_Forum
Honored Contributor II
598 Views

Does the spec actually say in the specification "You MUST use two processes" or is it just that you cant get it to work with a single process state machine? I would be very surprised if it was the former. 

 

The problem comes, I assume, because of the massive long combinatorial path of index, that forms part of the write-enable input on the output_reg register. index is muxed based on a comparator and then muxed again and anded with w_en.  

 

What you havent said is what the target FMax is and what the actual FMax is. You may have to just accept that your design isnt good enough or re-design it with more pipelining.
0 Kudos
Altera_Forum
Honored Contributor II
598 Views

kaz:  

--- Quote Start ---  

Your second process is heavily combinatorial and is unlikely to pass timing. I personally use one process state machine always. That way I declare one state signal of type say s0,s1,s2,s3,s4, ...etc. and assign next state as follows: if...state <= s0, if... state <= s1 and so on. The change of state is always on the clock edge so are any conditions and assignments. To keep functionality you will need to redesign for latency effect. 

--- Quote End ---  

 

tricky: 

--- Quote Start ---  

 

Does the spec actually say in the specification "You MUST use two processes" or is it just that you cant get it to work with a single process state machine? I would be very surprised if it was the former. 

 

--- Quote End ---  

 

This shows a bias against two-process state-machines, claiming one-process states-machines are better which is without any proof or justification. A two-process state-machine does perform equally as a one-process state-machine. A two-process state machine allows you to generate additional combinatorial outputs, without having to resort to a additional combinatorial process to decode these outputs, or having to "redesign for latency effect". 

 

The correct advice is to look at the critical paths in TimeQuest, as hargeo effectively did, and rework the logic.
0 Kudos
Altera_Forum
Honored Contributor II
598 Views

The idea of one or more process per se is irrelevant. You may use as many as you want. The issue is avoiding long comb paths. 

The timing does not refer to process boundaries but to paths.  

In your case I realised "joint" is assigned to output_register from input through reg1 and reg2 directly and in this case yes you are right but it could be the long comb paths are contributing to routing difficulty. 

 

edit: 

note that output_register is assigned conditional on wr_en and FA00F which both come from depth of three levels of if statements in the second process. That is causing your violation I believe.
0 Kudos
Altera_Forum
Honored Contributor II
598 Views

By making a few changes like using std_logic_vectors with less bits where possible, prefering ranges that are multiple of 2 but still using 2 processes, I managed to increase the Fmax dramatically. 

if w_en = '1' then for step in 1 downto 0 loop for i in DATA_WIDTH/2-1 downto 0 loop if i = prev_index(6 downto 0) then temp_reg1 <= joint(i + DATA_WIDTH downto i + 1) after 1 ns; temp_reg2 <= joint(i + DATA_WIDTH*3/2 downto i + DATA_WIDTH/2 + 1) after 1 ns; end if; end loop; end loop; if prev_index(7) = '0' then output_reg <= temp_reg1 after 1 ns; else output_reg <= temp_reg2 after 1 ns; end if; end if;  

 

I hope that this pipelining is appropriate. Could anyone indicate a correction or improvement?
0 Kudos
Reply