Using LPM_DIVIDE inside a VHDL Block

Altera_Forum · 09-22-2012

Hi,

I am currently trying to implement a Color Constancy algorithm for image processing. The algorithm requires at least one division operation (NUMER: 40 bits, DENOM: 40 bits). After reading through related posts, it seems that the LPM_DIVIDE megafunction suits my requirement.

However, I am not sure how to instantiate the divide function inside my block, or when to expect the correct output (note: I am very new to VHDL and FPGAs). Please point out any corrections required.

With my code, I obtained these errors using LPM_DIVIDE:

** Error: /home/robocup/Desktop/Old Board Apply Color Constancy/ColorConstancy.vhd(213): Illegal sequential statement.

** Error: /home/robocup/Desktop/Old Board Apply Color Constancy/ColorConstancy.vhd(231): VHDL Compiler exiting

Also, a 40-bit / 40-bit result will very likely not be available within a single clock cycle, which is fine as long as I can implement a checker that tells me when the result is valid. Any suggestions on this part?

Vincent

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
library lpm;
use lpm.lpm_components.all;
-- This entity is to calculate normaliser at each frame.
entity ColorConstancy is
	generic(
		DATA_WIDTH	: integer := 10;
		COORD_WIDTH	: integer := 10;
		RGB_WIDTH	: integer := 40
	);
	
	port 
	(
		-- Clock signal
		CLK		: in std_logic;
		
		-- Input signals
		R_IN		: in unsigned (DATA_WIDTH-1 downto 0);
		G_IN		: in unsigned (DATA_WIDTH-1 downto 0);
		B_IN		: in unsigned (DATA_WIDTH-1 downto 0);
		DVAL		: in std_logic;
		FVAL		: in std_logic;
		
		-- Output signals
		R_DYN		: out unsigned (RGB_WIDTH-1 downto 0);
		G_DYN		: out unsigned (RGB_WIDTH-1 downto 0);
		B_DYN		: out unsigned (RGB_WIDTH-1 downto 0);
		NORM		: out unsigned (RGB_WIDTH-1 downto 0)
	);
				
end entity;
architecture rt1 of ColorConstancy is
  component LPM_DIVIDE
    generic ( LPM_WIDTHN : natural;    -- MUST be greater than 0
              LPM_WIDTHD : natural;    -- MUST be greater than 0
              
              LPM_NREPRESENTATION : string := "UNSIGNED";
              LPM_DREPRESENTATION : string := "UNSIGNED";
              
              LPM_PIPELINE : natural := 0;
              LPM_TYPE : string := L_DIVIDE;
              LPM_HINT : string := "UNUSED"
              );
    port (NUMER     : in std_logic_vector(LPM_WIDTHN-1 downto 0);
          DENOM     : in std_logic_vector(LPM_WIDTHD-1 downto 0);
          ACLR      : in std_logic := '0';
          CLOCK     : in std_logic := '0';
          CLKEN     : in std_logic := '1';
          QUOTIENT  : out std_logic_vector(LPM_WIDTHN-1 downto 0);
          REMAIN    : out std_logic_vector(LPM_WIDTHD-1 downto 0)
          );
  end component LPM_DIVIDE;
	
	-- Various states
	type state_t is (
	   first_frame,
	   wait_for_new_frame,
	   wait_end_of_frame,
	   calc_norm,
	   calc_new_RGB,
	   calc_RGB_dyn  
	 );
	
	-- Current state
	signal state : state_t;
		
	-- RGB sums
	signal R_sum, G_sum, B_sum						           : unsigned (RGB_WIDTH-1 downto 0);
	signal R_sum_temp, G_sum_temp, B_sum_temp	 : unsigned (RGB_WIDTH-1 downto 0);
	-- RGB normaliser
	signal normaliser		: unsigned (RGB_WIDTH-1 downto 0);
	-- Colour Constancy variables
	signal R_thresh			: unsigned (RGB_WIDTH-1 downto 0);
	signal G_thresh			: unsigned (RGB_WIDTH-1 downto 0);
	signal B_thresh			: unsigned (RGB_WIDTH-1 downto 0);
	
	-- LPM DIVIDER
	signal R_dyn_thresh   : std_logic_vector (RGB_WIDTH-1 downto 0);
	
begin
	
	process (CLK)
	BEGIN
	if (rising_edge(CLK)) then
    
		if (DVAL = '1' and FVAL = '1') then
			-- Next pixel, same frame
			R_sum <= ( R_sum + resize(R_IN, RGB_width) );
			G_sum <= ( G_sum + resize(G_IN, RGB_width) );
			B_sum <= ( B_sum + resize(B_IN, RGB_width) );			
		end if;
		
		case state is
		
		when first_frame =>
		  if (FVAL = '1') then	  
		    state <= wait_for_new_frame;
		  end if;
		  
		  -- Initialise
		  R_sum <= to_unsigned(0, RGB_width);
		  G_sum <= to_unsigned(0, RGB_width);
		  B_sum <= to_unsigned(0, RGB_width);
		  R_sum_temp <= to_unsigned(0, RGB_width);
		  G_sum_temp <= to_unsigned(0, RGB_width);
		  B_sum_temp <= to_unsigned(0, RGB_width);
		  
		  normaliser <= to_unsigned(1, RGB_width);
		  
		  -- Orange Thresholding
		  R_thresh <= to_unsigned(230, RGB_width);
		  G_thresh <= to_unsigned(128, RGB_width);
		  B_thresh <= to_unsigned(23, RGB_width);
		
		-- Wait for new frame
		when wait_for_new_frame =>
		  if (FVAL = '1') then
		    state <= wait_end_of_frame;
		  end if;  
		
		-- Wait until end of frame
		when wait_end_of_frame =>
		  if (FVAL = '0') then
		    -- End of frame, start normaliser calculation
		    state <= calc_norm;

		    -- Save the current RGB sum
		    R_sum_temp <= R_sum;
		    G_sum_temp <= G_sum;
		    B_sum_temp <= B_sum;

		    -- Reset RGB sum counter for new frame
		    R_sum <= to_unsigned(0, RGB_width);
		    G_sum <= to_unsigned(0, RGB_width);
		    B_sum <= to_unsigned(0, RGB_width);
		  end if;
		
		-- Calculate Normaliser
		when calc_norm =>
		  state <= calc_new_RGB;

		  -- TO BE OPTIMISED
		  -- Approximate normaliser = biggest element + medium element / 2
		  if (R_sum_temp > G_sum_temp and R_sum_temp > B_sum_temp) then
		    if (G_sum_temp > B_sum_temp) then
		      normaliser <= R_sum_temp + (G_sum_temp srl 1);
		    else
		      normaliser <= R_sum_temp + (B_sum_temp srl 1);
		    end if;
		  elsif (G_sum_temp > R_sum_temp and G_sum_temp > B_sum_temp) then
		    if (R_sum_temp > B_sum_temp) then
		      normaliser <= G_sum_temp + (R_sum_temp srl 1);
		    else
		      normaliser <= G_sum_temp + (B_sum_temp srl 1);
		    end if;
		  else
		    if (R_sum_temp > G_sum_temp) then
		      normaliser <= B_sum_temp + (R_sum_temp srl 1);
		    else
		      normaliser <= B_sum_temp + (G_sum_temp srl 1);
		    end if;
		  end if;

		  -- Adjust RGB sums to grey world assumption
		  -- 1) Approximate sqrt(3), by multiplying with 55 (0011 0111).
		  -- 2) To be divided by 32 at next state.
		  R_sum_temp <= ((R_sum_temp sll 5) + (R_sum_temp sll 4) + (R_sum_temp sll 2) + (R_sum_temp sll 1) + R_sum_temp);
		  G_sum_temp <= ((G_sum_temp sll 5) + (G_sum_temp sll 4) + (G_sum_temp sll 2) + (G_sum_temp sll 1) + G_sum_temp);
		  B_sum_temp <= ((B_sum_temp sll 5) + (B_sum_temp sll 4) + (B_sum_temp sll 2) + (B_sum_temp sll 1) + B_sum_temp);
		  
		when calc_new_RGB =>
		  state <= calc_RGB_dyn;

		  -- Calculate Dynamic RGB and divide by 32
		  -- R_start = 230 (1110 0110).
		  R_thresh <= (((R_sum_temp sll 7) + (R_sum_temp sll 6) + (R_sum_temp sll 5) + (R_sum_temp sll 2) + (R_sum_temp sll 1)) srl 5);
		  -- G_start = 128 (1000 0000).
		  G_thresh <= ((G_sum_temp sll 7) srl 5);
		  -- B_start = 23 (0001 0111).
		  B_thresh <= (((B_sum_temp sll 4) + (B_sum_temp sll 2) + (B_sum_temp sll 1) + B_sum_temp) srl 5);
    
		when calc_RGB_dyn =>
		  state <= wait_for_new_frame;

		  ------ DIVISION  REQUIRED HERE FOR R_thresh, G_thresh and B_thresh -----
		  div_component: LPM_DIVIDE
		    generic map(
		      LPM_WIDTHN           => 40,
		      LPM_WIDTHD           => 40,
		      LPM_NREPRESENTATION  => "UNSIGNED",
		      LPM_DREPRESENTATION  => "UNSIGNED",
		      LPM_PIPELINE         => 0,
		      LPM_TYPE             => L_DIVIDE,
		      LPM_HINT             => "UNUSED"
		    )
		    port map (
		      NUMER     => std_logic_vector(R_thresh),
		      DENOM     => std_logic_vector(normaliser),
		      ACLR      => '0',
		      CLOCK     => CLK,
		      CLKEN     => '1',
		      QUOTIENT  => R_dyn_thresh,
		      REMAIN    => open
		    );

		end case;
	end if;
	end process;
	
	R_DYN <= R_thresh;
	G_DYN <= G_thresh;
	B_DYN <= B_thresh;
	NORM <= normaliser;
	
end rt1;

Altera_Forum · 09-22-2012

I see two aspects to your question:

- the VHDL syntax part

- speed (the real problem)

The first point is rather trivial. Component instantiations are concurrent statements: they are only allowed in the architecture body, not inside sequential code such as a process. I don't intend to retell the basic VHDL concepts in this post; you should review the topic in your favourite VHDL textbook.

But it's mostly a formal point, because you can place the divider outside the process and "connect" it through signals. Synchronizing a pipelined divider with your state machine is an additional problem to consider, but it is basically possible.
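
A minimal sketch of that arrangement, assuming the LPM_DIVIDE component declaration quoted in the first post (with its default values for ACLR, CLOCK and CLKEN). The wrapper entity div_wrapper_sketch and its port and signal names are made up for illustration and are not part of the original design; it is untested:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
library lpm;
use lpm.lpm_components.all;

-- Hypothetical wrapper: all names here are illustrative assumptions.
entity div_wrapper_sketch is
  port (
    NUM  : in  unsigned(39 downto 0);
    DEN  : in  unsigned(39 downto 0);
    QUOT : out unsigned(39 downto 0)
  );
end entity;

architecture rtl of div_wrapper_sketch is
  signal numer_slv, denom_slv, quot_slv : std_logic_vector(39 downto 0);
begin
  -- Concurrent type conversions between unsigned and std_logic_vector
  numer_slv <= std_logic_vector(NUM);
  denom_slv <= std_logic_vector(DEN);
  QUOT      <= unsigned(quot_slv);

  -- The instantiation is a concurrent statement: it sits in the
  -- architecture body, next to (not inside) any process.
  div_component : LPM_DIVIDE
    generic map (
      LPM_WIDTHN          => 40,
      LPM_WIDTHD          => 40,
      LPM_NREPRESENTATION => "UNSIGNED",
      LPM_DREPRESENTATION => "UNSIGNED",
      LPM_PIPELINE        => 0      -- 0 = purely combinational divider
    )
    port map (
      NUMER    => numer_slv,
      DENOM    => denom_slv,
      QUOTIENT => quot_slv,
      REMAIN   => open
    );
end architecture;
```

With LPM_PIPELINE set to 0 the quotient simply follows the operands after the propagation delay, so a clocked process elsewhere (for example your state machine) can register it once the operands have been stable long enough.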

For signed and unsigned types, the compiler also supports inferring hardware dividers from the "/" division operator. But you have only limited options to control pipelined operation that way, so explicit MegaFunction instantiation may be the better choice.

Whether pipelined operation is necessary is mainly a matter of your clock speed. Timing analysis will answer the question.

A more general question is whether you actually need a divider in your design.

Altera_Forum · 09-22-2012

Hi FvM,

Thanks for the reply. Originally I planned to separate out the division operation by feeding the block with numerator and denominator signals. However, I would then not be able to (at least) simulate it in ModelSim and check the logic. Therefore, I attempted to include the LPM divide function inside the block.

With our current setup, division is necessary. It is required because the results (RGB) have to be passed to another block, RGBtoHSV. Luckily, this operation needs to be done only once per frame (for RGB), which leaves some clock cycles to spend on it.

For now I am testing with a basic "/" division operator which, as you pointed out, is supported by the compiler (and replaced with an LPM divider). From the compiler messages, it seems that the numerator and denominator are assigned 10 and 8 bits respectively. I'll also try your suggestion to fix the syntax and set the LPM divider parameters as necessary.

Vincent.

Altera_Forum · 09-24-2012

Hi all,

A quick update: I have now implemented my algorithm and it is working fine.

However, resource usage increased by 25% due to the (inferred) division megafunction, and compilation time went from 2.5 minutes to 8 minutes. Is this expected? Can we optimise this?

By default, the compiler set the LPM_DIVIDE parameters (as expected) to:

Info (12134): Parameter "LPM_WIDTHN" = "40"

Info (12134): Parameter "LPM_WIDTHD" = "40"

Info (12134): Parameter "LPM_NREPRESENTATION" = "UNSIGNED"

Info (12134): Parameter "LPM_DREPRESENTATION" = "UNSIGNED"

Below is the code snippet for the division, where all signals are 40 bits wide.


when calc_RGB_dyn =>
  state <= wait_for_new_frame;

  -- Calculate dynamic RGB = normalise new RGB
  R_dyn_thresh <= ((R_thresh) / (normaliser));
  G_dyn_thresh <= ((G_thresh) / (normaliser));
  B_dyn_thresh <= ((B_thresh) / (normaliser));

end case;

Vincent

Altera_Forum · 09-24-2012

Yes. Without pipelining, the 40-bit divider is built as one large combinational block, which takes a lot of logic and routing. Your Fmax will also be very, very poor.
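
A rough sketch of a pipelined variant that also covers the "checker" asked about in the first post. It assumes the documented LPM_DIVIDE behaviour that the quotient appears LPM_PIPELINE clock cycles after the operands are presented; please confirm the exact latency in simulation. The entity name, the PIPE_STAGES value, START and DONE are illustrative assumptions, not taken from the original design:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
library lpm;
use lpm.lpm_components.all;

-- Hypothetical wrapper: all names and the stage count are assumptions.
entity div_pipelined_sketch is
  generic (
    PIPE_STAGES : integer range 2 to 40 := 8  -- tune using timing analysis
  );
  port (
    CLK   : in  std_logic;
    START : in  std_logic;              -- one-clock pulse while NUM/DEN are valid
    NUM   : in  unsigned(39 downto 0);
    DEN   : in  unsigned(39 downto 0);
    QUOT  : out unsigned(39 downto 0);
    DONE  : out std_logic               -- high when QUOT matches the started operands
  );
end entity;

architecture rtl of div_pipelined_sketch is
  signal numer_slv, denom_slv, quot_slv : std_logic_vector(39 downto 0);
  -- Delay line that tracks the START pulse through the divider latency
  signal valid_sr : std_logic_vector(PIPE_STAGES-1 downto 0) := (others => '0');
begin
  numer_slv <= std_logic_vector(NUM);
  denom_slv <= std_logic_vector(DEN);
  QUOT      <= unsigned(quot_slv);

  div_component : LPM_DIVIDE
    generic map (
      LPM_WIDTHN          => 40,
      LPM_WIDTHD          => 40,
      LPM_NREPRESENTATION => "UNSIGNED",
      LPM_DREPRESENTATION => "UNSIGNED",
      LPM_PIPELINE        => PIPE_STAGES
    )
    port map (
      NUMER    => numer_slv,
      DENOM    => denom_slv,
      CLOCK    => CLK,
      QUOTIENT => quot_slv,
      REMAIN   => open
    );

  -- Shift START along by one position per clock; after PIPE_STAGES
  -- clocks it reaches the top bit and flags the result as valid.
  process (CLK)
  begin
    if rising_edge(CLK) then
      valid_sr <= valid_sr(PIPE_STAGES-2 downto 0) & START;
    end if;
  end process;

  DONE <= valid_sr(PIPE_STAGES-1);
end architecture;
```

Since the division is needed only once per frame, a state machine could pulse START when the operands are ready and then wait in a state until DONE goes high. Holding NUM and DEN stable during the wait is the conservative choice.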

Altera_Forum · 09-24-2012

Hi Tricky,

Thanks for the reply. As speed is not much of an issue, I implemented a divider using repeated subtraction. Something like this:


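-- Assumed to sit inside a clocked process; counter must be cleared and
-- numerator/denominator loaded before the division starts.
if (numerator >= denominator) then
   numerator <= numerator - denominator;  -- one subtraction per clock
   counter <= counter + 1;                -- counts the subtractions
else
   output <= counter;                     -- numerator now holds the remainder
end if;
-- Note: a zero denominator never reaches the else branch.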

This brought resource usage down from 25% to 5%. Perhaps this is not the best solution; I am open to better ideas.

Vincent
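
One commonly used alternative is a shift-and-subtract (restoring) divider. It produces one quotient bit per clock and so finishes in a fixed WIDTH cycles regardless of the operand values, whereas repeated subtraction needs as many cycles as the value of the quotient. The following is only a sketch under assumptions: the entity and its ports (START, DONE and so on) are hypothetical names, and a zero denominator is not handled specially.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sequential divider: names are illustrative assumptions.
entity seq_divider_sketch is
  generic (
    WIDTH : integer range 2 to 64 := 40
  );
  port (
    CLK    : in  std_logic;
    START  : in  std_logic;                  -- one-clock pulse to begin
    NUM    : in  unsigned(WIDTH-1 downto 0);
    DEN    : in  unsigned(WIDTH-1 downto 0);
    QUOT   : out unsigned(WIDTH-1 downto 0);
    REMAIN : out unsigned(WIDTH-1 downto 0);
    DONE   : out std_logic                   -- one-clock pulse when finished
  );
end entity;

architecture rtl of seq_divider_sketch is
  -- Partial remainder, one bit wider so the compare cannot overflow
  signal rem_r : unsigned(WIDTH downto 0) := (others => '0');
  -- Starts as the numerator; its MSBs shift out while quotient bits shift in
  signal quo_r : unsigned(WIDTH-1 downto 0) := (others => '0');
  signal den_r : unsigned(WIDTH-1 downto 0) := (others => '0');
  signal count : integer range 0 to WIDTH := 0;
  signal busy  : std_logic := '0';
begin
  process (CLK)
    variable shifted : unsigned(WIDTH downto 0);
    variable den_ext : unsigned(WIDTH downto 0);
  begin
    if rising_edge(CLK) then
      DONE <= '0';
      if busy = '0' then
        if START = '1' then
          -- Latch the operands and start WIDTH iterations
          rem_r <= (others => '0');
          quo_r <= NUM;
          den_r <= DEN;
          count <= WIDTH;
          busy  <= '1';
        end if;
      else
        -- Shift the remainder left and bring in the next numerator bit
        shifted := rem_r(WIDTH-1 downto 0) & quo_r(WIDTH-1);
        den_ext := resize(den_r, WIDTH+1);
        if shifted >= den_ext then
          rem_r <= shifted - den_ext;               -- subtraction fits: quotient bit 1
          quo_r <= quo_r(WIDTH-2 downto 0) & '1';
        else
          rem_r <= shifted;                         -- keep remainder: quotient bit 0
          quo_r <= quo_r(WIDTH-2 downto 0) & '0';
        end if;
        if count = 1 then
          busy <= '0';
          DONE <= '1';
        end if;
        count <= count - 1;
      end if;
    end if;
  end process;

  QUOT   <= quo_r;
  REMAIN <= rem_r(WIDTH-1 downto 0);
end architecture;
```

For the once-per-frame use case, the three divisions could share one such divider by running them one after another, at a cost of roughly WIDTH clocks each.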