Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)

Using LPM DIVIDE inside VHDL Block

Altera_Forum

Hi, 

 

I am currently trying to implement a Color Constancy algorithm for image processing. The algorithm requires at least one division operation (NUMER: 40 bits, DENOM: 40 bits). After looking through relevant posts, it seems that the LPM_DIVIDE megafunction suits my requirement. 

 

However, I am not entirely sure how to implement the divide function inside my block, or when to expect the correct output (note: I am very new to VHDL and FPGAs). Kindly advise on any corrections required. 

 

With my code, I obtained these errors using LPM_DIVIDE: 

** Error: /home/robocup/Desktop/Old Board Apply Color Constancy/ColorConstancy.vhd(213): Illegal sequential statement. 

** Error: /home/robocup/Desktop/Old Board Apply Color Constancy/ColorConstancy.vhd(231): VHDL Compiler exiting 

 

Also, it is very likely that the 40-bit/40-bit result will not be available within a single clock cycle, which is fine as long as I can implement a check for when the result is ready. Any suggestions on this part? 

 

Vincent 

 

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library lpm;
use lpm.lpm_components.all;

-- This entity is to calculate normaliser at each frame.
entity ColorConstancy is
    generic(
        DATA_WIDTH  : integer := 10;
        COORD_WIDTH : integer := 10;
        RGB_WIDTH   : integer := 40
    );
    port (
        -- Clock signal
        CLK   : in std_logic;
        -- Input signals
        R_IN  : in unsigned (DATA_WIDTH-1 downto 0);
        G_IN  : in unsigned (DATA_WIDTH-1 downto 0);
        B_IN  : in unsigned (DATA_WIDTH-1 downto 0);
        DVAL  : in std_logic;
        FVAL  : in std_logic;
        -- Output signals
        R_DYN : out unsigned (RGB_WIDTH-1 downto 0);
        G_DYN : out unsigned (RGB_WIDTH-1 downto 0);
        B_DYN : out unsigned (RGB_WIDTH-1 downto 0);
        NORM  : out unsigned (RGB_WIDTH-1 downto 0)
    );
end entity;

architecture rt1 of ColorConstancy is

    component LPM_DIVIDE
        generic (
            LPM_WIDTHN          : natural; -- MUST be greater than 0
            LPM_WIDTHD          : natural; -- MUST be greater than 0
            LPM_NREPRESENTATION : string := "UNSIGNED";
            LPM_DREPRESENTATION : string := "UNSIGNED";
            LPM_PIPELINE        : natural := 0;
            LPM_TYPE            : string := L_DIVIDE;
            LPM_HINT            : string := "UNUSED"
        );
        port (
            NUMER    : in std_logic_vector(LPM_WIDTHN-1 downto 0);
            DENOM    : in std_logic_vector(LPM_WIDTHD-1 downto 0);
            ACLR     : in std_logic := '0';
            CLOCK    : in std_logic := '0';
            CLKEN    : in std_logic := '1';
            QUOTIENT : out std_logic_vector(LPM_WIDTHN-1 downto 0);
            REMAIN   : out std_logic_vector(LPM_WIDTHD-1 downto 0)
        );
    end component LPM_DIVIDE;

    -- Various states
    type state_t is (
        first_frame,
        wait_for_new_frame,
        wait_end_of_frame,
        calc_norm,
        calc_new_RGB,
        calc_RGB_dyn
    );

    -- Current state
    signal state : state_t;

    -- RGB sums
    signal R_sum, G_sum, B_sum : unsigned (RGB_WIDTH-1 downto 0);
    signal R_sum_temp, G_sum_temp, B_sum_temp : unsigned (RGB_WIDTH-1 downto 0);

    -- RGB normaliser
    signal normaliser : unsigned (RGB_WIDTH-1 downto 0);

    -- Colour Constancy variables
    signal R_thresh : unsigned (RGB_WIDTH-1 downto 0);
    signal G_thresh : unsigned (RGB_WIDTH-1 downto 0);
    signal B_thresh : unsigned (RGB_WIDTH-1 downto 0);

    -- LPM DIVIDER
    signal R_dyn_thresh : std_logic_vector (RGB_WIDTH-1 downto 0);

begin

    process (CLK)
    BEGIN
        if (rising_edge(CLK)) then

            if (DVAL = '1' and FVAL = '1') then
                -- Next pixel, same frame
                R_sum <= ( R_sum + resize(R_IN, RGB_width) );
                G_sum <= ( G_sum + resize(G_IN, RGB_width) );
                B_sum <= ( B_sum + resize(B_IN, RGB_width) );
            end if;

            case state is

                when first_frame =>
                    if (FVAL = '1') then
                        state <= wait_for_new_frame;
                    end if;
                    -- Initialise
                    R_sum <= to_unsigned(0, RGB_width);
                    G_sum <= to_unsigned(0, RGB_width);
                    B_sum <= to_unsigned(0, RGB_width);
                    R_sum_temp <= to_unsigned(0, RGB_width);
                    G_sum_temp <= to_unsigned(0, RGB_width);
                    B_sum_temp <= to_unsigned(0, RGB_width);
                    normaliser <= to_unsigned(1, RGB_width);
                    -- Orange Thresholding
                    R_thresh <= to_unsigned(230, RGB_width);
                    G_thresh <= to_unsigned(128, RGB_width);
                    B_thresh <= to_unsigned(23, RGB_width);

                -- Wait for new frame
                when wait_for_new_frame =>
                    if (FVAL = '1') then
                        state <= wait_end_of_frame;
                    end if;

                -- Wait until end of frame
                when wait_end_of_frame =>
                    if (FVAL = '0') then
                        -- End of frame, start normaliser calculation
                        state <= calc_norm;
                        -- Save the current RGB sum
                        R_sum_temp <= R_sum;
                        G_sum_temp <= G_sum;
                        B_sum_temp <= B_sum;
                        -- Reset RGB sum counter for new frame
                        R_sum <= to_unsigned(0, RGB_width);
                        G_sum <= to_unsigned(0, RGB_width);
                        B_sum <= to_unsigned(0, RGB_width);
                    end if;

                -- Calculate Normaliser
                when calc_norm =>
                    state <= calc_new_RGB;
                    -- TO BE OPTIMISED
                    -- Approximate normaliser = biggest element + medium element / 2
                    if (R_sum_temp > G_sum_temp and R_sum_temp > B_sum_temp) then
                        if (G_sum_temp > B_sum_temp) then
                            normaliser <= R_sum_temp + (G_sum_temp srl 1);
                        else
                            normaliser <= R_sum_temp + (B_sum_temp srl 1);
                        end if;
                    elsif (G_sum_temp > R_sum_temp and G_sum_temp > B_sum_temp) then
                        if (R_sum_temp > B_sum_temp) then
                            normaliser <= G_sum_temp + (R_sum_temp srl 1);
                        else
                            normaliser <= G_sum_temp + (B_sum_temp srl 1);
                        end if;
                    else
                        if(R_sum_temp > G_sum_temp) then
                            normaliser <= B_sum_temp + (R_sum_temp srl 1);
                        else
                            normaliser <= B_sum_temp + (G_sum_temp srl 1);
                        end if;
                    end if;
                    -- Adjust RGB sums to grey world assumption
                    -- 1) Approximate sqrt(3), by multiplying with 55 (0011 0111).
                    -- 2) To be divided by 32 at next state.
                    R_sum_temp <= ((R_sum_temp sll 5) + (R_sum_temp sll 4) + (R_sum_temp sll 2) + (R_sum_temp sll 1) + R_sum_temp);
                    G_sum_temp <= ((G_sum_temp sll 5) + (G_sum_temp sll 4) + (G_sum_temp sll 2) + (G_sum_temp sll 1) + G_sum_temp);
                    B_sum_temp <= ((B_sum_temp sll 5) + (B_sum_temp sll 4) + (B_sum_temp sll 2) + (B_sum_temp sll 1) + B_sum_temp);

                when calc_new_RGB =>
                    state <= calc_RGB_dyn;
                    --Calculate Dynamic RGB and divide by 32
                    -- R_start = 230 (1110 0110).
                    R_thresh <= (((R_sum_temp sll 7) + (R_sum_temp sll 6) + (R_sum_temp sll 5) + (R_sum_temp sll 2) + (R_sum_temp sll 1)) srl 5);
                    -- G_start = 128 (1000 0000).
                    G_thresh <= ((G_sum_temp sll 7) srl 5);
                    -- B_start = 23 (0001 0111).
                    B_thresh <= (((B_sum_temp sll 4) + (B_sum_temp sll 2) + (B_sum_temp sll 1) + B_sum_temp) srl 5);

                when calc_RGB_dyn =>
                    state <= wait_for_new_frame;
                    ------ DIVISION REQUIRED HERE FOR R_thresh, G_thresh and B_thresh -----
                    div_component: LPM_DIVIDE
                        generic map(
                            LPM_WIDTHN => 40,
                            LPM_WIDTHD => 40,
                            LPM_NREPRESENTATION =>"UNSIGNED",
                            LPM_DREPRESENTATION =>"UNSIGNED",
                            LPM_PIPELINE => 0,
                            LPM_TYPE =>L_DIVIDE,
                            LPM_HINT =>"UNUSED"
                        )
                        port map (
                            NUMER => std_logic_vector(R_thresh),
                            DENOM => std_logic_vector(normaliser),
                            ACLR => '0',
                            CLOCK => CLK,
                            CLKEN => '1',
                            QUOTIENT => R_dyn_thresh,
                            REMAIN => open
                        );

            end case;
        end if;
    end process;

    R_DYN <= R_thresh;
    G_DYN <= G_thresh;
    B_DYN <= B_thresh;
    NORM <= normaliser;

end rt1;
Altera_Forum

I see two aspects to your question: 

- the VHDL syntax part 

- speed (the real problem)  

 

The first point is rather trivial. Component instantiations are only allowed in concurrent code, not in sequential blocks such as a process. I won't attempt to retell the basic VHDL concepts in this post; you should review the topic in your favourite VHDL textbook.  

 

But it's more of a formal point, because you can place the divider outside the process and "connect" it through signals. Synchronizing with a pipelined divider is an additional problem to consider, but it is basically possible. 
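
A minimal sketch of that arrangement, not a tested design: the divider is instantiated in the concurrent region of the architecture, the clocked process only drives its input signals, and a small counter waits out the pipeline latency before the quotient is read. The entity name `div_wrapper`, the `START`/`DONE` handshake, and the pipeline depth of 8 are all assumptions made for illustration; the component declaration is assumed to come from `lpm.lpm_components`, and the real latency should be confirmed in simulation. A zero denominator is not handled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

library lpm;
use lpm.lpm_components.all;  -- assumed to declare the LPM_DIVIDE component

-- Hypothetical wrapper; names and PIPE_DEPTH are illustrative assumptions.
entity div_wrapper is
    generic (
        WIDTH      : natural := 40;
        PIPE_DEPTH : natural := 8     -- assumed pipeline latency in clocks
    );
    port (
        CLK      : in  std_logic;
        START    : in  std_logic;                   -- one-clock pulse to start
        NUMER    : in  unsigned(WIDTH-1 downto 0);
        DENOM    : in  unsigned(WIDTH-1 downto 0);
        QUOTIENT : out unsigned(WIDTH-1 downto 0);
        DONE     : out std_logic                    -- one-clock pulse when valid
    );
end entity;

architecture rtl of div_wrapper is
    signal numer_r  : std_logic_vector(WIDTH-1 downto 0) := (others => '0');
    signal denom_r  : std_logic_vector(WIDTH-1 downto 0) := (others => '0');
    signal quot_slv : std_logic_vector(WIDTH-1 downto 0);
    signal count    : natural range 0 to PIPE_DEPTH := 0;
    signal busy     : std_logic := '0';
begin

    -- Concurrent region: a component instantiation is legal here.
    div_inst : LPM_DIVIDE
        generic map (
            LPM_WIDTHN          => WIDTH,
            LPM_WIDTHD          => WIDTH,
            LPM_NREPRESENTATION => "UNSIGNED",
            LPM_DREPRESENTATION => "UNSIGNED",
            LPM_PIPELINE        => PIPE_DEPTH
        )
        port map (
            NUMER    => numer_r,
            DENOM    => denom_r,
            CLOCK    => CLK,
            QUOTIENT => quot_slv,
            REMAIN   => open
        );

    -- Sequential region: only signal assignments, no instantiation.
    process (CLK)
    begin
        if rising_edge(CLK) then
            DONE <= '0';
            if START = '1' and busy = '0' then
                -- Latch the operands; they stay stable while busy.
                numer_r <= std_logic_vector(NUMER);
                denom_r <= std_logic_vector(DENOM);
                count   <= 0;
                busy    <= '1';
            elsif busy = '1' then
                if count = PIPE_DEPTH then
                    -- At least PIPE_DEPTH clocks have passed since the latch.
                    QUOTIENT <= unsigned(quot_slv);
                    DONE     <= '1';
                    busy     <= '0';
                else
                    count <= count + 1;
                end if;
            end if;
        end if;
    end process;

end architecture;
```

Because the operands are held in registers while `busy` is high, waiting one clock longer than strictly necessary is harmless; the state machine in the original process would pulse `START` in one state and wait for `DONE` in the next.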

 

For signed and unsigned types, the compiler also supports inferring a hardware divider from the "/" division operator. But you then have only limited options to control pipelined operation, so it may be better to use an explicit MegaFunction instantiation. 

 

Whether pipelined operation is necessary is mainly a matter of your clock speed. Timing analysis will answer the question. 

 

A more general question is whether you actually need a divider for your design.
Altera_Forum

Hi FvM, 

 

Thanks for the reply. Originally I planned to separate out the division by feeding a divider block with numerator and denominator signals. However, I would then not be able to (at least) simulate it in ModelSim and check the logic. Therefore, I attempted to include the LPM divide function inside the block. 

 

With our current setup, division is necessary. It is required because the results (RGB) have to be passed to another block, RGBtoHSV. Luckily, this operation needs to be done only once per frame (for RGB), which allows some clock cycles to be spent. 

 

For now I am testing with a basic "/" division operator, which, as you pointed out, is supported by the compiler (and replaced with an LPM divider). From the compiler messages, it seems that the numerator and denominator are assigned 10 and 8 bits respectively. I'll also try your suggestion to fix the syntax and define the LPM divider parameters as necessary. 

 

Vincent.
Altera_Forum

Hi all, 

 

A quick update: I am now able to implement my algorithm and it is working fine. 

 

However, resource usage increased by 25% due to the (inferred) division megafunction, and compilation time rose from 2.5 minutes to 8 minutes. Is this expected? Can we optimise this? 

 

By default, the compiler set the LPM_DIVIDE parameters (as expected) to: 

Info (12134): Parameter "LPM_WIDTHN" = "40" 

Info (12134): Parameter "LPM_WIDTHD" = "40" 

Info (12134): Parameter "LPM_NREPRESENTATION" = "UNSIGNED" 

Info (12134): Parameter "LPM_DREPRESENTATION" = "UNSIGNED" 

 

Below is the code snippet for the division, where all signals are 40 bits wide. 

when calc_RGB_dyn =>
    state <= wait_for_new_frame;
    -- Calculate dynamic RGB = normalise new RGB
    R_dyn_thresh <= ((R_thresh) / (normaliser));
    G_dyn_thresh <= ((G_thresh) / (normaliser));
    B_dyn_thresh <= ((B_thresh) / (normaliser));
end case;

 

Vincent
Altera_Forum

Yes. Because there is no pipelining, the fitter has a lot of routing to do. Your Fmax will also be very poor.

Altera_Forum

Hi Tricky, 

 

Thanks for the reply. As speed is not much of an issue, I implemented a divider using subtractions. Something like this: 

 

if ( numerator >= denominator) then
    numerator <= numerator - denominator;
    counter <= counter + 1;
else
    output <= counter;
end if;

 

This brought resource usage down from 25% to 5%. Perhaps this is not the best solution; I am open to better ideas. 
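
One commonly used alternative, offered as a sketch rather than as what was used in this design: a restoring (shift-and-subtract) divider produces one quotient bit per clock, so a 40-bit division takes a fixed number of cycles (about WIDTH + 2 here), whereas the repeated-subtraction loop above takes as many cycles as the value of the quotient. The entity name `seq_divider` and its `START`/`DONE` ports are hypothetical, and the caller is assumed to guarantee a non-zero denominator.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical restoring divider; all names are illustrative assumptions.
entity seq_divider is
    generic ( WIDTH : positive := 40 );  -- assumed to be at least 2
    port (
        CLK      : in  std_logic;
        START    : in  std_logic;                   -- one-clock pulse to start
        NUMER    : in  unsigned(WIDTH-1 downto 0);
        DENOM    : in  unsigned(WIDTH-1 downto 0);  -- must be non-zero
        QUOTIENT : out unsigned(WIDTH-1 downto 0);
        REMAIN   : out unsigned(WIDTH-1 downto 0);
        DONE     : out std_logic                    -- one-clock pulse when valid
    );
end entity;

architecture rtl of seq_divider is
    -- quot_r starts as the numerator; its bits shift out at the top
    -- while quotient bits shift in at the bottom.
    signal quot_r  : unsigned(WIDTH-1 downto 0) := (others => '0');
    -- Partial remainder, one bit wider than the operands.
    signal rem_r   : unsigned(WIDTH downto 0)   := (others => '0');
    signal denom_r : unsigned(WIDTH-1 downto 0) := (others => '0');
    signal count   : natural range 0 to WIDTH   := 0;
    signal busy    : std_logic := '0';
begin

    process (CLK)
        variable rem_v : unsigned(WIDTH downto 0);
    begin
        if rising_edge(CLK) then
            DONE <= '0';
            if busy = '0' then
                if START = '1' then
                    quot_r  <= NUMER;
                    rem_r   <= (others => '0');
                    denom_r <= DENOM;
                    count   <= WIDTH;
                    busy    <= '1';
                end if;
            elsif count /= 0 then
                -- Shift the next numerator bit into the partial remainder.
                rem_v := rem_r(WIDTH-1 downto 0) & quot_r(WIDTH-1);
                if rem_v >= resize(denom_r, WIDTH+1) then
                    -- Subtraction fits: this quotient bit is 1.
                    rem_v  := rem_v - resize(denom_r, WIDTH+1);
                    quot_r <= quot_r(WIDTH-2 downto 0) & '1';
                else
                    -- Subtraction does not fit: this quotient bit is 0.
                    quot_r <= quot_r(WIDTH-2 downto 0) & '0';
                end if;
                rem_r <= rem_v;
                count <= count - 1;
            else
                -- All WIDTH bits processed; publish the result.
                QUOTIENT <= quot_r;
                REMAIN   <= rem_r(WIDTH-1 downto 0);
                DONE     <= '1';
                busy     <= '0';
            end if;
        end if;
    end process;

end architecture;
```

The cost is roughly one subtractor and one comparator of WIDTH + 1 bits plus three registers, which should be in the same range as the subtract-and-count version, while the worst-case latency no longer depends on the operand values.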

 

Vincent